refactor(tekton/v1): refactor ga release flow (#4092)
This pull request updates the Tekton pipeline for the GA release process, mainly by replacing the `tag-rc2ga-on-oci-artifacts` task with a new, more comprehensive `tag-and-deliver-rc2ga-on-oci-artifacts` task. It also improves parameterization and error handling, and ensures that registry information is consistently passed to relevant tasks.
Key changes include:
**Pipeline and Task Enhancements:**
* Replaced the old `tag-rc2ga-on-oci-artifacts` task with a new `tag-and-deliver-rc2ga-on-oci-artifacts` task, which adds steps for both tagging and delivering images and non-image artifacts, and introduces support for publisher integration. (`tekton/v1/pipelines/pingcap-release-ga.yaml`, `tekton/v1/tasks/release/tag-and-delivery-rc2ga-on-oci-artifacts.yaml`, `tekton/v1/tasks/release/tag-rc2ga-on-oci-artifacts.yaml`, `tekton/v1/tasks/kustomization.yaml`)
* Added a new pipeline parameter `publisher-url` with a default value, enabling dynamic configuration of the publisher service endpoint. (`tekton/v1/pipelines/pingcap-release-ga.yaml`)
* Updated the pipeline to pass the `publisher-url` and `registry` parameters to the relevant tasks, improving flexibility and consistency. (`tekton/v1/pipelines/pingcap-release-ga.yaml`)
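The parameter flow described above can be sketched as a tiny model of Tekton's `$(params.<name>)` substitution: a pipeline-level default applies unless the PipelineRun supplies a value, and task parameters that reference the pipeline parameter receive the resolved value. This is a simplified illustration, not Tekton's implementation. The default URL, the registry values, and the helpers `resolvePipelineParams` and `substitute` are hypothetical, and the real default for `publisher-url` is not stated in this PR.

```typescript
// Simplified model of Tekton-style parameter resolution (not Tekton's own code).
// Parameter names mirror the PR; every value below is a hypothetical placeholder.
type Params = Record<string, string>;

// Declared pipeline params with (hypothetical) defaults.
const pipelineDefaults: Params = {
  "publisher-url": "http://publisher.example.invalid",
  registry: "registry.example.invalid",
};

// Values supplied by a PipelineRun override the declared defaults.
function resolvePipelineParams(defaults: Params, runValues: Params): Params {
  return { ...defaults, ...runValues };
}

// Replace $(params.<name>) references in a task's param values with resolved values.
function substitute(taskParams: Params, pipelineParams: Params): Params {
  const out: Params = {};
  for (const key of Object.keys(taskParams)) {
    out[key] = taskParams[key].replace(
      /\$\(params\.([A-Za-z0-9_-]+)\)/g,
      (_match: string, name: string): string => {
        if (!(name in pipelineParams)) {
          throw new Error(`unknown pipeline param: ${name}`);
        }
        return pipelineParams[name];
      },
    );
  }
  return out;
}

// Usage: the run overrides `registry` but leaves `publisher-url` at its default.
const resolved = resolvePipelineParams(pipelineDefaults, { registry: "hub.example.invalid" });
const taskParams = substitute(
  { "publisher-url": "$(params.publisher-url)", registry: "$(params.registry)" },
  resolved,
);
console.log(taskParams);
```

Failing loudly on an unknown reference mirrors the intent of passing parameters explicitly: a typo in a task's parameter wiring surfaces immediately instead of silently producing an empty value.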
**Error Handling Improvements:**
* Changed error handling in `scripts/flow/rc/check-images-internal.ts` to use `console.error` and `Deno.exit(1)` instead of throwing errors, providing clearer CLI feedback and proper exit codes for CI/CD. (commit: 528bd9d)
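A minimal sketch of the report-and-exit pattern, assuming a hypothetical `ImageCheck` result shape and a hypothetical `reportMissingImages` helper (the real script is not reproduced here). Missing images are written to stderr via `console.error`, the helper returns an exit code instead of throwing, and only the top-level code calls `Deno.exit`. The `globalThis` lookup exists only so the sketch also loads outside Deno.

```typescript
// Hypothetical result shape; the real check-images-internal.ts types are not shown in the PR.
interface ImageCheck {
  image: string;
  found: boolean;
}

// Log every missing image to stderr and return an exit code instead of throwing.
function reportMissingImages(checks: ImageCheck[]): number {
  const missing = checks.filter((c) => !c.found).map((c) => c.image);
  if (missing.length === 0) {
    console.log(`all ${checks.length} images found`);
    return 0;
  }
  for (const image of missing) {
    console.error(`missing image: ${image}`);
  }
  return 1;
}

// Example input (hypothetical image reference).
const code = reportMissingImages([
  { image: "hub.example.invalid/pingcap/tidb:v0.0.0", found: true },
]);

// Under Deno, exit non-zero so CI sees a failed step without a stack trace.
// Looking Deno up on globalThis keeps this sketch loadable in other runtimes.
const deno = (globalThis as any).Deno;
if (deno && code !== 0) {
  deno.exit(code);
}
```

Keeping the exit call out of the helper keeps the helper testable, and the process status is decided in exactly one place.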
TiCDC: Increase memory limits for TiCDC integration test pods to 24Gi (#4094)
## Increase Memory Limits for TiCDC Integration Tests
This pull request increases the memory resource limits for several TiCDC integration test pipelines to improve stability and prevent out-of-memory (OOM) failures during heavy workload testing.
### Why This Change Is Needed
Recent test runs have shown intermittent failures in heavy integration tests due to memory exhaustion. The current 16Gi limit is insufficient for handling peak loads during complex data replication scenarios involving Kafka and storage integrations. Increasing the memory limit to 24Gi will provide a more stable testing environment and reduce flaky test results.
### Changes Made
- Updated memory limits from `16Gi` to `24Gi` in four integration test configurations:
  - `pull_cdc_kafka_integration_heavy/pod-test.yaml`
  - `pull_cdc_kafka_integration_heavy_next_gen/pod-test.yaml`
  - `pull_cdc_storage_integration_heavy/pod-test.yaml`
  - `pull_cdc_storage_integration_heavy_next_gen/pod.yaml`
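For scale, Kubernetes reads the `Gi` suffix as a binary unit (2^30 bytes), so this change is a 50% increase per test container. The sketch below converts both limits to bytes. `parseBinaryQuantity` is a hypothetical helper that handles only whole-number `Ki`/`Mi`/`Gi`/`Ti` values, not the full Kubernetes quantity grammar.

```typescript
// Binary suffixes as defined for Kubernetes resource quantities.
// Decimal suffixes (k, M, G), exponents, and fractional values are not handled here.
const BINARY_SUFFIXES: Record<string, number> = {
  Ki: 2 ** 10,
  Mi: 2 ** 20,
  Gi: 2 ** 30,
  Ti: 2 ** 40,
};

// Convert a whole-number binary quantity such as "16Gi" to bytes.
function parseBinaryQuantity(q: string): number {
  const m = /^(\d+)(Ki|Mi|Gi|Ti)$/.exec(q);
  if (!m) {
    throw new Error(`unsupported quantity: ${q}`);
  }
  return Number(m[1]) * BINARY_SUFFIXES[m[2]];
}

const oldLimit = parseBinaryQuantity("16Gi");
const newLimit = parseBinaryQuantity("24Gi");
console.log(`16Gi = ${oldLimit} bytes, 24Gi = ${newLimit} bytes`);
// prints: 16Gi = 17179869184 bytes, 24Gi = 25769803776 bytes
console.log(`increase: ${((newLimit / oldLimit - 1) * 100).toFixed(0)}%`);
// prints: increase: 50%
```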
### Impact
- CPU limits remain unchanged at 6 cores
- No changes to test logic or container images
- Only resource allocation adjustments for improved reliability
- Should reduce OOM-related test failures in CI/CD pipelines
This change ensures our integration tests can run more reliably under heavy workloads, providing better confidence in TiCDC's performance and stability.
TiCDC: Increase memory limits for TiCDC Pulsar integration tests to improve stability (#4097)
# Increase Memory Limits for TiCDC Pulsar Integration Tests
This pull request increases the memory limits for two TiCDC Pulsar integration test pipelines to improve stability and prevent potential out-of-memory (OOM) failures during test execution.
## Changes Made
* Increased memory limit from `16Gi` to `24Gi` for the test container in the `pull_cdc_pulsar_integration_light` pipeline.
* Increased memory limit from `16Gi` to `24Gi` for the test container in the `pull_cdc_pulsar_integration_light_next_gen` pipeline.
## Why This Change is Necessary
Recent test runs have shown that the Pulsar integration tests, which involve complex data streaming scenarios, can sometimes exceed the previous 16Gi memory allocation under heavy load. This increase ensures that the tests have sufficient headroom to run reliably without being terminated due to memory constraints, leading to more consistent and accurate test results.
TiCDC: Update Prow job triggers to remove 'all' keyword and enable run_before_merge for unit tests (#4099)
## Summary
This PR updates the Prow job configurations for TiCDC to refine the trigger patterns and ensure certain jobs run before merging. The changes improve the clarity of job triggers and ensure that critical build and unit test jobs are executed during the pre-merge phase.
## Changes
- **Refined trigger patterns for integration tests**: Removed the `|all` option from the regex triggers for several integration test jobs in both `latest-presubmits.yaml` and `latest-presubmits-next-gen.yaml`. This ensures that these jobs are only triggered explicitly by their specific command or the `next-gen` keyword (where applicable), reducing unnecessary runs and improving trigger precision.
- **Updated `pull-unit-test` job**: Commented out `skip_if_only_changed` and added `run_before_merge: true` to guarantee unit tests are executed before merging, enhancing code quality assurance.
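The effect of dropping `|all` can be illustrated with two hypothetical trigger patterns for a single job. The job name and the exact regex shape are assumptions (the real patterns in `latest-presubmits.yaml` are not reproduced here), and the sketch models only regex matching, not the rest of Prow's trigger logic. Prow compiles `trigger` with Go's RE2; these JavaScript versions use the `m` flag in place of RE2's inline `(?m)`.

```typescript
// Hypothetical job name and trigger patterns, before and after removing `|all`.
const JOB = "pull-cdc-mysql-integration-heavy";

const oldTrigger = new RegExp(`^/test\\s+(?:${JOB}|all)\\s*$`, "m");
const newTrigger = new RegExp(`^/test\\s+(?:${JOB})\\s*$`, "m");

// True if a PR comment would match the given trigger pattern.
function triggers(pattern: RegExp, comment: string): boolean {
  return pattern.test(comment); // no `g` flag, so test() keeps no lastIndex state
}

for (const comment of ["/test all", `/test ${JOB}`]) {
  console.log(`${comment}: old=${triggers(oldTrigger, comment)} new=${triggers(newTrigger, comment)}`);
}
// prints:
// /test all: old=true new=false
// /test pull-cdc-mysql-integration-heavy: old=true new=true
```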
## Why These Changes?
- **Trigger clarity**: Removing the `all` keyword from triggers prevents unintended job executions when using the `/test all` command, aligning triggers with specific job intents.
- **Pre-merge reliability**: Ensuring `run_before_merge` is set for build and unit test jobs guarantees these critical checks are always performed before code is merged, maintaining stability and preventing regressions.
- **Consistency**: The updates apply to both the current and next-generation Prow configurations, ensuring uniform behavior across job suites.
TiCDC: Increase pipeline timeouts and remove stage timeouts to prevent premature failures (#4102)
## Why
This PR standardizes timeout configurations across all TiCDC integration test pipelines to ensure consistency and prevent premature job failures. The changes address scenarios where tests may run longer than previously allocated timeouts, especially in heavy-load or complex integration scenarios.
## Changes
- **Increased overall pipeline timeout** from 80/100 minutes to a uniform 120 minutes across all pipelines
- **Removed per-stage timeout configurations** for Checkout, Prepare, and Test stages to rely on the global pipeline timeout
- Applied changes consistently to both standard and next-gen variants of:
  - Kafka integration tests (light/heavy)
  - MySQL integration tests (light/heavy)
  - Pulsar integration tests (light)
  - Storage integration tests (light/heavy)
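A toy model makes the trade-off concrete: with per-stage caps, a slow Test stage can fail while the run is still inside its overall budget, whereas a single pipeline timeout only bounds the total. The stage durations and the old per-stage caps below are hypothetical; only the 80- and 120-minute overall limits come from this PR. The hypothetical helper `firstTimeout` reports whichever limit would fire first.

```typescript
interface Stage {
  name: string;
  minutes: number; // how long the stage actually needs
}

// Return the first limit that would abort the run, or null if it completes.
function firstTimeout(
  stages: Stage[],
  pipelineLimit: number,
  stageLimits: Record<string, number> = {},
): string | null {
  let elapsed = 0;
  for (const stage of stages) {
    const end = elapsed + stage.minutes;
    const cap = stageLimits[stage.name];
    const stageDeadline = cap === undefined ? Infinity : elapsed + cap;
    // A stage cap only matters if it fires before the pipeline deadline.
    if (end > stageDeadline && stageDeadline < pipelineLimit) {
      return `stage ${stage.name}`;
    }
    if (end > pipelineLimit) {
      return "pipeline";
    }
    elapsed = end;
  }
  return null;
}

// A slow but healthy heavy-integration run (hypothetical durations).
const run: Stage[] = [
  { name: "Checkout", minutes: 5 },
  { name: "Prepare", minutes: 10 },
  { name: "Test", minutes: 70 },
];

// Old style: 80-minute pipeline timeout plus hypothetical per-stage caps.
console.log(firstTimeout(run, 80, { Checkout: 10, Prepare: 15, Test: 60 })); // prints: stage Test
// New style: a single 120-minute pipeline timeout, no per-stage caps.
console.log(firstTimeout(run, 120)); // prints: null
```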
## Impact
- Provides consistent timeout behavior across all test types
- Reduces maintenance overhead by eliminating per-stage timeouts
- Lets individual stages run as long as they need, while the 120-minute pipeline timeout still bounds each run
- Ensures both standard and next-gen pipelines have identical timeout configurations