Cohere release #21
- Removed GCP authentication steps from the peerpods-chart_image.yaml workflow.
- Added new workflows: publish-cohere-release.yaml for handling semver-tagged releases and publish-cohere.yaml for publishing artifacts on pushes to the cohere branch.
- Updated values.yaml to reflect new image repository and tag for the cloud-api-adaptor and peerpod-ctrl, aligning with the new release strategy.
…kflow
- Consolidated echo commands into a single block for improved readability and maintainability in the GitHub Actions workflow for Cohere release.
- Changed the chart version format from `0.0.0-dev-cohere` to `0.0.0-dev.cohere` in the publish-cohere.yaml workflow to align with the new versioning convention.
- Modified the peerpod-ctrl job in both publish-cohere and publish-cohere-release workflows to build and push the image specifically for the amd64 platform.
- Enhanced the steps for checking out the code, setting up Docker Buildx, and logging into GHCR, ensuring a more streamlined and reliable build process.
- Introduced a step to dynamically determine image tags based on the outputs from the tags job, improving flexibility in image versioning.
- Changed the build arguments in both publish-cohere and publish-cohere-release workflows to set GOFLAGS to use the GCP tag, aligning with the recent updates for GCP support in the peerpod-ctrl image.
- Updated the publish-cohere-release workflow to include steps for checking out code, installing dependencies, and patching values.yaml with release image tags.
- Added support for dynamic nodeSelector and tolerations in the Helm chart, allowing consumers to customize DaemonSet configurations.
- Improved the overall structure and readability of the workflow by consolidating steps and ensuring proper handling of image tags and chart versioning.
…g workflow
- Introduced a check to ensure the digest is successfully extracted after pushing the Helm chart to the OCI registry. If the digest extraction fails, an error message is displayed and the process exits with a non-zero status, improving the reliability of the release workflow.
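The digest check described in this commit could look roughly like the sketch below. It assumes the `helm push` output contains a line of the form `Digest: sha256:…`; the function name `extract_digest` is hypothetical, and the workflow's actual parsing is not shown in this conversation.

```shell
# extract_digest (hypothetical helper): reads `helm push` output on stdin
# and prints the chart digest. Returns non-zero if no digest line is found.
extract_digest() {
  local digest
  # Assumption: the push output contains a line like "Digest: sha256:<hex>".
  digest="$(awk '/^Digest:/ {print $2; exit}')"
  case "$digest" in
    sha256:?*)
      printf '%s\n' "$digest"
      ;;
    *)
      echo "error: could not extract digest from helm push output" >&2
      return 1
      ;;
  esac
}
```

In a workflow step, the output of the push command would be piped into the helper, and a non-zero return would fail the job before any attestation step runs.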
fix-gke-node-config loses universal toleration

The old template hardcoded `tolerations: [{operator: Exists}]`. Two options:

A) Default tolerations to the old behavior:

    # values.yaml
    tolerations:
      - operator: Exists

Simple, backward-compatible. Both DaemonSets share the same toleration.

B) Give fix-gke-node-config its own tolerations field (recommended):

    # values.yaml
    gkeNodeFix:
      tolerations:
        - operator: Exists

    # fix-gke-node-config.yaml
    tolerations:
      {{- toYaml .Values.gkeNodeFix.tolerations | nindent 8 }}

These two DaemonSets have different scheduling needs. The node-fix is infrastructure prep: it should tolerate everything on target nodes. CAA is the workload: it should only run where configured. Coupling them to the same …
alhassankhedr-cohere
left a comment
Made a few comments for your review.
- Updated the yq installation steps in both peerpods-chart_image.yaml and publish-cohere-release.yaml to use a consistent method with version and checksum verification.
- This change enhances security and reliability by ensuring the correct version of yq is installed and verified before use.
- Updated the branch triggers in the publish-cohere.yaml workflow to only include the 'cohere' branch, removing the 'cohere-release' branch. This change streamlines the workflow's execution conditions.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Reviewed by Cursor Bugbot for commit f3711c3. Configure here.
    - name: Read Go version from versions.yaml
      run: |
        command -v yq || sudo snap install yq
Unpinned yq install in jobs with registry write
Medium Severity
The caa jobs in both new workflows install yq via sudo snap install yq with no version pin or integrity check, while running with packages: write permission (GHCR push access). The same PR correctly demonstrates the pinned+checksummed pattern for yq in peerpods-chart_image.yaml and the chart job of publish-cohere-release.yaml. A compromised snap package could execute arbitrary code with registry write access.
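A minimal sketch of the pinned and checksummed pattern the comment refers to, for comparison with the unpinned snap install. The helper names are hypothetical, the download URL layout is an assumption about how yq release binaries are published, and the version and checksum are deliberately left as parameters rather than filled in here.

```shell
# verify_sha256 FILE EXPECTED_HEX
# Succeeds only if the SHA-256 of FILE equals EXPECTED_HEX.
verify_sha256() {
  local file="$1" expected="$2" actual
  actual="$(sha256sum "$file" | awk '{print $1}')"
  [ "$actual" = "$expected" ]
}

# install_pinned_yq VERSION EXPECTED_SHA256 (hypothetical helper)
# Downloads one specific yq release and installs it only if the checksum matches.
install_pinned_yq() {
  local version="$1" expected="$2" tmp
  tmp="$(mktemp)"
  # Assumed URL layout for yq release binaries; adjust to the real source.
  curl -fsSL -o "$tmp" \
    "https://github.com/mikefarah/yq/releases/download/${version}/yq_linux_amd64"
  if ! verify_sha256 "$tmp" "$expected"; then
    echo "error: yq checksum mismatch, refusing to install" >&2
    rm -f "$tmp"
    return 1
  fi
  sudo install -m 0755 "$tmp" /usr/local/bin/yq
  rm -f "$tmp"
}
```

The point of the design is that a job holding `packages: write` never executes a binary whose hash was not fixed in the repository at review time.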
    {{- with .Values.tolerations }}
    tolerations:
      {{- toYaml . | nindent 8 }}
    {{- end }}
DaemonSet loses universal toleration, breaking tainted nodes
High Severity
The fix-gke-node-config DaemonSet previously hardcoded tolerations: [{operator: Exists}], allowing it to schedule on every node regardless of taints. This PR replaces it with {{- with .Values.tolerations }} which defaults to tolerations: []. On upgrade, the DaemonSet silently stops scheduling on any tainted nodes — and CAA worker nodes typically carry taints like kata.peerpods.io/vm:NoSchedule. New nodes won't receive the required containerd/kubelet patches.
Merged commit b0d7bbf into feat/gcp-workload-identity-federation.


This PR introduces new workflows for publishing Cohere-fork artifacts to GHCR on every push to the `cohere` branch and GitHub Releases targeting the `cohere` branch. It also updates the `peerpods-chart_image.yaml` workflow to use a pinned version of `yq` with SHA256 verification. Additionally, the PR removes GCP authentication steps and modifies Helm chart dependencies.

- `publish-cohere-release.yaml`: Publishes semver-tagged Cohere-fork release artifacts to GHCR.
- `publish-cohere.yaml`: Publishes Cohere-fork artifacts to GHCR on every push to the `cohere` branch.
- `peerpods-chart_image.yaml`: Installs `yq` using a pinned version with SHA256 verification.
- `tolerations` support in `daemonset.yaml` and `fix-gke-node-config.yaml`.
- `values.yaml`.
- `daemonset` update strategy in `values.yaml`.
- `snapshotter` setup in `kata-deploy`.

Note
Medium Risk
Introduces new CI publishing pipelines and changes Helm chart deployment defaults (image registries, scheduling constraints, rolling update behavior), which can affect release automation and cluster rollouts if misconfigured.
Overview
Adds two new GitHub Actions workflows to publish Cohere-fork artifacts to GHCR: one for every push to `cohere` (publishing `latest-cohere` + short-SHA image tags and a floating `0.0.0-dev.cohere` chart tag), and one for GitHub Releases on `cohere` that republishes immutable semver-tagged images/charts and generates an OCI chart attestation.

Updates the reusable `peerpods-chart_image.yaml` workflow to pin and checksum-verify `yq` and to drop the Artifact Registry (GCP) auth path, standardizing chart publishing on GHCR.

Modifies the `peerpods` Helm chart to default to Cohere GHCR image repositories/tags, add configurable `tolerations` (alongside `nodeSelector`) for the CAA and GKE-fix DaemonSets, adjust DaemonSet rolling update defaults to avoid hostNetwork port conflicts (`maxSurge: 0`), and tweak `kata-deploy` defaults to disable snapshotter setup and configure kata-remote containerd settings (overlayfs fallback + force guest image pulls).
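As an illustration of the per-push tagging scheme in the overview (`latest-cohere` plus a short-SHA tag), a tags step might compute image references along these lines. The function name `image_tags`, the example repository path, and the 7-character short-SHA length are assumptions for this sketch; the workflow's real tags job is not shown here.

```shell
# image_tags REPO FULL_SHA (hypothetical helper)
# Prints one image reference per line: the floating tag, then the short-SHA tag.
image_tags() {
  local repo="$1" sha="$2" short
  # Assumption: the short SHA is the first 7 characters of the commit SHA.
  short="$(printf '%s' "$sha" | cut -c1-7)"
  printf '%s:latest-cohere\n' "$repo"
  printf '%s:%s\n' "$repo" "$short"
}
```

For example, `image_tags ghcr.io/example/cloud-api-adaptor "$GITHUB_SHA"` would emit two references that a build step could pass as its tag list. The floating tag moves on every push, while the short-SHA tag identifies the exact commit.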