Cohere release #21

Merged
yousef-cohere merged 9 commits into feat/gcp-workload-identity-federation from cohere-release
Apr 22, 2026
@yousef-cohere yousef-cohere commented Apr 17, 2026

This PR introduces new workflows that publish Cohere-fork artifacts to GHCR: one runs on every push to the cohere branch, the other on GitHub Releases targeting the cohere branch. It also updates the peerpods-chart_image.yaml workflow to install a pinned version of yq with SHA256 verification, removes the GCP authentication steps, and modifies Helm chart dependencies.

  • New Workflows:
    • publish-cohere-release.yaml: Publishes semver-tagged Cohere-fork release artifacts to GHCR.
    • publish-cohere.yaml: Publishes Cohere-fork artifacts to GHCR on every push to the cohere branch.
  • Workflow Updates:
    • peerpods-chart_image.yaml: Installs yq using a pinned version with SHA256 verification.
  • Removed Steps:
    • GCP authentication and Helm authentication with Artifact Registry.
  • Helm Chart Changes:
    • Added tolerations support in daemonset.yaml and fix-gke-node-config.yaml.
    • Updated default image tags and nodeSelector/tolerations in values.yaml.
    • Modified daemonset update strategy in values.yaml.
    • Removed snapshotter setup in kata-deploy.
  • Image and Chart Publishing:
    • Builds and publishes CAA and peerpod-ctrl images to GHCR.
    • Publishes peerpods Helm chart with version tagging and OCI registry support.

Note

Medium Risk
Introduces new CI publishing pipelines and changes Helm chart deployment defaults (image registries, scheduling constraints, rolling update behavior), which can affect release automation and cluster rollouts if misconfigured.

Overview
Adds two new GitHub Actions workflows to publish Cohere-fork artifacts to GHCR: one for every push to cohere (publishing latest-cohere + short-SHA image tags and a floating 0.0.0-dev.cohere chart tag), and one for GitHub Releases on cohere that republishes immutable semver-tagged images/charts and generates an OCI chart attestation.
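The tag scheme described above can be sketched as a small shell helper. This is only an illustration: the function name `dev_tags`, the output keys, and the 7-character short-SHA length are assumptions, not taken from the workflows in this PR.

```shell
#!/usr/bin/env bash

# dev_tags COMMIT_SHA
# Prints key=value lines suitable for appending to $GITHUB_OUTPUT.
# The key names and the 7-character short SHA are hypothetical choices.
dev_tags() {
  local sha="$1"
  # Floating image tag that moves with every push to the cohere branch.
  printf 'floating=%s\n' "latest-cohere"
  # Short-SHA image tag that identifies the exact commit.
  printf 'short_sha=%s\n' "${sha:0:7}"
  # Floating pre-release chart version named in the overview.
  printf 'chart_version=%s\n' "0.0.0-dev.cohere"
}
```

In a workflow step this could be used as `dev_tags "$GITHUB_SHA" >> "$GITHUB_OUTPUT"`, so that later jobs read the tags from the step outputs.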

Updates the reusable peerpods-chart_image.yaml workflow to pin and checksum-verify yq and to drop the Artifact Registry (GCP) auth path, standardizing chart publishing on GHCR.

Modifies the peerpods Helm chart to default to Cohere GHCR image repositories/tags, add configurable tolerations (alongside nodeSelector) for the CAA and GKE-fix DaemonSets, adjust DaemonSet rolling update defaults to avoid hostNetwork port conflicts (maxSurge: 0), and tweak kata-deploy defaults to disable snapshotter setup and configure kata-remote containerd settings (overlayfs fallback + force guest image pulls).

Reviewed by Cursor Bugbot for commit f3711c3. Bugbot is set up for automated code reviews on this repo.

- Removed GCP authentication steps from the peerpods-chart_image.yaml workflow.
- Added new workflows: publish-cohere-release.yaml for handling semver-tagged releases and publish-cohere.yaml for publishing artifacts on pushes to the cohere branch.
- Updated values.yaml to reflect new image repository and tag for the cloud-api-adaptor and peerpod-ctrl, aligning with the new release strategy.
…kflow

- Consolidated echo commands into a single block for improved readability and maintainability in the GitHub Actions workflow for Cohere release.
Comment thread .github/workflows/publish-cohere.yaml Outdated
- Changed the chart version format from `0.0.0-dev-cohere` to `0.0.0-dev.cohere` in the publish-cohere.yaml workflow to align with the new versioning convention.
- Modified the peerpod-ctrl job in both publish-cohere and publish-cohere-release workflows to build and push the image specifically for the amd64 platform.
- Enhanced the steps for checking out the code, setting up Docker Buildx, and logging into GHCR, ensuring a more streamlined and reliable build process.
- Introduced a step to dynamically determine image tags based on the outputs from the tags job, improving flexibility in image versioning.
Comment thread .github/workflows/publish-cohere.yaml Outdated
- Changed the build arguments in both publish-cohere and publish-cohere-release workflows to set GOFLAGS to use the GCP tag, aligning with the recent updates for GCP support in the peerpod-ctrl image.
Comment thread .github/workflows/publish-cohere.yaml Outdated
- Updated the publish-cohere-release workflow to include steps for checking out code, installing dependencies, and patching values.yaml with release image tags.
- Added support for dynamic nodeSelector and tolerations in the Helm chart, allowing consumers to customize DaemonSet configurations.
- Improved the overall structure and readability of the workflow by consolidating steps and ensuring proper handling of image tags and chart versioning.
Comment thread .github/workflows/publish-cohere-release.yaml
…g workflow

- Introduced a check to ensure the digest is successfully extracted after pushing the Helm chart to the OCI registry. If the digest extraction fails, an error message is displayed and the process exits with a non-zero status, improving the reliability of the release workflow.
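A minimal sketch of the kind of digest check this commit describes, assuming the push output contains a line of the form `Digest: sha256:...` as `helm push` prints. The function name `extract_digest` is hypothetical and the actual workflow step may differ.

```shell
#!/usr/bin/env bash

# extract_digest
# Reads `helm push` output on stdin and prints the digest from the
# "Digest:" line. Returns non-zero with an error message when no such
# line is present, so the calling step fails instead of continuing
# with an empty digest.
extract_digest() {
  local digest
  digest="$(awk '/^Digest:/ {print $2; exit}')"
  if [ -z "$digest" ]; then
    echo "error: could not extract digest from helm push output" >&2
    return 1
  fi
  printf '%s\n' "$digest"
}
```

A release step could then run something like `digest="$(helm push "$chart_tgz" "$oci_repo" 2>&1 | extract_digest)" || exit 1`, where `chart_tgz` and `oci_repo` are placeholder variables, before passing the digest to the attestation step.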
@yousef-cohere yousef-cohere changed the base branch from cohere to feat/gcp-workload-identity-federation April 21, 2026 20:36
@yousef-cohere yousef-cohere changed the base branch from feat/gcp-workload-identity-federation to cohere April 21, 2026 20:37
@yousef-cohere yousef-cohere changed the base branch from cohere to feat/gcp-workload-identity-federation April 21, 2026 20:42
@alhassankhedr-cohere

fix-gke-node-config loses universal toleration

The old template hardcoded tolerations: [{operator: Exists}], meaning this DaemonSet scheduled on every node regardless of taints. This PR replaces it with {{- with .Values.tolerations }}, which defaults to []. On upgrade, the DaemonSet's pods will no longer be scheduled on tainted nodes (and will be evicted where the taint effect is NoExecute). Since CAA worker nodes likely have taints (that's why we use nodeSelector fencing), the node-fix silently stops running there. New nodes won't get the containerd/kubelet patches, causing hard-to-debug runtime failures.

Two options:

A) Default tolerations to the old behavior:

# values.yaml
tolerations:
- operator: Exists

Simple, backward-compatible. Both DaemonSets share the same toleration.

B) Give fix-gke-node-config its own tolerations field (recommended):

# values.yaml
gkeNodeFix:
  tolerations:
  - operator: Exists

# fix-gke-node-config.yaml
tolerations:
  {{- toYaml .Values.gkeNodeFix.tolerations | nindent 8 }}

These two DaemonSets have different scheduling needs. The node-fix is infrastructure prep — it should tolerate everything on target nodes. CAA is the workload — it should only run where configured. Coupling them to the same tolerations value means either the node-fix is too restrictive or CAA is too permissive.

Comment thread .github/workflows/publish-cohere-release.yaml Outdated
Comment thread .github/workflows/publish-cohere.yaml

@alhassankhedr-cohere alhassankhedr-cohere left a comment

Made a few comments for your review.

- Updated the yq installation steps in both peerpods-chart_image.yaml and publish-cohere-release.yaml to use a consistent method with version and checksum verification.
- This change enhances security and reliability by ensuring the correct version of yq is installed and verified before use.
- Updated the branch triggers in the publish-cohere.yaml workflow to only include the 'cohere' branch, removing the 'cohere-release' branch. This change streamlines the workflow's execution conditions.

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

- name: Read Go version from versions.yaml
  run: |
    command -v yq || sudo snap install yq

Unpinned yq install in jobs with registry write

Medium Severity

The caa jobs in both new workflows install yq via sudo snap install yq with no version pin or integrity check, while running with packages: write permission (GHCR push access). The same PR correctly demonstrates the pinned+checksummed pattern for yq in peerpods-chart_image.yaml and the chart job of publish-cohere-release.yaml. A compromised snap package could execute arbitrary code with registry write access.
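For comparison, the pinned and checksummed pattern could look roughly like the sketch below. The helper names `verify_sha256` and `install_pinned_yq` are hypothetical, the version and checksum are left as parameters rather than real values, and the steps actually used in peerpods-chart_image.yaml may differ.

```shell
#!/usr/bin/env bash

# verify_sha256 FILE EXPECTED_HEX
# Succeeds only when FILE's SHA-256 digest equals EXPECTED_HEX.
verify_sha256() {
  local file="$1" expected="$2" actual
  actual="$(sha256sum "$file" | awk '{print $1}')"
  if [ "$actual" != "$expected" ]; then
    echo "checksum mismatch for $file: got $actual, want $expected" >&2
    return 1
  fi
}

# install_pinned_yq VERSION EXPECTED_HEX
# Hypothetical wrapper: downloads one pinned yq release binary and
# installs it only after the checksum matches. EXPECTED_HEX must come
# from a trusted source (for example, committed in the repository).
install_pinned_yq() {
  local version="$1" expected="$2" tmp
  tmp="$(mktemp)" || return 1
  curl -fsSL -o "$tmp" \
    "https://github.com/mikefarah/yq/releases/download/${version}/yq_linux_amd64" || return 1
  verify_sha256 "$tmp" "$expected" || return 1
  sudo install -m 0755 "$tmp" /usr/local/bin/yq
}
```

Replacing `command -v yq || sudo snap install yq` with a call such as `install_pinned_yq "$YQ_VERSION" "$YQ_SHA256"` (both variables are placeholders) would mean a tampered download fails the step before anything runs with packages: write permission.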

{{- with .Values.tolerations }}
tolerations:
{{- toYaml . | nindent 8 }}
{{- end }}

DaemonSet loses universal toleration, breaking tainted nodes

High Severity

The fix-gke-node-config DaemonSet previously hardcoded tolerations: [{operator: Exists}], allowing it to schedule on every node regardless of taints. This PR replaces it with {{- with .Values.tolerations }} which defaults to tolerations: []. On upgrade, the DaemonSet silently stops scheduling on any tainted nodes — and CAA worker nodes typically carry taints like kata.peerpods.io/vm:NoSchedule. New nodes won't receive the required containerd/kubelet patches.

@yousef-cohere yousef-cohere merged commit b0d7bbf into feat/gcp-workload-identity-federation Apr 22, 2026
24 of 26 checks passed
@yousef-cohere yousef-cohere deleted the cohere-release branch April 22, 2026 21:13