Commit 3c631e5

abhiaiyer91 and Mastra Code (anthropic/claude-opus-4-6) authored
fix(cli): retry upload-complete and cancel orphaned server deploys (#15371)
Server deploys could get permanently stuck in 'queued' status if the upload-complete call (Step 3 of `uploadServerDeploy`) hit a transient failure: a network error, a 5xx, or an expired token. There was no retry and no cleanup, so the deploy just sat there forever. A customer hit this twice in a row.

Now the CLI retries upload-complete up to 3 times with exponential backoff (1s, 2s, 4s), refreshes the auth token on 401, and cancels the orphaned deploy if retries are exhausted or if artifact upload fails.

Also added console.warn messages so the user can actually see what's happening during retries and cleanup instead of staring at a silent spinner.

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## ELI5

If the final step of uploading a deploy fails (network hiccup or expired login), the deployment could stay stuck. This PR makes the CLI retry the confirmation step, refresh the login if needed, and cancel failed/orphaned deploys so nothing remains stuck.

## Changes

### Changeset

- Adds a patch changeset describing a fix for deploys getting stuck in `queued` when upload confirmation fails. Notes retries, user-visible retry/cleanup logs, and automatic cancellation of orphaned deploys.

### CLI Platform API (server & studio)

- uploadServerDeploy / uploadDeploy improvements:
  - Retry upload-complete on transient failures:
    - Retries up to 3 times (4 attempts total) with exponential backoff delays of 1s, 2s, and 4s.
    - Treats HTTP 5xx, 401, and network-level polling/connection errors as retryable.
  - Refresh authentication on 401:
    - Calls getToken() and recreates the typed API client before retrying; if token refresh fails, cleanup is attempted and the error is surfaced.
  - Handle network-level errors:
    - Uses isRetryablePollingError to include connection-level failures (ECONNRESET, ETIMEDOUT, ECONNREFUSED, ENOTFOUND, fetch failures) in retryable handling.
- Cancel orphaned/failing deploys:
  - Adds a best-effort cancelDeploy helper that POSTs /v1/{server,studio}/deploys/{id}/cancel and logs warnings if cancellation fails.
  - Attempts cancellation when artifact upload fails, when confirmation retries are exhausted, or on non-retryable failures; cancellation failures are logged and do not mask the original error.
- User-visible logging:
  - Replaces a silent spinner with console.warn messages to surface retry attempts, token refresh attempts, and cleanup actions to users.
- API client migration:
  - Replaces raw platformFetch usage with a typed createApiClient() for create-deploy and upload-complete calls; tests updated to mock the typed client.

### Types / API contract updates

- Regenerated platform-api.d.ts with multiple request/response typing updates:
  - Added cancel endpoints for studio deploys and extended deploy status enums to include 'cancelled'.
  - Adjusted pause/restart response contracts and added optional help_url to many error payloads.
  - Added/updated several requestBody schemas (e.g., studio deploy creation now accepts optional envVars).

No exported/public function signatures were intentionally changed.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Mastra Code (anthropic/claude-opus-4-6) <noreply@mastra.ai>
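The retry and cleanup behaviour described in this commit message can be sketched roughly as follows. This is a minimal illustration and not the code from the commit: `DeployDeps`, `ConfirmResult`, `backoffDelaysMs`, `isRetryable`, `confirmWithRetry`, and `cancelQuietly` are all hypothetical names, and the real CLI's `getToken()`, `createApiClient()`, and `isRetryablePollingError` are represented here only by injected callbacks and a simple code check.

```typescript
// Hypothetical sketch; names and shapes are assumptions, not the Mastra CLI's API.

/** Outcome of one upload-complete attempt (assumed shape). */
type ConfirmResult = { ok: true } | { ok: false; status?: number; code?: string };

interface DeployDeps {
  confirmUpload: () => Promise<ConfirmResult>; // Step 3: upload-complete
  refreshToken: () => Promise<void>;           // invoked on 401 before retrying
  cancelDeploy: () => Promise<void>;           // best-effort cleanup call
  sleep: (ms: number) => Promise<void>;        // injected so tests need not wait
}

// Connection-level error codes named in the commit message.
const RETRYABLE_CODES = new Set(['ECONNRESET', 'ETIMEDOUT', 'ECONNREFUSED', 'ENOTFOUND']);

/** Delays between attempts; the defaults give 1s, 2s, 4s for three retries. */
function backoffDelaysMs(maxRetries = 3, baseMs = 1000): number[] {
  return Array.from({ length: maxRetries }, (_, i) => baseMs * 2 ** i);
}

/** HTTP 5xx, 401, and known connection errors count as transient. */
function isRetryable(failure: { status?: number; code?: string }): boolean {
  if (failure.status !== undefined) {
    return failure.status === 401 || failure.status >= 500;
  }
  return failure.code !== undefined && RETRYABLE_CODES.has(failure.code);
}

/** Cancel the deploy, logging (not throwing) if cancellation itself fails. */
async function cancelQuietly(deps: DeployDeps): Promise<void> {
  try {
    await deps.cancelDeploy();
  } catch (err) {
    console.warn('Could not cancel deploy:', err);
  }
}

/** Confirm the upload, retrying transient failures and cancelling on give-up. */
async function confirmWithRetry(deps: DeployDeps): Promise<void> {
  const delays = backoffDelaysMs();
  for (let attempt = 0; ; attempt++) {
    let result: ConfirmResult;
    try {
      result = await deps.confirmUpload();
    } catch (err) {
      // Thrown errors are treated as network-level failures.
      const code = (err as { code?: string } | null)?.code;
      result = code === undefined ? { ok: false } : { ok: false, code };
    }
    if (result.ok) return;

    const delay = delays[attempt]; // undefined once retries are exhausted
    if (delay === undefined || !isRetryable(result)) {
      await cancelQuietly(deps);
      throw new Error(`upload-complete failed after ${attempt + 1} attempt(s)`);
    }
    if (result.status === 401) {
      try {
        await deps.refreshToken();
      } catch (err) {
        await cancelQuietly(deps); // clean up, then surface the refresh error
        throw err;
      }
    }
    console.warn(`upload-complete failed; retrying in ${delay}ms`);
    await deps.sleep(delay);
  }
}
```

Injecting `sleep` and the network calls keeps the loop testable without real delays. With the defaults, a permanently failing endpoint is tried four times (one initial attempt plus three retries) before the deploy is cancelled, and the original error is still thrown even when cancellation fails.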
1 parent 620e0b2 commit 3c631e5

7 files changed, +629 -141 lines changed
.changeset/whole-places-beam.md

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+---
+'mastra': patch
+---
+
+Fixed `mastra server deploy` getting stuck in `queued` after transient upload confirmation failures. The CLI now retries these failures and cleans up failed deploys so they no longer remain orphaned.
