feat(e2b): add prometheus metrics for core api operations #333
PRAteek-singHWY wants to merge 2 commits into openkruise:master from
[APPROVALNOTIFIER] This PR is NOT APPROVED. It needs approval from an approver in each of the affected files.
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## master #333 +/- ##
==========================================
+ Coverage 74.65% 74.81% +0.15%
==========================================
Files 141 141
Lines 9836 9870 +34
==========================================
+ Hits 7343 7384 +41
+ Misses 2183 2179 -4
+ Partials 310 307 -3
Hi @furykerry, @AiRanthem, and @zmberg, in this PR I have instrumented the core E2B API operations (Create, Delete, Describe, Timeout) with Prometheus metrics to track latency and success rates. I also unified the existing snapshot metrics into this new schema and added unit tests that fully cover the instrumentation. Looking forward to your feedback. Thank you.
[Observability] Add Prometheus Metrics for E2B API Operations
Summary
This PR improves the observability of the sandbox-manager by instrumenting core E2B API operations with Prometheus metrics. It provides visibility into API latency and success/error rates for sandbox lifecycle events.
Problem
Currently, the sandbox-manager only tracks snapshot operations. Core lifecycle events, such as creating and deleting sandboxes and updating their timeouts, have no duration or request-count tracking at the API layer. This makes it difficult to monitor the performance and reliability of the E2B-compatible API gateway.
Solution
Implemented a unified metrics pattern in pkg/servers/e2b/metrics.go and instrumented the corresponding handlers.
Metrics:
- sandbox_api_operation_duration_seconds (Histogram): tracks latency by operation (create, delete, describe, timeout, snapshot) and result (success, error).
- sandbox_api_operation_total (Counter): tracks total requests by operation and result.
Instrumented handlers:
- CreateSandbox in create.go.
- DeleteSandbox and DescribeSandbox in services.go.
- SetSandboxTimeout in timeout.go.
- CreateSnapshot in snapshot.go, refactored to use the new unified pattern; the legacy snapshot-specific metrics are removed.
Handlers record metrics in defer blocks, so an observation is made on every return path, including error returns.
Verification
Ran go test -v ./pkg/servers/e2b/... and confirmed that all tests pass (26.2s).
Checklist
pkg/servers/e2b/metrics.go.
Fixes #332