
feat(e2b): add prometheus metrics for core api operations #333

Open
PRAteek-singHWY wants to merge 2 commits into openkruise:master from PRAteek-singHWY:feat/e2b-api-metrics

@PRAteek-singHWY
Contributor

[Observability] Add Prometheus Metrics for E2B API Operations

Summary

This PR enhances the observability of the sandbox-manager by instrumenting core E2B API operations with Prometheus metrics. It provides visibility into API latency and success/error rates for sandbox lifecycle events.

Problem

Currently, the sandbox-manager only tracks snapshot operations. Core lifecycle operations have no duration or request-count tracking at the API layer. These include creating, describing, and deleting sandboxes, as well as updating their timeouts. This makes it difficult to monitor the performance and reliability of the E2B-compatible API gateway.

Solution

Implemented a unified metrics pattern in pkg/servers/e2b/metrics.go and instrumented the corresponding handlers:

  1. Unified Metrics:
    • sandbox_api_operation_duration_seconds (Histogram): Tracks latency by operation (create, delete, describe, timeout, snapshot) and result (success, error).
    • sandbox_api_operation_total (Counter): Tracks total requests by operation and result.
  2. Instrumentation:
    • Instrumented CreateSandbox in create.go.
    • Instrumented DeleteSandbox and DescribeSandbox in services.go.
    • Instrumented SetSandboxTimeout in timeout.go.
    • Refactored CreateSnapshot in snapshot.go to use the new unified pattern, removing legacy snapshot-specific metrics.
  3. Code Quality:
    • Used named return parameters and defer blocks in handlers so that every return path, including early error returns, records both a duration and a count.
    • Used the same bucket boundaries (20ms to ~41s) for all API histograms.

Verification

  • Unit Tests: Ran go test -v ./pkg/servers/e2b/... and confirmed that all tests pass (total run time: 26.2s).
  • Metric Registration: Verified that new metrics are correctly registered in the global Prometheus registry.
  • Go Version: Verified with Go 1.25.0.

Checklist

  • Define new Prometheus metrics in pkg/servers/e2b/metrics.go.
  • Implement instrumentation in API handlers.
  • Migrate existing snapshot metrics to the new pattern.
  • Verify with unit tests.

fixes #332

@kruise-bot kruise-bot requested review from AiRanthem and furykerry May 6, 2026 04:05
@kruise-bot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign furykerry for approval by writing /assign @furykerry in a comment. For more information see: The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@codecov

codecov Bot commented May 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.81%. Comparing base (edaad73) to head (e790f5b).

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #333      +/-   ##
==========================================
+ Coverage   74.65%   74.81%   +0.15%     
==========================================
  Files         141      141              
  Lines        9836     9870      +34     
==========================================
+ Hits         7343     7384      +41     
+ Misses       2183     2179       -4     
+ Partials      310      307       -3     
Flag        Coverage Δ
unittests   74.81% <100.00%> (+0.15%) ⬆️


@kruise-bot kruise-bot added size/L and removed size/M labels May 6, 2026
@PRAteek-singHWY
Contributor Author

Hi @furykerry, @AiRanthem, and @zmberg,

In this PR, I have instrumented the core E2B API operations (Create, Delete, Describe, Timeout) with Prometheus metrics to track latency and success rates. I also migrated the existing snapshot metrics to this new schema and added unit tests that fully cover the new instrumentation.
I'm happy to iterate based on your guidance.

Looking forward to your feedback.

Thank you.


Development

Successfully merging this pull request may close these issues.

[Observability] Add Prometheus Metrics for E2B API Operations
