feat(llmobs): support submitting trace and session level evals #17530
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 07330929f1
Performance SLOs
Comparing candidate christopher.fox/trace-session-level-evals (0733092) with baseline main (278a9a1)

📈 Performance Regressions (2 suites)

📈 iastaspects - 118/118
✅ add_aspect | Time: ✅ 103.526µs (SLO: <130.000µs 📉 -20.4%) vs baseline: +2.9% | Memory: ✅ 43.851MB (SLO: <46.000MB -4.7%) vs baseline: +5.0%
✅ add_inplace_aspect | Time: ✅ 100.263µs (SLO: <130.000µs 📉 -22.9%) vs baseline: -3.0% | Memory: ✅ 43.845MB (SLO: <46.000MB -4.7%) vs baseline: +5.0%
✅ add_inplace_noaspect | Time: ✅ 28.420µs (SLO: <40.000µs 📉 -28.9%) vs baseline: +0.4% | Memory: ✅ 43.851MB (SLO: <46.000MB -4.7%) vs baseline: +5.1%
✅ add_noaspect | Time: ✅ 49.182µs (SLO: <70.000µs 📉 -29.7%) vs baseline: ~same | Memory: ✅ 44.235MB (SLO: <46.000MB -3.8%) vs baseline: +5.9%
✅ bytearray_aspect | Time: ✅ 253.273µs (SLO: <400.000µs 📉 -36.7%) vs baseline: -9.2% | Memory: ✅ 43.883MB (SLO: <46.000MB -4.6%) vs baseline: +5.1%
✅ bytearray_extend_aspect | Time: ✅ 642.018µs (SLO: <800.000µs 📉 -19.7%) vs baseline: -2.5% | Memory: ✅ 43.838MB (SLO: <46.000MB -4.7%) vs baseline: +5.0%
✅ bytearray_extend_noaspect | Time: ✅ 265.027µs (SLO: <400.000µs 📉 -33.7%) vs baseline: -1.0% | Memory: ✅ 44.326MB (SLO: <46.000MB -3.6%) vs baseline: +6.3%
✅ bytearray_noaspect | Time: ✅ 142.762µs (SLO: <300.000µs 📉 -52.4%) vs baseline: -1.0% | Memory: ✅ 43.894MB (SLO: <46.000MB -4.6%) vs baseline: +5.1%
✅ bytes_aspect | Time: ✅ 218.579µs (SLO: <300.000µs 📉 -27.1%) vs baseline: -6.5% | Memory: ✅ 43.809MB (SLO: <46.000MB -4.8%) vs baseline: +4.7%
✅ bytes_noaspect | Time: ✅ 133.282µs (SLO: <200.000µs 📉 -33.4%) vs baseline: -0.9% | Memory: ✅ 43.906MB (SLO: <46.000MB -4.6%) vs baseline: +4.8%
✅ bytesio_aspect | Time: ✅ 3.811ms (SLO: <5.000ms 📉 -23.8%) vs baseline: -2.4% | Memory: ✅ 44.157MB (SLO: <46.000MB -4.0%) vs baseline: +5.7%
✅ bytesio_noaspect | Time: ✅ 319.762µs (SLO: <420.000µs 📉 -23.9%) vs baseline: +0.3% | Memory: ✅ 43.889MB (SLO: <46.000MB -4.6%) vs baseline: +5.1%
✅ capitalize_aspect | Time: ✅ 88.681µs (SLO: <300.000µs 📉 -70.4%) vs baseline: +0.2% | Memory: ✅ 44.050MB (SLO: <46.000MB -4.2%) vs baseline: +5.3%
✅ capitalize_noaspect | Time: ✅ 264.739µs (SLO: <300.000µs 📉 -11.8%) vs baseline: +4.3% | Memory: ✅ 43.958MB (SLO: <46.000MB -4.4%) vs baseline: +5.3%
✅ casefold_aspect | Time: ✅ 89.037µs (SLO: <500.000µs 📉 -82.2%) vs baseline: -0.2% | Memory: ✅ 44.070MB (SLO: <46.000MB -4.2%) vs baseline: +5.5%
✅ casefold_noaspect | Time: ✅ 316.528µs (SLO: <500.000µs 📉 -36.7%) vs baseline: +1.0% | Memory: ✅ 43.948MB (SLO: <46.000MB -4.5%) vs baseline: +5.1%
✅ decode_aspect | Time: ✅ 86.475µs (SLO: <100.000µs 📉 -13.5%) vs baseline: -0.9% | Memory: ✅ 44.074MB (SLO: <46.000MB -4.2%) vs baseline: +5.5%
✅ decode_noaspect | Time: ✅ 155.784µs (SLO: <210.000µs 📉 -25.8%) vs baseline: -1.0% | Memory: ✅ 43.844MB (SLO: <46.000MB -4.7%) vs baseline: +4.9%
✅ encode_aspect | Time: ✅ 84.417µs (SLO: <200.000µs 📉 -57.8%) vs baseline: -0.3% | Memory: ✅ 44.078MB (SLO: <46.000MB -4.2%) vs baseline: +5.6%
✅ encode_noaspect | Time: ✅ 143.893µs (SLO: <200.000µs 📉 -28.1%) vs baseline: -1.8% | Memory: ✅ 43.876MB (SLO: <46.000MB -4.6%) vs baseline: +5.2%
✅ format_aspect | Time: ✅ 14.556ms (SLO: <19.200ms 📉 -24.2%) vs baseline: -0.2% | Memory: ✅ 44.102MB (SLO: <46.000MB -4.1%) vs baseline: +5.4%
✅ format_map_aspect | Time: ✅ 16.349ms (SLO: <21.500ms 📉 -24.0%) vs baseline: -0.5% | Memory: ✅ 44.114MB (SLO: <46.000MB -4.1%) vs baseline: +5.6%
✅ format_map_noaspect | Time: ✅ 379.110µs (SLO: <500.000µs 📉 -24.2%) vs baseline: +1.4% | Memory: ✅ 43.948MB (SLO: <46.000MB -4.5%) vs baseline: +5.2%
✅ format_noaspect | Time: ✅ 312.804µs (SLO: <500.000µs 📉 -37.4%) vs baseline: -0.6% | Memory: ✅ 43.783MB (SLO: <46.000MB -4.8%) vs baseline: +4.8%
✅ index_aspect | Time: ✅ 138.310µs (SLO: <300.000µs 📉 -53.9%) vs baseline: +7.6% | Memory: ✅ 43.783MB (SLO: <46.000MB -4.8%) vs baseline: +4.7%
✅ index_noaspect | Time: ✅ 40.396µs (SLO: <300.000µs 📉 -86.5%) vs baseline: -0.8% | Memory: ✅ 43.887MB (SLO: <46.000MB -4.6%) vs baseline: +5.3%
✅ join_aspect | Time: ✅ 211.829µs (SLO: <300.000µs 📉 -29.4%) vs baseline: -3.9% | Memory: ✅ 43.829MB (SLO: <46.000MB -4.7%) vs baseline: +4.9%
✅ join_noaspect | Time: ✅ 142.857µs (SLO: <300.000µs 📉 -52.4%) vs baseline: -2.1% | Memory: ✅ 44.319MB (SLO: <46.000MB -3.7%) vs baseline: +6.1%
✅ ljust_aspect | Time: ✅ 506.801µs (SLO: <700.000µs 📉 -27.6%) vs baseline: -3.2% | Memory: ✅ 43.929MB (SLO: <46.000MB -4.5%) vs baseline: +5.1%
✅ ljust_noaspect | Time: ✅ 271.699µs (SLO: <300.000µs -9.4%) vs baseline: +3.2% | Memory: ✅ 43.983MB (SLO: <46.000MB -4.4%) vs baseline: +5.4%
✅ lower_aspect | Time: ✅ 302.069µs (SLO: <500.000µs 📉 -39.6%) vs baseline: -2.7% | Memory: ✅ 43.740MB (SLO: <46.000MB -4.9%) vs baseline: +4.8%
✅ lower_noaspect | Time: ✅ 239.805µs (SLO: <300.000µs 📉 -20.1%) vs baseline: ~same | Memory: ✅ 43.866MB (SLO: <46.000MB -4.6%) vs baseline: +4.8%
✅ lstrip_aspect | Time: ✅ 0.272ms (SLO: <3.000ms 📉 -90.9%) vs baseline: -1.5% | Memory: ✅ 44.145MB (SLO: <46.000MB -4.0%) vs baseline: +6.2%
✅ lstrip_noaspect | Time: ✅ 0.178ms (SLO: <3.000ms 📉 -94.1%) vs baseline: +0.8% | Memory: ✅ 43.884MB (SLO: <46.000MB -4.6%) vs baseline: +4.9%
✅ modulo_aspect | Time: ✅ 14.212ms (SLO: <18.750ms 📉 -24.2%) vs baseline: -0.6% | Memory: ✅ 44.058MB (SLO: <46.000MB -4.2%) vs baseline: +5.4%
✅ modulo_aspect_for_bytearray_bytearray | Time: ✅ 14.722ms (SLO: <19.350ms 📉 -23.9%) vs baseline: -0.7% | Memory: ✅ 43.953MB (SLO: <46.000MB -4.5%) vs baseline: +5.1%
✅ modulo_aspect_for_bytes | Time: ✅ 14.307ms (SLO: <18.900ms 📉 -24.3%) vs baseline: -0.1% | Memory: ✅ 43.977MB (SLO: <46.000MB -4.4%) vs baseline: +5.1%
✅ modulo_aspect_for_bytes_bytearray | Time: ✅ 14.538ms (SLO: <19.150ms 📉 -24.1%) vs baseline: +0.2% | Memory: ✅ 44.317MB (SLO: <46.000MB -3.7%) vs baseline: +5.7%
✅ modulo_noaspect | Time: ✅ 0.373ms (SLO: <3.000ms 📉 -87.6%) vs baseline: +0.3% | Memory: ✅ 43.884MB (SLO: <46.000MB -4.6%) vs baseline: +5.0%
✅ replace_aspect | Time: ✅ 18.384ms (SLO: <24.000ms 📉 -23.4%) vs baseline: +0.1% | Memory: ✅ 44.104MB (SLO: <46.000MB -4.1%) vs baseline: +5.3%
✅ replace_noaspect | Time: ✅ 289.584µs (SLO: <400.000µs 📉 -27.6%) vs baseline: +0.6% | Memory: ✅ 43.827MB (SLO: <46.000MB -4.7%) vs baseline: +4.9%
✅ repr_aspect | Time: ✅ 328.502µs (SLO: <420.000µs 📉 -21.8%) vs baseline: -4.3% | Memory: ✅ 43.911MB (SLO: <46.000MB -4.5%) vs baseline: +5.2%
✅ repr_noaspect | Time: ✅ 47.191µs (SLO: <90.000µs 📉 -47.6%) vs baseline: +0.4% | Memory: ✅ 44.180MB (SLO: <46.000MB -4.0%) vs baseline: +6.0%
✅ rstrip_aspect | Time: ✅ 385.581µs (SLO: <500.000µs 📉 -22.9%) vs baseline: -1.4% | Memory: ✅ 44.154MB (SLO: <46.000MB -4.0%) vs baseline: +5.5%
✅ rstrip_noaspect | Time: ✅ 182.643µs (SLO: <300.000µs 📉 -39.1%) vs baseline: -0.6% | Memory: ✅ 43.909MB (SLO: <46.000MB -4.5%) vs baseline: +5.3%
✅ slice_aspect | Time: ✅ 184.365µs (SLO: <300.000µs 📉 -38.5%) vs baseline: +0.4% | Memory: ✅ 43.867MB (SLO: <46.000MB -4.6%) vs baseline: +5.0%
✅ slice_noaspect | Time: ✅ 53.794µs (SLO: <90.000µs 📉 -40.2%) vs baseline: -0.5% | Memory: ✅ 43.780MB (SLO: <46.000MB -4.8%) vs baseline: +4.8%
✅ stringio_aspect | Time: ✅ 4.463ms (SLO: <5.000ms 📉 -10.7%) vs baseline: 📈 +14.5% | Memory: ✅ 44.071MB (SLO: <46.000MB -4.2%) vs baseline: +5.5%
✅ stringio_noaspect | Time: ✅ 348.896µs (SLO: <500.000µs 📉 -30.2%) vs baseline: -2.5% | Memory: ✅ 43.888MB (SLO: <46.000MB -4.6%) vs baseline: +5.1%
✅ strip_aspect | Time: ✅ 271.449µs (SLO: <350.000µs 📉 -22.4%) vs baseline: -1.7% | Memory: ✅ 43.875MB (SLO: <46.000MB -4.6%) vs baseline: +5.0%
✅ strip_noaspect | Time: ✅ 180.303µs (SLO: <240.000µs 📉 -24.9%) vs baseline: +0.3% | Memory: ✅ 43.954MB (SLO: <46.000MB -4.4%) vs baseline: +5.0%
✅ swapcase_aspect | Time: ✅ 339.924µs (SLO: <500.000µs 📉 -32.0%) vs baseline: -2.2% | Memory: ✅ 44.194MB (SLO: <46.000MB -3.9%) vs baseline: +5.7%
✅ swapcase_noaspect | Time: ✅ 274.872µs (SLO: <400.000µs 📉 -31.3%) vs baseline: +0.5% | Memory: ✅ 43.885MB (SLO: <46.000MB -4.6%) vs baseline: +4.6%
✅ title_aspect | Time: ✅ 320.284µs (SLO: <500.000µs 📉 -35.9%) vs baseline: -7.2% | Memory: ✅ 43.840MB (SLO: <46.000MB -4.7%) vs baseline: +4.9%
✅ title_noaspect | Time: ✅ 264.274µs (SLO: <400.000µs 📉 -33.9%) vs baseline: -1.6% | Memory: ✅ 43.941MB (SLO: <46.000MB -4.5%) vs baseline: +5.2%
✅ translate_aspect | Time: ✅ 498.039µs (SLO: <700.000µs 📉 -28.9%) vs baseline: -3.7% | Memory: ✅ 44.047MB (SLO: <46.000MB -4.2%) vs baseline: +5.4%
✅ translate_noaspect | Time: ✅ 427.279µs (SLO: <500.000µs 📉 -14.5%) vs baseline: -1.9% | Memory: ✅ 43.796MB (SLO: <46.000MB -4.8%) vs baseline: +4.9%
✅ upper_aspect | Time: ✅ 302.369µs (SLO: <500.000µs 📉 -39.5%) vs baseline: -3.6% | Memory: ✅ 43.913MB (SLO: <46.000MB -4.5%) vs baseline: +4.9%
✅ upper_noaspect | Time: ✅ 235.576µs (SLO: <400.000µs 📉 -41.1%) vs baseline: -5.5% | Memory: ✅ 43.873MB (SLO: <46.000MB -4.6%) vs baseline: +4.8%

📈 iastaspectsospath - 24/24
✅ ospathbasename_aspect | Time: ✅ 536.290µs (SLO: <700.000µs 📉 -23.4%) vs baseline: 📈 +23.9% | Memory: ✅ 43.835MB (SLO: <46.000MB -4.7%) vs baseline: +5.5%
✅ ospathbasename_noaspect | Time: ✅ 441.115µs (SLO: <700.000µs 📉 -37.0%) vs baseline: +0.8% | Memory: ✅ 43.796MB (SLO: <46.000MB -4.8%) vs baseline: +5.1%
✅ ospathjoin_aspect | Time: ✅ 628.769µs (SLO: <700.000µs 📉 -10.2%) vs baseline: -2.0% | Memory: ✅ 43.842MB (SLO: <46.000MB -4.7%) vs baseline: +5.2%
✅ ospathjoin_noaspect | Time: ✅ 645.450µs (SLO: <700.000µs -7.8%) vs baseline: -1.5% | Memory: ✅ 43.771MB (SLO: <46.000MB -4.8%) vs baseline: +5.4%
✅ ospathnormcase_aspect | Time: ✅ 356.425µs (SLO: <700.000µs 📉 -49.1%) vs baseline: -1.6% | Memory: ✅ 43.848MB (SLO: <46.000MB -4.7%) vs baseline: +5.6%
✅ ospathnormcase_noaspect | Time: ✅ 361.489µs (SLO: <700.000µs 📉 -48.4%) vs baseline: -2.8% | Memory: ✅ 43.869MB (SLO: <46.000MB -4.6%) vs baseline: +5.5%
✅ ospathsplit_aspect | Time: ✅ 500.032µs (SLO: <700.000µs 📉 -28.6%) vs baseline: -0.8% | Memory: ✅ 43.958MB (SLO: <46.000MB -4.4%) vs baseline: +5.2%
✅ ospathsplit_noaspect | Time: ✅ 510.050µs (SLO: <700.000µs 📉 -27.1%) vs baseline: -1.7% | Memory: ✅ 43.858MB (SLO: <46.000MB -4.7%) vs baseline: +5.1%
✅ ospathsplitdrive_aspect | Time: ✅ 382.758µs (SLO: <700.000µs 📉 -45.3%) vs baseline: -0.1% | Memory: ✅ 43.896MB (SLO: <46.000MB -4.6%) vs baseline: +5.5%
✅ ospathsplitdrive_noaspect | Time: ✅ 72.839µs (SLO: <700.000µs 📉 -89.6%) vs baseline: +0.7% | Memory: ✅ 43.830MB (SLO: <46.000MB -4.7%) vs baseline: +5.3%
✅ ospathsplitext_aspect | Time: ✅ 471.487µs (SLO: <700.000µs 📉 -32.6%) vs baseline: +0.8% | Memory: ✅ 43.792MB (SLO: <46.000MB -4.8%) vs baseline: +5.1%
✅ ospathsplitext_noaspect | Time: ✅ 477.400µs (SLO: <700.000µs 📉 -31.8%) vs baseline: +1.1% | Memory: ✅ 43.773MB (SLO: <46.000MB -4.8%) vs baseline: +5.5%

🟡 Near SLO Breach (6 suites)

🟡 djangosimple - 30/30
✅ appsec | Time: ✅ 21.046ms (SLO: <22.300ms -5.6%) vs baseline: ~same | Memory: ✅ 71.269MB (SLO: <73.500MB -3.0%) vs baseline: +5.0%
✅ exception-replay-enabled | Time: ✅ 1.368ms (SLO: <1.450ms -5.6%) vs baseline: -0.3% | Memory: ✅ 69.601MB (SLO: <71.500MB -2.7%) vs baseline: +5.3%
✅ iast | Time: ✅ 20.981ms (SLO: <22.250ms -5.7%) vs baseline: ~same | Memory: ✅ 71.270MB (SLO: <75.000MB -5.0%) vs baseline: +5.1%
✅ profiler | Time: ✅ 15.244ms (SLO: <16.550ms -7.9%) vs baseline: +0.7% | Memory: ✅ 60.110MB (SLO: <61.000MB 🟡 -1.5%) vs baseline: +5.5%
✅ resource-renaming | Time: ✅ 20.807ms (SLO: <21.750ms -4.3%) vs baseline: +0.4% | Memory: ✅ 71.302MB (SLO: <73.500MB -3.0%) vs baseline: +5.2%
✅ span-code-origin | Time: ✅ 21.416ms (SLO: <28.200ms 📉 -24.1%) vs baseline: +0.6% | Memory: ✅ 71.349MB (SLO: <75.000MB -4.9%) vs baseline: +5.2%
✅ tracer | Time: ✅ 21.051ms (SLO: <21.750ms -3.2%) vs baseline: ~same | Memory: ✅ 71.366MB (SLO: <75.000MB -4.8%) vs baseline: +5.4%
✅ tracer-and-profiler | Time: ✅ 21.020ms (SLO: <23.500ms 📉 -10.6%) vs baseline: +0.1% | Memory: ✅ 73.330MB (SLO: <75.000MB -2.2%) vs baseline: +5.2%
✅ tracer-dont-create-db-spans | Time: ✅ 20.999ms (SLO: <21.500ms -2.3%) vs baseline: -0.6% | Memory: ✅ 71.363MB (SLO: <75.000MB -4.8%) vs baseline: +5.3%
✅ tracer-minimal | Time: ✅ 17.872ms (SLO: <18.500ms -3.4%) vs baseline: -0.5% | Memory: ✅ 71.291MB (SLO: <75.000MB -4.9%) vs baseline: +5.2%
✅ tracer-native | Time: ✅ 20.864ms (SLO: <21.750ms -4.1%) vs baseline: -0.7% | Memory: ✅ 71.353MB (SLO: <72.500MB 🟡 -1.6%) vs baseline: +5.3%
✅ tracer-no-caches | Time: ✅ 18.844ms (SLO: <19.650ms -4.1%) vs baseline: ~same | Memory: ✅ 71.275MB (SLO: <75.000MB -5.0%) vs baseline: +5.1%
✅ tracer-no-databases | Time: ✅ 20.596ms (SLO: <21.100ms -2.4%) vs baseline: -0.9% | Memory: ✅ 71.208MB (SLO: <75.000MB -5.1%) vs baseline: +4.9%
✅ tracer-no-middleware | Time: ✅ 20.780ms (SLO: <21.500ms -3.4%) vs baseline: +0.1% | Memory: ✅ 71.307MB (SLO: <75.000MB -4.9%) vs baseline: +5.1%
✅ tracer-no-templates | Time: ✅ 20.845ms (SLO: <22.000ms -5.2%) vs baseline: +0.7% | Memory: ✅ 71.327MB (SLO: <73.500MB -3.0%) vs baseline: +5.0%

🟡 otelsdkspan - 24/24
✅ add-event | Time: ✅ 40.767ms (SLO: <42.000ms -2.9%) vs baseline: ~same | Memory: ✅ 39.086MB (SLO: <40.750MB -4.1%) vs baseline: +6.2%
✅ add-link | Time: ✅ 36.470ms (SLO: <38.550ms -5.4%) vs baseline: +0.4% | Memory: ✅ 38.987MB (SLO: <40.750MB -4.3%) vs baseline: +5.8%
✅ add-metrics | Time: ✅ 219.912ms (SLO: <232.000ms -5.2%) vs baseline: +0.2% | Memory: ✅ 38.987MB (SLO: <40.750MB -4.3%) vs baseline: +5.9%
✅ add-tags | Time: ✅ 214.035ms (SLO: <221.600ms -3.4%) vs baseline: -0.1% | Memory: ✅ 39.007MB (SLO: <40.750MB -4.3%) vs baseline: +6.1%
✅ get-context | Time: ✅ 29.293ms (SLO: <31.300ms -6.4%) vs baseline: +0.4% | Memory: ✅ 39.125MB (SLO: <40.750MB -4.0%) vs baseline: +6.4%
✅ is-recording | Time: ✅ 29.284ms (SLO: <31.000ms -5.5%) vs baseline: +0.2% | Memory: ✅ 39.066MB (SLO: <40.750MB -4.1%) vs baseline: +5.9%
✅ record-exception | Time: ✅ 63.080ms (SLO: <65.850ms -4.2%) vs baseline: -0.4% | Memory: ✅ 39.086MB (SLO: <40.750MB -4.1%) vs baseline: +6.1%
✅ set-status | Time: ✅ 31.845ms (SLO: <34.150ms -6.7%) vs baseline: +0.5% | Memory: ✅ 38.987MB (SLO: <40.750MB -4.3%) vs baseline: +6.0%
✅ start | Time: ✅ 29.581ms (SLO: <30.150ms 🟡 -1.9%) vs baseline: +2.5% | Memory: ✅ 39.046MB (SLO: <40.750MB -4.2%) vs baseline: +6.2%
✅ start-finish | Time: ✅ 33.880ms (SLO: <35.350ms -4.2%) vs baseline: +0.4% | Memory: ✅ 38.987MB (SLO: <40.750MB -4.3%) vs baseline: +6.2%
✅ start-finish-telemetry | Time: ✅ 34.036ms (SLO: <35.450ms -4.0%) vs baseline: +1.1% | Memory: ✅ 39.145MB (SLO: <40.750MB -3.9%) vs baseline: +6.3%
✅ update-name | Time: ✅ 31.165ms (SLO: <33.400ms -6.7%) vs baseline: ~same | Memory: ✅ 39.046MB (SLO: <40.750MB -4.2%) vs baseline: +6.0%

🟡 otelspan - 22/22
✅ add-event | Time: ✅ 40.806ms (SLO: <47.150ms 📉 -13.5%) vs baseline: ~same | Memory: ✅ 41.201MB (SLO: <47.000MB 📉 -12.3%) vs baseline: +5.3%
✅ add-metrics | Time: ✅ 236.258ms (SLO: <344.800ms 📉 -31.5%) vs baseline: ~same | Memory: ✅ 45.625MB (SLO: <47.500MB -3.9%) vs baseline: +4.9%
✅ add-tags | Time: ✅ 277.912ms (SLO: <330.000ms 📉 -15.8%) vs baseline: +1.8% | Memory: ✅ 45.589MB (SLO: <47.500MB -4.0%) vs baseline: +5.1%
✅ get-context | Time: ✅ 83.728ms (SLO: <92.350ms -9.3%) vs baseline: +0.2% | Memory: ✅ 41.445MB (SLO: <46.500MB 📉 -10.9%) vs baseline: +5.1%
✅ is-recording | Time: ✅ 39.121ms (SLO: <44.500ms 📉 -12.1%) vs baseline: -0.5% | Memory: ✅ 41.104MB (SLO: <47.500MB 📉 -13.5%) vs baseline: +5.1%
✅ record-exception | Time: ✅ 61.040ms (SLO: <67.650ms -9.8%) vs baseline: -0.2% | Memory: ✅ 41.884MB (SLO: <47.000MB 📉 -10.9%) vs baseline: +5.7%
✅ set-status | Time: ✅ 45.055ms (SLO: <50.400ms 📉 -10.6%) vs baseline: -0.1% | Memory: ✅ 41.100MB (SLO: <47.000MB 📉 -12.6%) vs baseline: +5.2%
✅ start | Time: ✅ 39.954ms (SLO: <44.500ms 📉 -10.2%) vs baseline: +2.9% | Memory: ✅ 41.069MB (SLO: <47.000MB 📉 -12.6%) vs baseline: +5.4%
✅ start-finish | Time: ✅ 90.341ms (SLO: <91.000ms 🟡 -0.7%) vs baseline: +0.5% | Memory: ✅ 38.869MB (SLO: <46.500MB 📉 -16.4%) vs baseline: +5.4%
✅ start-finish-telemetry | Time: ✅ 91.688ms (SLO: <92.000ms 🟡 -0.3%) vs baseline: +0.2% | Memory: ✅ 38.633MB (SLO: <46.500MB 📉 -16.9%) vs baseline: +4.9%
✅ update-name | Time: ✅ 40.190ms (SLO: <45.150ms 📉 -11.0%) vs baseline: -0.2% | Memory: ✅ 41.208MB (SLO: <47.000MB 📉 -12.3%) vs baseline: +5.3%

🟡 recursivecomputation - 8/8
✅ deep | Time: ✅ 312.077ms (SLO: <320.950ms -2.8%) vs baseline: ~same | Memory: ✅ 37.415MB (SLO: <38.750MB -3.4%) vs baseline: +5.4%
✅ deep-profiled | Time: ✅ 328.296ms (SLO: <359.150ms -8.6%) vs baseline: -0.4% | Memory: ✅ 43.726MB (SLO: <46.000MB -4.9%) vs baseline: +5.4%
✅ medium | Time: ✅ 7.390ms (SLO: <7.450ms 🟡 -0.8%) vs baseline: -0.6% | Memory: ✅ 36.255MB (SLO: <38.000MB -4.6%) vs baseline: +5.4%
✅ shallow | Time: ✅ 1.050ms (SLO: <1.050ms 🟡 ~same) vs baseline: +2.0% | Memory: ✅ 36.215MB (SLO: <38.000MB -4.7%) vs baseline: +5.1%

🟡 span - 26/26
✅ add-event | Time: ✅ 19.522ms (SLO: <22.500ms 📉 -13.2%) vs baseline: -1.5% | Memory: ✅ 38.412MB (SLO: <53.000MB 📉 -27.5%) vs baseline: +5.4%
✅ add-metrics | Time: ✅ 89.521ms (SLO: <93.500ms -4.3%) vs baseline: +0.3% | Memory: ✅ 42.920MB (SLO: <53.000MB 📉 -19.0%) vs baseline: +5.2%
✅ add-tags | Time: ✅ 148.922ms (SLO: <155.000ms -3.9%) vs baseline: +0.9% | Memory: ✅ 42.948MB (SLO: <53.000MB 📉 -19.0%) vs baseline: +5.5%
✅ get-context | Time: ✅ 18.713ms (SLO: <20.500ms -8.7%) vs baseline: -1.5% | Memory: ✅ 38.228MB (SLO: <53.000MB 📉 -27.9%) vs baseline: +5.1%
✅ is-recording | Time: ✅ 18.775ms (SLO: <20.500ms -8.4%) vs baseline: -1.3% | Memory: ✅ 38.282MB (SLO: <53.000MB 📉 -27.8%) vs baseline: +5.0%
✅ record-exception | Time: ✅ 38.387ms (SLO: <41.000ms -6.4%) vs baseline: -0.8% | Memory: ✅ 38.912MB (SLO: <53.000MB 📉 -26.6%) vs baseline: +5.5%
✅ set-status | Time: ✅ 20.583ms (SLO: <22.000ms -6.4%) vs baseline: -1.3% | Memory: ✅ 38.242MB (SLO: <53.000MB 📉 -27.8%) vs baseline: +5.0%
✅ start | Time: ✅ 19.702ms (SLO: <20.500ms -3.9%) vs baseline: +4.5% | Memory: ✅ 38.129MB (SLO: <53.000MB 📉 -28.1%) vs baseline: +4.7%
✅ start-finish | Time: ✅ 57.818ms (SLO: <58.500ms 🟡 -1.2%) vs baseline: -1.0% | Memory: ✅ 36.235MB (SLO: <38.000MB -4.6%) vs baseline: +5.3%
✅ start-finish-telemetry | Time: ✅ 58.979ms (SLO: <60.000ms 🟡 -1.7%) vs baseline: -1.0% | Memory: ✅ 36.215MB (SLO: <38.000MB -4.7%) vs baseline: +5.4%
✅ start-finish-traceid128 | Time: ✅ 60.234ms (SLO: <62.000ms -2.8%) vs baseline: -1.0% | Memory: ✅ 36.196MB (SLO: <38.000MB -4.7%) vs baseline: +5.1%
✅ start-traceid128 | Time: ✅ 18.628ms (SLO: <22.500ms 📉 -17.2%) vs baseline: -1.6% | Memory: ✅ 38.287MB (SLO: <53.000MB 📉 -27.8%) vs baseline: +5.3%
✅ update-name | Time: ✅ 19.328ms (SLO: <22.000ms 📉 -12.1%) vs baseline: -1.3% | Memory: ✅ 38.418MB (SLO: <53.000MB 📉 -27.5%) vs baseline: +5.7%

🟡 tracer - 6/6
✅ large | Time: ✅ 33.107ms (SLO: <32.950ms +0.5%) vs baseline: ~same | Memory: ✅ 37.749MB (SLO: <39.250MB -3.8%) vs baseline: +6.0%
✅ medium | Time: ✅ 3.338ms (SLO: <3.500ms -4.6%) vs baseline: -0.2% | Memory: ✅ 36.215MB (SLO: <38.750MB -6.5%) vs baseline: +5.2%
✅ small | Time: ✅ 386.022µs (SLO: <390.000µs 🟡 -1.0%) vs baseline: +3.4% | Memory: ✅ 36.215MB (SLO: <38.750MB -6.5%) vs baseline: +5.1%
…ove debug artifacts
Restructure eval_scope validation to default to 'span', pass eval_scope to telemetry, and remove leftover debug headers and print statement.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
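As a rough illustration of what "eval_scope validation that defaults to 'span'" could look like, here is a minimal sketch. It is not the code from this PR: the function name `normalize_eval_scope`, the constants, and the error message are hypothetical, and the set of allowed scopes is only inferred from the PR title and the testing notes (span-, trace-, and session-level evals).

```python
from typing import Optional

# Hypothetical constants: the allowed scopes are inferred from the PR title
# and test notes, not copied from the ddtrace source.
VALID_EVAL_SCOPES = ("span", "trace", "session")
DEFAULT_EVAL_SCOPE = "span"


def normalize_eval_scope(eval_scope: Optional[str] = None) -> str:
    """Return a validated scope, falling back to 'span' when none is given."""
    if eval_scope is None:
        # No scope supplied: keep the pre-existing span-level behavior.
        return DEFAULT_EVAL_SCOPE
    if eval_scope not in VALID_EVAL_SCOPES:
        raise ValueError(
            f"eval_scope must be one of {VALID_EVAL_SCOPES}, got {eval_scope!r}"
        )
    return eval_scope


if __name__ == "__main__":
    print(normalize_eval_scope())           # default scope
    print(normalize_eval_scope("session"))  # explicit scope passes through
```

Resolving the default in one place means the same normalized value can be handed both to the submitted event and to telemetry, which appears to be the intent of the commit message above.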
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Description
Testing
Ran test script: https://github.com/DataDog/experimental/blob/main/users/christopher.fox/scripts/emit_eval_metrics.py
Observed the 5 eval_metric events in EVP (3 span-level, 1 trace-level, 1 session-level):
https://dd.datad0g.com/internal/events-ui/queries?index_name=llmobs&query_string=%40event_type%3Aeval-metric%20%40ml_app%3Aemit_eval_metrics&query_type=list&timerange=1775670893207-1776275693207l&track=llmobs
Risks
Additional Notes