feat(discovery): Rust-to-Go logging bridge for libdd_discovery#49395
feat(discovery): Rust-to-Go logging bridge for libdd_discovery#49395martavicentenavarro wants to merge 2 commits intomainfrom
Conversation
d6d840a to
62a9617
Compare
Files inventory check summaryFile checks results against ancestor 56888f8d: Results for datadog-agent_7.79.0~devel.git.755.79bc052.pipeline.108251497-1_amd64.deb:No change detected |
Regression DetectorRegression Detector ResultsMetrics dashboard Baseline: 56888f8 Optimization Goals: ✅ No significant changes detected
|
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | docker_containers_cpu | % cpu utilization | -1.78 | [-4.77, +1.20] | 1 | Logs |
Fine details of change detection per experiment
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | tcp_syslog_to_blackhole | ingress throughput | +0.56 | [+0.39, +0.73] | 1 | Logs |
| ➖ | quality_gate_metrics_logs | memory utilization | +0.46 | [+0.22, +0.70] | 1 | Logs bounds checks dashboard |
| ➖ | ddot_metrics_sum_cumulative | memory utilization | +0.35 | [+0.19, +0.50] | 1 | Logs |
| ➖ | ddot_metrics_sum_cumulativetodelta_exporter | memory utilization | +0.28 | [+0.05, +0.50] | 1 | Logs |
| ➖ | ddot_logs | memory utilization | +0.25 | [+0.18, +0.31] | 1 | Logs |
| ➖ | quality_gate_logs | % cpu utilization | +0.19 | [-1.45, +1.83] | 1 | Logs bounds checks dashboard |
| ➖ | quality_gate_idle_all_features | memory utilization | +0.15 | [+0.11, +0.18] | 1 | Logs bounds checks dashboard |
| ➖ | otlp_ingest_logs | memory utilization | +0.08 | [-0.02, +0.17] | 1 | Logs |
| ➖ | otlp_ingest_metrics | memory utilization | +0.05 | [-0.11, +0.20] | 1 | Logs |
| ➖ | file_to_blackhole_0ms_latency | egress throughput | +0.05 | [-0.49, +0.58] | 1 | Logs |
| ➖ | ddot_metrics_sum_delta | memory utilization | +0.04 | [-0.14, +0.22] | 1 | Logs |
| ➖ | uds_dogstatsd_to_api_v3 | ingress throughput | +0.01 | [-0.19, +0.21] | 1 | Logs |
| ➖ | tcp_dd_logs_filter_exclude | ingress throughput | -0.00 | [-0.11, +0.11] | 1 | Logs |
| ➖ | uds_dogstatsd_to_api | ingress throughput | -0.01 | [-0.22, +0.20] | 1 | Logs |
| ➖ | quality_gate_idle | memory utilization | -0.02 | [-0.07, +0.03] | 1 | Logs bounds checks dashboard |
| ➖ | file_to_blackhole_100ms_latency | egress throughput | -0.02 | [-0.12, +0.08] | 1 | Logs |
| ➖ | file_to_blackhole_500ms_latency | egress throughput | -0.05 | [-0.46, +0.35] | 1 | Logs |
| ➖ | file_to_blackhole_1000ms_latency | egress throughput | -0.07 | [-0.51, +0.37] | 1 | Logs |
| ➖ | uds_dogstatsd_20mb_12k_contexts_20_senders | memory utilization | -0.15 | [-0.21, -0.09] | 1 | Logs |
| ➖ | docker_containers_memory | memory utilization | -0.17 | [-0.26, -0.09] | 1 | Logs |
| ➖ | ddot_metrics | memory utilization | -0.22 | [-0.40, -0.05] | 1 | Logs |
| ➖ | file_tree | memory utilization | -0.30 | [-0.35, -0.24] | 1 | Logs |
| ➖ | docker_containers_cpu | % cpu utilization | -1.78 | [-4.77, +1.20] | 1 | Logs |
Bounds Checks: ❌ Failed
| perf | experiment | bounds_check_name | replicates_passed | observed_value | links |
|---|---|---|---|---|---|
| ✅ | docker_containers_cpu | simple_check_run | 10/10 | 583 ≥ 26 | |
| ✅ | docker_containers_memory | memory_usage | 10/10 | 275.85MiB ≤ 370MiB | |
| ✅ | docker_containers_memory | simple_check_run | 10/10 | 677 ≥ 26 | |
| ✅ | file_to_blackhole_0ms_latency | memory_usage | 10/10 | 0.19GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_0ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | file_to_blackhole_1000ms_latency | memory_usage | 10/10 | 0.24GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_1000ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | file_to_blackhole_100ms_latency | memory_usage | 10/10 | 0.20GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_100ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | file_to_blackhole_500ms_latency | memory_usage | 10/10 | 0.22GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_500ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ❌ | quality_gate_idle | intake_connections | 0/10 | 4 > 3 | bounds checks dashboard |
| ✅ | quality_gate_idle | memory_usage | 10/10 | 173.77MiB ≤ 181MiB | bounds checks dashboard |
| ❌ | quality_gate_idle_all_features | intake_connections | 2/10 | 4 > 3 | bounds checks dashboard |
| ✅ | quality_gate_idle_all_features | memory_usage | 10/10 | 500.12MiB ≤ 550MiB | bounds checks dashboard |
| ✅ | quality_gate_logs | intake_connections | 10/10 | 4 ≤ 6 | bounds checks dashboard |
| ✅ | quality_gate_logs | memory_usage | 10/10 | 208.61MiB ≤ 220MiB | bounds checks dashboard |
| ✅ | quality_gate_logs | missed_bytes | 10/10 | 0B = 0B | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | cpu_usage | 10/10 | 351.70 ≤ 2000 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | intake_connections | 10/10 | 4 ≤ 6 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | memory_usage | 10/10 | 418.25MiB ≤ 475MiB | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | missed_bytes | 10/10 | 0B = 0B | bounds checks dashboard |
Explanation
Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%
Performance changes are noted in the perf column of each table:
- ✅ = significantly better comparison variant performance
- ❌ = significantly worse comparison variant performance
- ➖ = no significant change in performance
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
-
Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
-
Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
-
Its configuration does not mark it "erratic".
CI Pass/Fail Decision
❌ Failed. Some Quality Gates were violated.
- quality_gate_idle_all_features, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_idle_all_features, bounds check intake_connections: 2/10 replicas passed. Failed 8 which is > 0. Gate FAILED.
- quality_gate_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_idle, bounds check intake_connections: 0/10 replicas passed. Failed 10 which is > 0. Gate FAILED.
- quality_gate_idle, bounds check memory_usage: 10/10 replicas passed. Gate passed.
c1ec169 to
70710a7
Compare
70710a7 to
4da3327
Compare
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4da33278d6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| ); | ||
| return; | ||
| } | ||
| log::set_max_level(log::LevelFilter::Trace); |
There was a problem hiding this comment.
Honor configured log level before forwarding Rust records
Setting log::set_max_level(log::LevelFilter::Trace) here causes the Rust log facade to build and dispatch every debug!/trace! record, and GoLogger::enabled returns true, so filtering only happens later in Go after string allocation and callback crossing. In discovery hot paths (for example trace! in procfs/fd.rs:get_log_files), this introduces avoidable per-record overhead even when the agent is running at info level, which is a production CPU regression. The Rust max level should be derived from the effective agent log level (and updated when that level changes) rather than hardcoded to Trace.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed. dd_discovery_init_logger now accepts the initial Go log level as a uint32_t parameter and passes it to log::set_max_level, so Rust discards records below the configured threshold before formatting or crossing the FFI.
Runtime level changes are not yet propagated (would require a separate dd_discovery_set_log_level FFI call hooked into ChangeLogLevel) , tracking that as a follow-up.
There was a problem hiding this comment.
Pull request overview
Adds a Rust→Go logging bridge for libdd_discovery so Rust log records emitted inside the discovery static library are forwarded into the agent’s standard Go logger (pkg/util/log) with consistent formatting/level handling.
Changes:
- Implement a Rust
log::Logbackend (GoLogger) that forwards records to a Go callback registered via an idempotentdd_discovery_init_logger. - Extend the generated C header (
dd_discovery.h) and cbindgen config to expose the callback typedef and init function. - Add Linux-only Go/cgo glue to export
goDiscoveryLogCallbackand register it during package initialization.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/discovery/module/rust/src/ffi.rs | Adds Rust-side logger implementation, callback storage, and dd_discovery_init_logger FFI export. |
| pkg/discovery/module/rust/include/dd_discovery.h | Exposes dd_log_fn typedef and dd_discovery_init_logger declaration for consumers. |
| pkg/discovery/module/rust/cbindgen.toml | Ensures the new callback typedef is included in generated headers. |
| pkg/discovery/module/impl_services_rust_logger_linux.go | Exports the Go callback and maps Rust log levels into pkg/util/log. |
| pkg/discovery/module/impl_services_rust_init_linux.go | Registers the callback at init-time and links against the Rust static library. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| pub unsafe extern "C" fn dd_discovery_init_logger(callback: dd_log_fn) { | ||
| // SAFETY: Wrapping in catch_unwind prevents a Rust panic from unwinding | ||
| // across the C ABI boundary, which would be undefined behaviour. | ||
| let _ = panic::catch_unwind(AssertUnwindSafe(|| { | ||
| if LOGGER_CALLBACK.set(callback).is_err() { | ||
| return; | ||
| } | ||
| // `set_logger` returns Err if another logger was already registered | ||
| // (e.g. in a test harness). In that case LOGGER_CALLBACK is set but | ||
| // GoLogger is not installed, so all Rust log records would be silently | ||
| // dropped. Surface the conflict so it is not invisible to the caller. | ||
| if log::set_logger(&GO_LOGGER).is_err() { | ||
| eprintln!( | ||
| "[dd_discovery] dd_discovery_init_logger: a Rust logger was already \ | ||
| registered; Rust log records will not be forwarded to the Go logger" | ||
| ); | ||
| return; | ||
| } |
There was a problem hiding this comment.
dd_discovery_init_logger returns void and, when log::set_logger fails (logger already registered), it only prints a message to stderr. In the system-probe context this can be invisible to operators, leaving Rust logs silently dropped with no structured signal to the caller. Consider changing the C ABI to return a status code (success / already-initialized / logger-conflict) so Go can log/report it via pkg/util/log, and/or use the provided callback to emit this error into the Go logger instead of eprintln!.
There was a problem hiding this comment.
Fixed. Replaced the eprintln! with a direct call to the provided callback at error level (1), so the conflict is reported through the Go structured logger instead of stderr.
| void dd_discovery_init_logger(dd_log_fn callback); | ||
|
|
There was a problem hiding this comment.
The exported API dd_discovery_init_logger currently returns void, so callers cannot programmatically detect whether logger installation succeeded (e.g., if a Rust logger was already registered). Since the Rust implementation can fall back to stderr today, consider updating this declaration to return an explicit status code so Go callers can surface failures in agent logs and handle them deterministically.
There was a problem hiding this comment.
The stderr fallback is gone due to the fix in the previous comment.
| func goDiscoveryLogCallback(level C.uint32_t, msg *C.char, msgLen C.size_t) { | ||
| // Reject nil pointers, empty messages (nothing useful to log), and messages | ||
| // larger than MaxInt32 to prevent overflow when casting msgLen to C.int below. | ||
| if msg == nil || msgLen == 0 || msgLen > math.MaxInt32 { | ||
| return | ||
| } | ||
| message := C.GoStringN(msg, C.int(msgLen)) | ||
| // Level mapping mirrors dd_log_fn in dd_discovery.h: 1=Error, 2=Warn, 3=Info, 4=Debug, 5+=Trace. |
There was a problem hiding this comment.
goDiscoveryLogCallback always converts the C buffer to a Go string (C.GoStringN) before checking whether the target Go log level is enabled. This incurs an allocation/copy even when the agent is configured to drop that level (e.g., Trace/Debug). Consider mapping the incoming level to pkg/util/log's LogLevel and using log.ShouldLog(...) to early-return before calling C.GoStringN.
There was a problem hiding this comment.
Fixed. Added a log.ShouldLog() check before C.GoStringN so the string allocation is skipped entirely for records below the configured level.
| impl log::Log for GoLogger { | ||
| fn enabled(&self, _metadata: &log::Metadata) -> bool { | ||
| true // All records pass through; Go-side filtering applies. | ||
| } | ||
|
|
||
| fn log(&self, record: &log::Record) { | ||
| let Some(&callback) = LOGGER_CALLBACK.get() else { | ||
| return; | ||
| }; | ||
| let level: u32 = match record.level() { | ||
| log::Level::Error => 1, | ||
| log::Level::Warn => 2, | ||
| log::Level::Info => 3, | ||
| log::Level::Debug => 4, | ||
| log::Level::Trace => 5, | ||
| }; | ||
| let message = record.args().to_string(); | ||
| // SAFETY: `callback` was registered by the Go runtime and is valid for | ||
| // the process lifetime. `message` is live for the duration of this call. | ||
| unsafe { callback(level, message.as_ptr() as *const c_char, message.len()) }; | ||
| } | ||
|
|
||
| fn flush(&self) {} | ||
| } | ||
|
|
||
| static GO_LOGGER: GoLogger = GoLogger; | ||
|
|
||
| /// Register a Go logging callback. Idempotent: only the first call has effect; | ||
| /// subsequent calls (e.g. from racing goroutines at start-up) are ignored. | ||
| /// | ||
| /// All log levels are forwarded to the callback; the Go side applies its own | ||
| /// level filter so no records are emitted above the configured level. | ||
| /// | ||
| /// # Panic safety | ||
| /// Wrapped in `catch_unwind` following the convention in this file: unwinding | ||
| /// across a C ABI boundary is undefined behaviour per the Rust nomicon. | ||
| /// | ||
| /// # Safety | ||
| /// `callback` must be a valid function pointer that remains valid for the | ||
| /// lifetime of the process. | ||
| #[unsafe(no_mangle)] | ||
| pub unsafe extern "C" fn dd_discovery_init_logger(callback: dd_log_fn) { | ||
| // SAFETY: Wrapping in catch_unwind prevents a Rust panic from unwinding | ||
| // across the C ABI boundary, which would be undefined behaviour. | ||
| let _ = panic::catch_unwind(AssertUnwindSafe(|| { | ||
| if LOGGER_CALLBACK.set(callback).is_err() { | ||
| return; | ||
| } | ||
| // `set_logger` returns Err if another logger was already registered | ||
| // (e.g. in a test harness). In that case LOGGER_CALLBACK is set but | ||
| // GoLogger is not installed, so all Rust log records would be silently | ||
| // dropped. Surface the conflict so it is not invisible to the caller. | ||
| if log::set_logger(&GO_LOGGER).is_err() { | ||
| eprintln!( | ||
| "[dd_discovery] dd_discovery_init_logger: a Rust logger was already \ | ||
| registered; Rust log records will not be forwarded to the Go logger" | ||
| ); | ||
| return; | ||
| } | ||
| log::set_max_level(log::LevelFilter::Trace); | ||
| })); |
There was a problem hiding this comment.
dd_discovery_init_logger sets the Rust global max level to Trace and GoLogger.enabled() always returns true. This forces every Rust log call (including Trace) to cross the FFI boundary and allocate/format even when Go is configured to drop that level, which can be a significant overhead in hot paths. Consider accepting a max level from Go (or another mechanism to reflect Go’s current level) and setting log::set_max_level accordingly, and/or implementing enabled() to filter by that level before formatting/forwarding.
There was a problem hiding this comment.
Fixed based on a previous comment.
- Pass the current Go log level to dd_discovery_init_logger so Rust sets log::set_max_level accordingly, discarding records below the configured threshold before formatting or crossing the FFI boundary. - Add log.ShouldLog() guard in goDiscoveryLogCallback before C.GoStringN to skip the string allocation for records that will be dropped. - Replace eprintln! with a direct callback invocation at error level when log::set_logger fails, so the conflict reaches the Go structured logger instead of being lost to stderr. Runtime propagation of log level changes is left as a follow-up. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
What does this PR do?
Forwards Rust log records from
libdd_discoveryto Go'spkg/util/logso they appear in the same log files as the rest of the agent, with the same format and level filtering.Two new Go files are added under
pkg/discovery/module/:impl_services_rust_logger_linux.go— exportsgoDiscoveryLogCallbackto C. Called by the Rust library for each log record; routes by level tolog.Errorf/Warnf/Infof/Debugf/Tracefwith a[dd_discovery]prefix.impl_services_rust_init_linux.go— registers the callback at packageinit()time viadd_discovery_init_logger.On the Rust side (
pkg/discovery/module/rust/src/ffi.rs), aGoLoggerstruct implementslog::Logand forwards every record to the registered callback stored in aOnceLock.log::set_max_level(LevelFilter::Trace)is set so all records reach the Go callback; Go's own level filter then decides what is actually emitted. The exported C functiondd_discovery_init_loggeris idempotent: only the first call has effect, so concurrentinit()calls at start-up are safe.dd_discovery.hgains thedd_log_fntypedef and thedd_discovery_init_loggerdeclaration.dd_log_fnis explicitly added to cbindgen's[export] includelist (type aliases require explicit listing);dd_discovery_init_loggeris auto-exported by cbindgen as apub extern "C"function.Motivation
Without this bridge, Rust log output from
libdd_discoveryis silently discarded when the library is linked into system-probe via CGo. Operators have no visibility into what the library is doing, which makes debugging in production impossible.This follows the same pattern already used in the repo for the Python runtime (
pkg/collector/python), whereLogMessageis exported to C and registered viaset_log_cbat initialisation time.Describe how you validated your changes
Manually verified that log records emitted by the Rust library appear in the agent log output at the expected levels.
The existing unit tests in
ffi.rscover the pre-existing FFI layer (services_response_to_result,dd_discovery_free,pids_from_c). The new logger bridge (GoLogger,dd_discovery_init_logger) does not have dedicated unit tests: testingdd_discovery_init_loggerin a Rust unit test would permanently mutate the process-globallog::loggerstatic, which would interfere with other tests in the same binary.Additional Notes
The two-file split (
impl_services_rust_logger_linux.go+impl_services_rust_init_linux.go) is required by a CGo constraint: a file containing//exportcannot have C function definitions in its preamble, only declarations. The static helpergetGoLogCallback()(which returns the function pointer) must therefore live in a separate file.