feat(discovery): Rust-to-Go logging bridge for libdd_discovery by martavicentenavarro · Pull Request #49395 · DataDog/datadog-agent

martavicentenavarro · 2026-04-15T15:30:10Z

What does this PR do?

Forwards Rust log records from libdd_discovery to Go's pkg/util/log so they appear in the same log files as the rest of the agent, with the same format and level filtering.

Two new Go files are added under pkg/discovery/module/:

impl_services_rust_logger_linux.go — exports goDiscoveryLogCallback to C. Called by the Rust library for each log record; routes by level to log.Errorf/Warnf/Infof/Debugf/Tracef with a [dd_discovery] prefix.
impl_services_rust_init_linux.go — registers the callback at package init() time via dd_discovery_init_logger.

On the Rust side (pkg/discovery/module/rust/src/ffi.rs), a GoLogger struct implements log::Log and forwards every record to the registered callback stored in a OnceLock. log::set_max_level(LevelFilter::Trace) is set so all records reach the Go callback; Go's own level filter then decides what is actually emitted. The exported C function dd_discovery_init_logger is idempotent: only the first call has effect, so concurrent init() calls at start-up are safe.

dd_discovery.h gains the dd_log_fn typedef and the dd_discovery_init_logger declaration. dd_log_fn is explicitly added to cbindgen's [export] include list (type aliases require explicit listing); dd_discovery_init_logger is auto-exported by cbindgen as a pub extern "C" function.

Motivation

Without this bridge, Rust log output from libdd_discovery is silently discarded when the library is linked into system-probe via CGo. Operators have no visibility into what the library is doing, which makes debugging in production impossible.

This follows the same pattern already used in the repo for the Python runtime (pkg/collector/python), where LogMessage is exported to C and registered via set_log_cb at initialisation time.

Describe how you validated your changes

Manually verified that log records emitted by the Rust library appear in the agent log output at the expected levels.

The existing unit tests in ffi.rs cover the pre-existing FFI layer (services_response_to_result, dd_discovery_free, pids_from_c). The new logger bridge (GoLogger, dd_discovery_init_logger) does not have dedicated unit tests: testing dd_discovery_init_logger in a Rust unit test would permanently mutate the process-global log::logger static, which would interfere with other tests in the same binary.

Additional Notes

The two-file split (impl_services_rust_logger_linux.go + impl_services_rust_init_linux.go) is required by a CGo constraint: a file containing //export cannot have C function definitions in its preamble, only declarations. The static helper getGoLogCallback() (which returns the function pointer) must therefore live in a separate file.

dd-octo-sts · 2026-04-15T16:02:56Z

Files inventory check summary

File checks results against ancestor 56888f8d:

Results for datadog-agent_7.79.0~devel.git.755.79bc052.pipeline.108251497-1_amd64.deb:

No change detected

cit-pr-commenter-54b7da · 2026-04-15T16:23:18Z

Regression Detector

Regression Detector Results

Metrics dashboard
Target profiles
Run ID: 9169dbe4-46d4-4d41-a55d-727749392f0e

Baseline: 56888f8
Comparison: 79bc052
Diff

Optimization Goals: ✅ No significant changes detected

Experiments ignored for regressions

Regressions in experiments with settings containing erratic: true are ignored.

perf	experiment	goal	Δ mean %	Δ mean % CI	trials	links
➖	docker_containers_cpu	% cpu utilization	-1.78	[-4.77, +1.20]	1	Logs

Fine details of change detection per experiment

perf	experiment	goal	Δ mean %	Δ mean % CI	trials	links
➖	tcp_syslog_to_blackhole	ingress throughput	+0.56	[+0.39, +0.73]	1	Logs
➖	quality_gate_metrics_logs	memory utilization	+0.46	[+0.22, +0.70]	1	Logs bounds checks dashboard
➖	ddot_metrics_sum_cumulative	memory utilization	+0.35	[+0.19, +0.50]	1	Logs
➖	ddot_metrics_sum_cumulativetodelta_exporter	memory utilization	+0.28	[+0.05, +0.50]	1	Logs
➖	ddot_logs	memory utilization	+0.25	[+0.18, +0.31]	1	Logs
➖	quality_gate_logs	% cpu utilization	+0.19	[-1.45, +1.83]	1	Logs bounds checks dashboard
➖	quality_gate_idle_all_features	memory utilization	+0.15	[+0.11, +0.18]	1	Logs bounds checks dashboard
➖	otlp_ingest_logs	memory utilization	+0.08	[-0.02, +0.17]	1	Logs
➖	otlp_ingest_metrics	memory utilization	+0.05	[-0.11, +0.20]	1	Logs
➖	file_to_blackhole_0ms_latency	egress throughput	+0.05	[-0.49, +0.58]	1	Logs
➖	ddot_metrics_sum_delta	memory utilization	+0.04	[-0.14, +0.22]	1	Logs
➖	uds_dogstatsd_to_api_v3	ingress throughput	+0.01	[-0.19, +0.21]	1	Logs
➖	tcp_dd_logs_filter_exclude	ingress throughput	-0.00	[-0.11, +0.11]	1	Logs
➖	uds_dogstatsd_to_api	ingress throughput	-0.01	[-0.22, +0.20]	1	Logs
➖	quality_gate_idle	memory utilization	-0.02	[-0.07, +0.03]	1	Logs bounds checks dashboard
➖	file_to_blackhole_100ms_latency	egress throughput	-0.02	[-0.12, +0.08]	1	Logs
➖	file_to_blackhole_500ms_latency	egress throughput	-0.05	[-0.46, +0.35]	1	Logs
➖	file_to_blackhole_1000ms_latency	egress throughput	-0.07	[-0.51, +0.37]	1	Logs
➖	uds_dogstatsd_20mb_12k_contexts_20_senders	memory utilization	-0.15	[-0.21, -0.09]	1	Logs
➖	docker_containers_memory	memory utilization	-0.17	[-0.26, -0.09]	1	Logs
➖	ddot_metrics	memory utilization	-0.22	[-0.40, -0.05]	1	Logs
➖	file_tree	memory utilization	-0.30	[-0.35, -0.24]	1	Logs
➖	docker_containers_cpu	% cpu utilization	-1.78	[-4.77, +1.20]	1	Logs

Bounds Checks: ❌ Failed

perf	experiment	bounds_check_name	replicates_passed	observed_value	links
✅	docker_containers_cpu	simple_check_run	10/10	583 ≥ 26
✅	docker_containers_memory	memory_usage	10/10	275.85MiB ≤ 370MiB
✅	docker_containers_memory	simple_check_run	10/10	677 ≥ 26
✅	file_to_blackhole_0ms_latency	memory_usage	10/10	0.19GiB ≤ 1.20GiB
✅	file_to_blackhole_0ms_latency	missed_bytes	10/10	0B = 0B
✅	file_to_blackhole_1000ms_latency	memory_usage	10/10	0.24GiB ≤ 1.20GiB
✅	file_to_blackhole_1000ms_latency	missed_bytes	10/10	0B = 0B
✅	file_to_blackhole_100ms_latency	memory_usage	10/10	0.20GiB ≤ 1.20GiB
✅	file_to_blackhole_100ms_latency	missed_bytes	10/10	0B = 0B
✅	file_to_blackhole_500ms_latency	memory_usage	10/10	0.22GiB ≤ 1.20GiB
✅	file_to_blackhole_500ms_latency	missed_bytes	10/10	0B = 0B
❌	quality_gate_idle	intake_connections	0/10	4 > 3	bounds checks dashboard
✅	quality_gate_idle	memory_usage	10/10	173.77MiB ≤ 181MiB	bounds checks dashboard
❌	quality_gate_idle_all_features	intake_connections	2/10	4 > 3	bounds checks dashboard
✅	quality_gate_idle_all_features	memory_usage	10/10	500.12MiB ≤ 550MiB	bounds checks dashboard
✅	quality_gate_logs	intake_connections	10/10	4 ≤ 6	bounds checks dashboard
✅	quality_gate_logs	memory_usage	10/10	208.61MiB ≤ 220MiB	bounds checks dashboard
✅	quality_gate_logs	missed_bytes	10/10	0B = 0B	bounds checks dashboard
✅	quality_gate_metrics_logs	cpu_usage	10/10	351.70 ≤ 2000	bounds checks dashboard
✅	quality_gate_metrics_logs	intake_connections	10/10	4 ≤ 6	bounds checks dashboard
✅	quality_gate_metrics_logs	memory_usage	10/10	418.25MiB ≤ 475MiB	bounds checks dashboard
✅	quality_gate_metrics_logs	missed_bytes	10/10	0B = 0B	bounds checks dashboard

Explanation

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

Performance changes are noted in the perf column of each table:

✅ = significantly better comparison variant performance
❌ = significantly worse comparison variant performance
➖ = no significant change in performance

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
Its configuration does not mark it "erratic".

CI Pass/Fail Decision

❌ Failed. Some Quality Gates were violated.

quality_gate_idle_all_features, bounds check memory_usage: 10/10 replicas passed. Gate passed.
quality_gate_idle_all_features, bounds check intake_connections: 2/10 replicas passed. Failed 8 which is > 0. Gate FAILED.
quality_gate_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
quality_gate_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
quality_gate_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
quality_gate_metrics_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
quality_gate_metrics_logs, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
quality_gate_metrics_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
quality_gate_metrics_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
quality_gate_idle, bounds check intake_connections: 0/10 replicas passed. Failed 10 which is > 0. Gate FAILED.
quality_gate_idle, bounds check memory_usage: 10/10 replicas passed. Gate passed.

martavicentenavarro · 2026-04-17T12:41:20Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4da33278d6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-17T12:48:53Z

+            );
+            return;
+        }
+        log::set_max_level(log::LevelFilter::Trace);


Honor configured log level before forwarding Rust records

Setting log::set_max_level(log::LevelFilter::Trace) here causes the Rust log facade to build and dispatch every debug!/trace! record, and GoLogger::enabled returns true, so filtering only happens later in Go after string allocation and callback crossing. In discovery hot paths (for example trace! in procfs/fd.rs:get_log_files), this introduces avoidable per-record overhead even when the agent is running at info level, which is a production CPU regression. The Rust max level should be derived from the effective agent log level (and updated when that level changes) rather than hardcoded to Trace.

Useful? React with 👍 / 👎.

Fixed. dd_discovery_init_logger now accepts the initial Go log level as a uint32_t parameter and passes it to log::set_max_level, so Rust discards records below the configured threshold before formatting or crossing the FFI.

Runtime level changes are not yet propagated (would require a separate dd_discovery_set_log_level FFI call hooked into ChangeLogLevel) , tracking that as a follow-up.

Copilot

Pull request overview

Adds a Rust→Go logging bridge for libdd_discovery so Rust log records emitted inside the discovery static library are forwarded into the agent’s standard Go logger (pkg/util/log) with consistent formatting/level handling.

Changes:

Implement a Rust log::Log backend (GoLogger) that forwards records to a Go callback registered via an idempotent dd_discovery_init_logger.
Extend the generated C header (dd_discovery.h) and cbindgen config to expose the callback typedef and init function.
Add Linux-only Go/cgo glue to export goDiscoveryLogCallback and register it during package initialization.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
pkg/discovery/module/rust/src/ffi.rs	Adds Rust-side logger implementation, callback storage, and `dd_discovery_init_logger` FFI export.
pkg/discovery/module/rust/include/dd_discovery.h	Exposes `dd_log_fn` typedef and `dd_discovery_init_logger` declaration for consumers.
pkg/discovery/module/rust/cbindgen.toml	Ensures the new callback typedef is included in generated headers.
pkg/discovery/module/impl_services_rust_logger_linux.go	Exports the Go callback and maps Rust log levels into `pkg/util/log`.
pkg/discovery/module/impl_services_rust_init_linux.go	Registers the callback at init-time and links against the Rust static library.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-17T12:50:15Z

+pub unsafe extern "C" fn dd_discovery_init_logger(callback: dd_log_fn) {
+    // SAFETY: Wrapping in catch_unwind prevents a Rust panic from unwinding
+    // across the C ABI boundary, which would be undefined behaviour.
+    let _ = panic::catch_unwind(AssertUnwindSafe(|| {
+        if LOGGER_CALLBACK.set(callback).is_err() {
+            return;
+        }
+        // `set_logger` returns Err if another logger was already registered
+        // (e.g. in a test harness). In that case LOGGER_CALLBACK is set but
+        // GoLogger is not installed, so all Rust log records would be silently
+        // dropped. Surface the conflict so it is not invisible to the caller.
+        if log::set_logger(&GO_LOGGER).is_err() {
+            eprintln!(
+                "[dd_discovery] dd_discovery_init_logger: a Rust logger was already \
+                 registered; Rust log records will not be forwarded to the Go logger"
+            );
+            return;
+        }


dd_discovery_init_logger returns void and, when log::set_logger fails (logger already registered), it only prints a message to stderr. In the system-probe context this can be invisible to operators, leaving Rust logs silently dropped with no structured signal to the caller. Consider changing the C ABI to return a status code (success / already-initialized / logger-conflict) so Go can log/report it via pkg/util/log, and/or use the provided callback to emit this error into the Go logger instead of eprintln!.

Fixed. Replaced the eprintln! with a direct call to the provided callback at error level (1), so the conflict is reported through the Go structured logger instead of stderr.

Copilot · 2026-04-17T12:50:16Z

+void dd_discovery_init_logger(dd_log_fn callback);
+


The exported API dd_discovery_init_logger currently returns void, so callers cannot programmatically detect whether logger installation succeeded (e.g., if a Rust logger was already registered). Since the Rust implementation can fall back to stderr today, consider updating this declaration to return an explicit status code so Go callers can surface failures in agent logs and handle them deterministically.

The stderr fallback is gone due to the fix in the previous comment.

Copilot · 2026-04-17T12:50:16Z

+func goDiscoveryLogCallback(level C.uint32_t, msg *C.char, msgLen C.size_t) {
+	// Reject nil pointers, empty messages (nothing useful to log), and messages
+	// larger than MaxInt32 to prevent overflow when casting msgLen to C.int below.
+	if msg == nil || msgLen == 0 || msgLen > math.MaxInt32 {
+		return
+	}
+	message := C.GoStringN(msg, C.int(msgLen))
+	// Level mapping mirrors dd_log_fn in dd_discovery.h: 1=Error, 2=Warn, 3=Info, 4=Debug, 5+=Trace.


goDiscoveryLogCallback always converts the C buffer to a Go string (C.GoStringN) before checking whether the target Go log level is enabled. This incurs an allocation/copy even when the agent is configured to drop that level (e.g., Trace/Debug). Consider mapping the incoming level to pkg/util/log's LogLevel and using log.ShouldLog(...) to early-return before calling C.GoStringN.

Fixed. Added a log.ShouldLog() check before C.GoStringN so the string allocation is skipped entirely for records below the configured level.

Copilot · 2026-04-17T12:50:16Z

+impl log::Log for GoLogger {
+    fn enabled(&self, _metadata: &log::Metadata) -> bool {
+        true // All records pass through; Go-side filtering applies.
+    }
+
+    fn log(&self, record: &log::Record) {
+        let Some(&callback) = LOGGER_CALLBACK.get() else {
+            return;
+        };
+        let level: u32 = match record.level() {
+            log::Level::Error => 1,
+            log::Level::Warn => 2,
+            log::Level::Info => 3,
+            log::Level::Debug => 4,
+            log::Level::Trace => 5,
+        };
+        let message = record.args().to_string();
+        // SAFETY: `callback` was registered by the Go runtime and is valid for
+        // the process lifetime. `message` is live for the duration of this call.
+        unsafe { callback(level, message.as_ptr() as *const c_char, message.len()) };
+    }
+
+    fn flush(&self) {}
+}
+
+static GO_LOGGER: GoLogger = GoLogger;
+
+/// Register a Go logging callback. Idempotent: only the first call has effect;
+/// subsequent calls (e.g. from racing goroutines at start-up) are ignored.
+///
+/// All log levels are forwarded to the callback; the Go side applies its own
+/// level filter so no records are emitted above the configured level.
+///
+/// # Panic safety
+/// Wrapped in `catch_unwind` following the convention in this file: unwinding
+/// across a C ABI boundary is undefined behaviour per the Rust nomicon.
+///
+/// # Safety
+/// `callback` must be a valid function pointer that remains valid for the
+/// lifetime of the process.
+#[unsafe(no_mangle)]
+pub unsafe extern "C" fn dd_discovery_init_logger(callback: dd_log_fn) {
+    // SAFETY: Wrapping in catch_unwind prevents a Rust panic from unwinding
+    // across the C ABI boundary, which would be undefined behaviour.
+    let _ = panic::catch_unwind(AssertUnwindSafe(|| {
+        if LOGGER_CALLBACK.set(callback).is_err() {
+            return;
+        }
+        // `set_logger` returns Err if another logger was already registered
+        // (e.g. in a test harness). In that case LOGGER_CALLBACK is set but
+        // GoLogger is not installed, so all Rust log records would be silently
+        // dropped. Surface the conflict so it is not invisible to the caller.
+        if log::set_logger(&GO_LOGGER).is_err() {
+            eprintln!(
+                "[dd_discovery] dd_discovery_init_logger: a Rust logger was already \
+                 registered; Rust log records will not be forwarded to the Go logger"
+            );
+            return;
+        }
+        log::set_max_level(log::LevelFilter::Trace);
+    }));


dd_discovery_init_logger sets the Rust global max level to Trace and GoLogger.enabled() always returns true. This forces every Rust log call (including Trace) to cross the FFI boundary and allocate/format even when Go is configured to drop that level, which can be a significant overhead in hot paths. Consider accepting a max level from Go (or another mechanism to reflect Go’s current level) and setting log::set_max_level accordingly, and/or implementing enabled() to filter by that level before formatting/forwarding.

Fixed based on a previous comment.

- Pass the current Go log level to dd_discovery_init_logger so Rust sets log::set_max_level accordingly, discarding records below the configured threshold before formatting or crossing the FFI boundary. - Add log.ShouldLog() guard in goDiscoveryLogCallback before C.GoStringN to skip the string allocation for records that will be dropped. - Replace eprintln! with a direct callback invocation at error level when log::set_logger fails, so the conflict reaches the Go structured logger instead of being lost to stderr. Runtime propagation of log level changes is left as a follow-up. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

martavicentenavarro added changelog/no-changelog No changelog entry needed qa/done QA done before merge and regressions are covered by tests team/agent-discovery labels Apr 15, 2026

github-actions bot added the component/system-probe label Apr 15, 2026

dd-octo-sts bot added the internal Identify a non-fork PR label Apr 15, 2026

github-actions bot added the medium review PR review might take time label Apr 15, 2026

martavicentenavarro force-pushed the marta/logs_spl branch from d6d840a to 62a9617 Compare April 15, 2026 15:37

martavicentenavarro force-pushed the marta/logs_spl branch 2 times, most recently from c1ec169 to 70710a7 Compare April 16, 2026 15:29

feat(discovery): Rust-to-Go logging bridge for libdd_discovery

4da3327

martavicentenavarro force-pushed the marta/logs_spl branch from 70710a7 to 4da3327 Compare April 17, 2026 08:42

martavicentenavarro requested a review from Copilot April 17, 2026 12:41

Copilot started reviewing on behalf of martavicentenavarro April 17, 2026 12:41 View session

chatgpt-codex-connector bot reviewed Apr 17, 2026

View reviewed changes

Copilot AI reviewed Apr 17, 2026

View reviewed changes

github-actions bot added long review PR is complex, plan time to review it and removed medium review PR review might take time labels Apr 17, 2026

Conversation

martavicentenavarro commented Apr 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Motivation

Describe how you validated your changes

Additional Notes

Uh oh!

dd-octo-sts bot commented Apr 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Files inventory check summary

Results for datadog-agent_7.79.0~devel.git.755.79bc052.pipeline.108251497-1_amd64.deb:

Uh oh!

cit-pr-commenter-54b7da bot commented Apr 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Regression Detector

Regression Detector Results

Optimization Goals: ✅ No significant changes detected

Experiments ignored for regressions

Fine details of change detection per experiment

Bounds Checks: ❌ Failed

Explanation

CI Pass/Fail Decision

Uh oh!

martavicentenavarro commented Apr 17, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

martavicentenavarro Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

martavicentenavarro Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

martavicentenavarro Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

martavicentenavarro Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

martavicentenavarro Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

martavicentenavarro commented Apr 15, 2026 •

edited

Loading

dd-octo-sts bot commented Apr 15, 2026 •

edited

Loading

cit-pr-commenter-54b7da bot commented Apr 15, 2026 •

edited

Loading