tests/zedagent: add zedagent integration test suite #1152
Open
eriknordmark wants to merge 6 commits into lf-edge:master
Adds an Eden testscript suite (tests/zedagent/) that exercises the zedagent microservice end-to-end against a live EVE instance.

Six test scenarios:

- device_info_completeness: verifies ZInfoDevice contains hardware inventory, network adapters, EVE version, and data-security-at-rest info, and that config-item changes appear in the next publish.
- config_items_and_status: exercises the config-item round-trip through parseConfigItems/handleGlobalConfigImpl; deploys and deletes an app to cover parseAppInstanceConfig.
- maintenance_mode: sets maintenance.mode=enabled, confirms ZInfoDevice.state transitions to ZDEVICE_STATE_MAINTENANCE_MODE, then restores normal operation.
- app_metrics_detail: deploys an app with a persistent volume, verifies per-app metrics (ZMetricMsg.am) and per-device disk metrics (MetricContent.dm.disk), and confirms ZiApp.state:RUNNING is published.
- network_instance_info_metrics: creates a local NI, deploys two apps on it, verifies NI info (ZiNetworkInstance.networkID) and NI metrics (ZMetricMsg.nm.networkID).
- attest_flow: verifies the remote attestation FSM reaches ATTEST_STATE_COMPLETE, PCR status is published, and the integrity token is persisted (requires eve.tpm=true; skipped otherwise).

zedagent_test.go provides TestInfo, TestMetric, and TestFlowLog helpers that the testscripts invoke via the test command.

Every test guards against unexpected device reboots using the standard watchdog pattern:

    ! test eden.reboot.test -test.v -timewait=<N>m -reboot=0 -count=1 &

with timewait sized to exceed the worst-case foreground duration of each test. A pkill cancels the watchdog immediately after the real work is done so that wait returns promptly rather than blocking until the ceiling.

Hard crashes that trigger a reboot (kernel panic, OOM reboot, watchdog timeout) are caught by the reboot guard. Soft crashes (process restart without reboot) are caught indirectly: foreground TestInfo/TestApp steps will time out if zedagent misses a publish after restarting.

The suite was validated against a QEMU-based coverage-instrumented EVE instance. The Eden e2e run achieves 50.6% statement coverage on cmd/zedagent, versus 10.4% from the existing unit tests.

Signed-off-by: eriknordmark <erik@zededa.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
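For readers unfamiliar with the watchdog pattern, here is a minimal plain-shell sketch of the same idea: start a background guard with a time ceiling, run the foreground work, then cancel the guard so that `wait` returns promptly. It is an illustration only. The real tests use testscript syntax and `eden.reboot.test`; in this sketch `sleep` stands in for the watchdog and `run_with_watchdog` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: run foreground work under a background "watchdog" with a ceiling,
# then cancel the watchdog as soon as the work finishes.
# `sleep` is only a stand-in for the real reboot watchdog (assumption).

run_with_watchdog() {
    ceiling_seconds=$1
    shift

    # Start the stand-in watchdog in the background and remember its PID.
    sleep "$ceiling_seconds" &
    watchdog_pid=$!

    # Run the foreground work and keep its exit status.
    work_status=0
    "$@" || work_status=$?

    # Cancel the watchdog so that `wait` returns promptly instead of
    # blocking until the ceiling expires.
    kill "$watchdog_pid" 2>/dev/null || true
    wait "$watchdog_pid" 2>/dev/null || true

    return "$work_status"
}
```

Calling `run_with_watchdog 1800 some_command` would return as soon as `some_command` finishes, with its exit status, rather than after the 30-minute ceiling.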
Replace inline `pkill -f` with an embedded kill_watchdog.sh script in all six test files. The old `exec sh -c 'pkill -f ...'` pattern caused pkill to match and kill its own parent sh process before `|| true` could suppress the error, resulting in [signal: terminated]. The embedded script avoids the self-match: the `sh kill_watchdog.sh` process has no `eden.reboot.test` in its cmdline, and pgrep excludes itself.

Also fix the attest_flow skip message: testscript `skip` accepts a single bare-word argument, not a quoted string with spaces.

Signed-off-by: eriknordmark <erik@zededa.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
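The self-match problem described above can be illustrated with a small helper. This is a sketch under assumptions, not the contents of the actual kill_watchdog.sh (which is not shown in this conversation): `kill_matching` is a hypothetical name, and the explicit skip of the shell's own PID is an extra guard that the commit message does not mention.

```shell
#!/bin/sh
# Sketch of a pattern-based kill helper (hypothetical; the real
# kill_watchdog.sh may look different).

kill_matching() {
    pattern=$1

    # pgrep -f matches against the full command line and never reports
    # its own process. `|| true` covers the "no match" exit status.
    pids=$(pgrep -f "$pattern" || true)

    for pid in $pids; do
        # Extra guard (assumption, not from the PR): never signal this shell,
        # even if its own command line happens to contain the pattern.
        [ "$pid" = "$$" ] && continue
        kill "$pid" 2>/dev/null || true
    done
    return 0
}
```

When a helper like this lives in a script file and is run as `sh kill_watchdog.sh`, the pattern appears only inside the file, not on the shell's command line, which is the property the commit relies on. With `sh -c 'pkill -f eden.reboot.test'` the pattern is part of the `sh` command line, so `pkill -f` matches that shell too.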
Three fixes for the zedagent integration tests:

device_info_completeness: merge the swList.shortVersion check into the same TestInfo call as machineArch/systemAdapter/dataSecAtRestInfo. The original split into two sequential TestInfo calls had a race where the epoch-bump ZInfoDevice arrived before the second subscriber started, causing a 5m timeout.

maintenance_mode: increase the exit-maintenance-mode TestInfo timeout from 5m to 10m. EVE can take longer than 5m to re-populate systemAdapter fields after clearing maintenance mode.

config_items_and_status: add pre-test cleanup (pre_cleanup.sh) to remove any zagent-t1/zagent-n1 resources left by a previous failed run, wait for them to be fully absent, then proceed. Also increase the AppInfo TestInfo timeout from 5m to 10m for the same epoch-race reason. Remove the brittle `stdout 'deploy network ... request sent'` check that breaks when the network already exists.

Signed-off-by: eriknordmark <erik@zededa.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Testscript does not support shell-style backslash line continuations. Remove them from device_info_completeness and attest_flow, putting each command on a single line.

For the maintenance_mode exit check, replace the systemAdapter.status.ports.ifname filter with machineArch. After exiting maintenance mode EVE consistently re-populates machineArch (a static field present in every ZInfoDevice) before it re-populates systemAdapter, so this filter reliably catches the first ZInfoDevice published on exit. Also extend the timeout from 10m to 15m.

Signed-off-by: eriknordmark <erik@zededa.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pod deletion in EVE can take more than 3m (container teardown, volume cleanup). Increase the pre-cleanup wait for zagent-t1 absence from 3m to 10m so that a stale pod from a previous failed run is fully removed before this test re-creates it.

Signed-off-by: eriknordmark <erik@zededa.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
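The "wait until the stale pod is fully absent" step can be sketched as a bounded polling loop. This is a hedged illustration, not the contents of pre_cleanup.sh: `wait_until_absent`, its argument order, and the polling interval are all hypothetical, and the listing command is passed in by the caller rather than assumed.

```shell
#!/bin/sh
# Sketch: poll until a named resource no longer appears in a listing, or
# give up after a timeout. All names here are hypothetical; the real
# pre_cleanup.sh is not shown in this conversation.

wait_until_absent() {
    name=$1
    timeout_seconds=$2
    interval_seconds=$3
    shift 3   # remaining arguments: the listing command to run

    elapsed=0
    while [ "$elapsed" -lt "$timeout_seconds" ]; do
        # Only trust a listing that succeeded; a failed listing is not
        # evidence that the resource is gone, so keep polling.
        if listing=$("$@" 2>/dev/null); then
            if ! printf '%s\n' "$listing" | grep -qwF -- "$name"; then
                return 0
            fi
        fi
        sleep "$interval_seconds"
        elapsed=$((elapsed + interval_seconds))
    done
    return 1
}
```

Under these assumptions, a 10m wait for zagent-t1 would look like `wait_until_absent zagent-t1 600 10 <listing command>`, where the listing command is whatever prints the currently deployed pods in the test environment.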
device_info_completeness: TestInfo with multiple -out flags only outputs the last field's value. Use a single -out for swList.shortVersion (the meaningful stdout check) and verify machineArch via the filter predicate only. Increase timeout to 15m since this consistently takes ~10m.

maintenance_mode: a previous failed run can leave EVE stuck in maintenance mode. Add a pre-cleanup step that resets maintenance.mode to none and waits for a confirming ZInfoDevice before starting the actual test, ensuring we always begin from a known ONLINE state. Bump watchdog to 35m to cover the extra pre-cleanup window.

Signed-off-by: eriknordmark <erik@zededa.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary

Adds an Eden testscript suite (`tests/zedagent/`) that exercises the zedagent microservice end-to-end against a live EVE instance.

Six test scenarios:

- device_info_completeness: verifies `ZInfoDevice` contains hardware inventory, network adapters, EVE version, and data-security-at-rest info; also verifies that a config-item change appears in the next publish.
- config_items_and_status: exercises the config-item round-trip through `parseConfigItems`/`handleGlobalConfigImpl`; deploys and deletes an app to cover `parseAppInstanceConfig`.
- maintenance_mode: sets `maintenance.mode=enabled`, confirms `ZInfoDevice.state` transitions to `ZDEVICE_STATE_MAINTENANCE_MODE`, then restores normal operation.
- app_metrics_detail: deploys an app with a persistent volume, verifies per-app metrics (`ZMetricMsg.am`) and per-device disk metrics (`MetricContent.dm.disk`), and confirms `ZiApp.state:RUNNING` is published.
- network_instance_info_metrics: creates a local NI, deploys two apps on it, verifies NI info (`ZiNetworkInstance.networkID`) and NI metrics (`ZMetricMsg.nm.networkID`).
- attest_flow: verifies the remote attestation FSM reaches `ATTEST_STATE_COMPLETE`, PCR status is published, and the integrity token is persisted. Requires `eve.tpm=true`; skipped otherwise.

`zedagent_test.go` provides `TestInfo`, `TestMetric`, and `TestFlowLog` helpers that the testscripts invoke via the `test` command.

Every test guards against unexpected device reboots using the standard watchdog pattern:

    ! test eden.reboot.test -test.v -timewait=<N>m -reboot=0 -count=1 &

with `timewait` sized to exceed the worst-case foreground duration of each test. A `pkill` cancels the watchdog as soon as the real work is done so that `wait` returns promptly. Hard crashes that trigger a reboot (kernel panic, OOM reboot, watchdog timeout) are caught by the reboot guard. Soft crashes (process restart without reboot) are caught indirectly: foreground `TestInfo`/`TestApp` steps time out if zedagent misses a publish after restarting.

The suite was validated against a QEMU-based coverage-instrumented EVE instance. The Eden e2e run achieves 50.6% statement coverage on `cmd/zedagent`, versus 10.4% from the existing unit tests.

Test plan

- `TestEdenScripts/*` subtests pass (`attest_flow` requires a TPM-enabled device; it self-skips on plain QEMU).
🤖 Generated with Claude Code