Retry pending report_status instead of forcing re-download#251
Retry pending report_status instead of forcing re-download#251MichaelUray wants to merge 1 commit intoopenwisp:masterfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a persistent pending-report mechanism via a new PENDING_REPORT file to store the last report_status payload before reporting. update_configuration() sets a status_to_report ("applied" or "error"), writes it to PENDING_REPORT, then calls report_status with retry; on success it removes PENDING_REPORT, on failure the marker remains. Introduces retry_pending_report() which validates ("applied" or "error"), retries reporting the stored status, and deletes PENDING_REPORT only on success. The main loop calls retry_pending_report() when no configuration update occurs but PENDING_REPORT exists. Sequence DiagramsequenceDiagram
actor Agent
participant Config as Config System
participant Report as Report Service
participant File as PENDING_REPORT File
loop Main Loop
Agent->>Config: check for updates / update_configuration()
alt Configuration applied or error
Config-->>Agent: status (applied / error)
Agent->>File: write status_to_report (persist)
Agent->>Report: report_status(status_to_report) with retry
alt Report succeeds
Report-->>Agent: success
Agent->>File: delete PENDING_REPORT
else Report fails
Report-->>Agent: failure
Note over Agent,File: keep PENDING_REPORT for retry
end
else No update available
Agent->>File: check PENDING_REPORT exists?
alt Exists
Agent->>File: read pending status
Agent->>Report: retry report_status(pending status) with retry
alt Report succeeds
Report-->>Agent: success
Agent->>File: delete PENDING_REPORT
else Report fails
Report-->>Agent: failure
Note over Agent,File: keep PENDING_REPORT for next loop
end
end
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Caution Pre-merge checks failedPlease resolve all errors before merging. Addressing warnings is optional.
❌ Failed checks (1 error, 1 warning, 1 inconclusive)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Code Review SummaryStatus: No Issues Found | Recommendation: Merge The implementation has been significantly simplified from the previous approach: Changes in this revision:
Key behavior:
The simplified approach trades the "retry without re-download" optimization for code clarity and maintainability. Files Reviewed (2 files)
Reviewed by kimi-k2.5-0127 · 107,055 tokens |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
openwisp-config/files/openwisp.agent (1)
829-853:⚠️ Potential issue | 🟠 MajorPersist
PENDING_REPORTbefore the first retry attempt.Line 837 commits the new checksum before this branch writes
PENDING_REPORT. If the daemon is restarted or crashes anywhere insideretry_with_backoff report_status ..., the delayed status is lost and the controller can stay stale until the next config change. Write the marker first, then delete it only afterreport_statussucceeds.Suggested fix
if [ "$result" -eq "0" ]; then logger "Configuration applied successfully" \ -t openwisp \ -p daemon.info # send config-applied hotplug event env -i ACTION="config-applied" /sbin/hotplug-call openwisp # store the new checksum as last known checksum cp "$CONFIGURATION_CHECKSUM" "$PERSISTENT_CHECKSUM" status_to_report="applied" - retry_with_backoff report_status "applied" else status_to_report="error" - retry_with_backoff report_status "error" fi + printf '%s\n' "$status_to_report" >"$PENDING_REPORT" + retry_with_backoff report_status "$status_to_report" # if reporting of the status fails, save the pending status for retry # in the next cycle without forcing a full re-download # shellcheck disable=SC2181 if [ "$?" -ne "0" ]; then - echo "$status_to_report" >"$PENDING_REPORT" logger -s "report_status failed, will retry in the next cycle" \ -t openwisp \ -p daemon.warning else rm -f "$PENDING_REPORT" fi🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@openwisp-config/files/openwisp.agent` around lines 829 - 853, Move persistence of the pending status before any retry attempt and only commit the new checksum after a successful report: in the block using status_to_report, write the status into PENDING_REPORT (echo "$status_to_report" >"$PENDING_REPORT") before calling retry_with_backoff report_status ... (for both the "applied" and "error" branches), then call retry_with_backoff report_status "$status_to_report"; if it succeeds, rm -f "$PENDING_REPORT" and only then cp "$CONFIGURATION_CHECKSUM" "$PERSISTENT_CHECKSUM" (for the "applied" path), otherwise leave PENDING_REPORT in place for the next cycle. Ensure you reference PENDING_REPORT, retry_with_backoff, report_status, CONFIGURATION_CHECKSUM and PERSISTENT_CHECKSUM when making changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@openwisp-config/files/openwisp.agent`:
- Around line 1004-1005: The script uses "local pending_status" at top-level
inside the main "while true" loop, which causes "local: not in a function" in
ash; change it to a regular variable assignment by removing the "local" keyword
where "pending_status" is set from "$PENDING_REPORT" (the assignment that
currently reads "local pending_status" and "pending_status=$(cat
\"$PENDING_REPORT\")") so the variable is assigned normally in the loop scope.
---
Outside diff comments:
In `@openwisp-config/files/openwisp.agent`:
- Around line 829-853: Move persistence of the pending status before any retry
attempt and only commit the new checksum after a successful report: in the block
using status_to_report, write the status into PENDING_REPORT (echo
"$status_to_report" >"$PENDING_REPORT") before calling retry_with_backoff
report_status ... (for both the "applied" and "error" branches), then call
retry_with_backoff report_status "$status_to_report"; if it succeeds, rm -f
"$PENDING_REPORT" and only then cp "$CONFIGURATION_CHECKSUM"
"$PERSISTENT_CHECKSUM" (for the "applied" path), otherwise leave PENDING_REPORT
in place for the next cycle. Ensure you reference PENDING_REPORT,
retry_with_backoff, report_status, CONFIGURATION_CHECKSUM and
PERSISTENT_CHECKSUM when making changes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: ebefb09a-1e67-4c5b-877a-5b4a46d98822
📒 Files selected for processing (1)
openwisp-config/files/openwisp.agent
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Kilo Code Review
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-02-27T07:54:22.974Z
Learnt from: Viscous106
Repo: openwisp/openwisp-config PR: 248
File: openwisp-config/files/openwisp.init:95-96
Timestamp: 2026-02-27T07:54:22.974Z
Learning: In openwisp-config/files/openwisp.init, the wait_applying_config() function intentionally removes the CONTROL_FILE (/tmp/openwisp/applying_conf) after a 30-second timeout via `rm -f "$CONTROL_FILE"`. This timeout serves as an escape hatch to prevent permanent deadlock when the agent crashes and leaves a stale lock file behind. The design assumes configuration apply operations complete within 30 seconds under normal conditions.
Applied to files:
openwisp-config/files/openwisp.agent
🔇 Additional comments (1)
openwisp-config/files/openwisp.agent (1)
203-203: Good separation of pending-report state.Using a dedicated marker file here keeps status-delivery retries independent from checksum drift detection.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@openwisp-config/files/openwisp.agent`:
- Around line 990-1000: The retry_pending_report() function currently reads
PENDING_REPORT into pending_status and calls retry_with_backoff report_status
"$pending_status" without validation; update retry_pending_report to first
verify PENDING_REPORT exists and that pending_status is non-empty and matches
the expected format/value (e.g., a known status token or numeric ID) after
trimming whitespace, log a clear error and remove or archive the malformed
PENDING_REPORT instead of retrying when invalid, and only call
retry_with_backoff report_status "$pending_status" when the validation passes;
reference the retry_pending_report function, PENDING_REPORT variable,
pending_status variable, and the retry_with_backoff/report_status call when
making the changes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: a58ee842-5f98-4e8c-bd39-3c8da73957d9
📒 Files selected for processing (1)
openwisp-config/files/openwisp.agent
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Kilo Code Review
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-02-27T07:54:22.974Z
Learnt from: Viscous106
Repo: openwisp/openwisp-config PR: 248
File: openwisp-config/files/openwisp.init:95-96
Timestamp: 2026-02-27T07:54:22.974Z
Learning: In openwisp-config/files/openwisp.init, the wait_applying_config() function intentionally removes the CONTROL_FILE (/tmp/openwisp/applying_conf) after a 30-second timeout via `rm -f "$CONTROL_FILE"`. This timeout serves as an escape hatch to prevent permanent deadlock when the agent crashes and leaves a stale lock file behind. The design assumes configuration apply operations complete within 30 seconds under normal conditions.
Applied to files:
openwisp-config/files/openwisp.agent
🔇 Additional comments (3)
openwisp-config/files/openwisp.agent (3)
203-203: Good addition of a dedicated pending-report marker path.This cleanly introduces the state file without changing existing path conventions.
829-854: Failure-path redesign is correct and aligns with the PR goal.Persisting pending status instead of removing checksums is the right behavior for transient
report_statusfailures.
1013-1017: Main-loop integration is sound.Retrying pending status only when no update is triggered keeps update precedence intact and avoids unnecessary re-downloads.
…wisp#251 When report_status fails after a configuration apply (e.g. due to a transient HTTP 502 from the controller), the agent previously deleted the local checksum files to trigger a full re-download on the next cycle. That worked but wasted bandwidth and caused the controller to stay stuck on "modified" until the next real config change. Instead persist the unreported status in a pending_report marker file and retry just report_status on the next polling cycle. Key details addressed from the review: - Define retry_pending_report as a shell function so "local" is valid on BusyBox ash, where "local" outside a function is a parse error that silently crashes the script mid-loop. - Persist the marker BEFORE calling retry_with_backoff report_status so a crash or reboot between the retry attempt and the PENDING_REPORT write does not lose the status. - Validate the marker contents against the known-good status set (applied/error/running) and discard a malformed marker rather than retrying forever. - Only remove the marker after a successful report_status. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
60c2ea2 to
46e3c54
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@openwisp-config/files/openwisp.agent`:
- Around line 1000-1009: The case validation for pending_status currently
accepts "running" but update_configuration() only writes "applied" or "error";
remove "running" from the allowed list in the case pattern (change
applied|error|running to applied|error) so validation matches actual usage in
update_configuration(); if "running" must be supported for
controller-side/forward-compatibility reasons, instead leave it and add a
one-line comment above the case explaining why "running" is valid and where it
may be produced (reference update_configuration() and the controller) so future
maintainers understand the intent.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: a193c14c-73f9-41f9-9156-16eecd9ef685
📒 Files selected for processing (1)
openwisp-config/files/openwisp.agent
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: QA-Checks and Tests
- GitHub Check: Kilo Code Review
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-02-27T07:54:22.974Z
Learnt from: Viscous106
Repo: openwisp/openwisp-config PR: 248
File: openwisp-config/files/openwisp.init:95-96
Timestamp: 2026-02-27T07:54:22.974Z
Learning: In openwisp-config/files/openwisp.init, the wait_applying_config() function intentionally removes the CONTROL_FILE (/tmp/openwisp/applying_conf) after a 30-second timeout via `rm -f "$CONTROL_FILE"`. This timeout serves as an escape hatch to prevent permanent deadlock when the agent crashes and leaves a stale lock file behind. The design assumes configuration apply operations complete within 30 seconds under normal conditions.
Applied to files:
openwisp-config/files/openwisp.agent
🔇 Additional comments (3)
openwisp-config/files/openwisp.agent (3)
203-203: LGTM!The new
PENDING_REPORTmarker file is appropriately placed in the volatile/tmp/openwispdirectory, ensuring it's cleared on reboot and doesn't introduce persistent state. This aligns well with the existing working file conventions.
829-856: Well-designed crash-safe implementation.The approach of writing the pending status to
PENDING_REPORTbefore attemptingreport_statusensures that if the agent crashes mid-retry, the marker persists and the status can be retried on the next cycle. Removing the marker only on success is correct.This properly addresses the original problem by no longer deleting checksum files on
report_statusfailure.
1030-1034: LGTM!The main loop integration correctly prioritizes configuration updates over pending report retries. The inline comments clearly explain the retry scenario, making the code self-documenting.
…wisp#251 When report_status fails after a configuration apply (e.g. due to a transient HTTP 502 from the controller), the agent previously deleted the local checksum files to trigger a full re-download on the next cycle. That worked but wasted bandwidth and caused the controller to stay stuck on "modified" until the next real config change. Instead persist the unreported status in a pending_report marker file and retry just report_status on the next polling cycle. Key details addressed from the review: - Define retry_pending_report as a shell function so "local" is valid on BusyBox ash, where "local" outside a function is a parse error that silently crashes the script mid-loop. - Persist the marker BEFORE calling retry_with_backoff report_status so a crash or reboot between the retry attempt and the PENDING_REPORT write does not lose the status. - Validate the marker contents against the known-good status set (applied/error/running) and discard a malformed marker rather than retrying forever. - Only remove the marker after a successful report_status. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
46e3c54 to
6863fc6
Compare
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
openwisp-config/files/openwisp.agent (2)
1027-1035:⚠️ Potential issue | 🟠 MajorOnly retry a pending report after a confirmed no-change result.
This
elifruns for every non-1return fromconfiguration_changed(), including checksum fetch failures. In that path the agent has not proved the controller still wants the same config, so replaying a delayedapplied/errorcan acknowledge stale state and adds another full backoff loop during outages. Gateretry_pending_report()on exit code0only.Suggested fix
configuration_changed + config_change_status=$? - if [ "$?" -eq "1" ]; then + if [ "$config_change_status" -eq "1" ]; then update_configuration - elif [ -f "$PENDING_REPORT" ]; then + elif [ "$config_change_status" -eq "0" ] && [ -f "$PENDING_REPORT" ]; then # Retry a previously failed report_status without re-downloading. # This handles the case where the config was applied successfully # but report_status failed due to a transient server error. retry_pending_report fi🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@openwisp-config/files/openwisp.agent` around lines 1027 - 1035, The script currently uses "$?" to decide branches after calling configuration_changed(), which can be overwritten and treats any non-1 result as eligible to retry a pending report; capture the exit code immediately (e.g., cfg_rc=$(configuration_changed; echo $?) or store $? into a variable right after calling configuration_changed()) and change the elif to only call retry_pending_report when that captured code equals 0 and the pending file exists (use the symbols configuration_changed, update_configuration, retry_pending_report, and PENDING_REPORT to locate the logic); leave the update_configuration branch for cfg_rc == 1.
829-856:⚠️ Potential issue | 🟠 MajorFail closed if
PENDING_REPORTcannot be persisted.This retry path only works if the marker is actually saved. Right now a tmpfs/full-disk or redirection failure is ignored, so a later
report_statusfailure silently drops the only retry state and reintroduces the stuck-controller case this change is meant to avoid. Persist the marker atomically and fall back to the old checksum invalidation path when that write cannot be saved.Suggested fix
- local status_to_report + local status_to_report pending_report_saved=0 pending_report_tmp if [ "$result" -eq "0" ]; then logger "Configuration applied successfully" \ -t openwisp \ @@ - printf '%s\n' "$status_to_report" >"$PENDING_REPORT" + pending_report_tmp="${PENDING_REPORT}.tmp" + if printf '%s\n' "$status_to_report" >"$pending_report_tmp" \ + && mv -f "$pending_report_tmp" "$PENDING_REPORT"; then + pending_report_saved=1 + else + rm -f "$pending_report_tmp" + logger -s "Failed to persist pending report marker" \ + -t openwisp \ + -p daemon.err + fi retry_with_backoff report_status "$status_to_report" # shellcheck disable=SC2181 if [ "$?" -eq "0" ]; then - rm -f "$PENDING_REPORT" + [ "$pending_report_saved" -eq "1" ] && rm -f "$PENDING_REPORT" else + if [ "$pending_report_saved" -ne "1" ]; then + rm -f "$CONFIGURATION_CHECKSUM" "$PERSISTENT_CHECKSUM" + fi logger -s "report_status failed, will retry in the next cycle" \ -t openwisp \ -p daemon.warning fi🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@openwisp-config/files/openwisp.agent` around lines 829 - 856, The script currently writes the retry marker directly to $PENDING_REPORT without guaranteeing persistence; if that write fails the retry path is lost. Change the write to be atomic: write the marker to a temporary file (use mktemp or "$PENDING_REPORT.tmp"), fsync/ensure the write succeeded, then mv the temp into place to replace $PENDING_REPORT and check the mv exit status; if any of those steps fail, log the failure (use logger -t openwisp -p daemon.err) and "fail closed" by removing/invalidating the stored checksum (remove or unlink $PERSISTENT_CHECKSUM or otherwise revert the copy from $CONFIGURATION_CHECKSUM) so the old checksum invalidation path will trigger, then skip the retry_with_backoff report_status path (or treat it as a permanent failure) instead of silently proceeding.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@openwisp-config/files/openwisp.agent`:
- Around line 1027-1035: The script currently uses "$?" to decide branches after
calling configuration_changed(), which can be overwritten and treats any non-1
result as eligible to retry a pending report; capture the exit code immediately
(e.g., cfg_rc=$(configuration_changed; echo $?) or store $? into a variable
right after calling configuration_changed()) and change the elif to only call
retry_pending_report when that captured code equals 0 and the pending file
exists (use the symbols configuration_changed, update_configuration,
retry_pending_report, and PENDING_REPORT to locate the logic); leave the
update_configuration branch for cfg_rc == 1.
- Around line 829-856: The script currently writes the retry marker directly to
$PENDING_REPORT without guaranteeing persistence; if that write fails the retry
path is lost. Change the write to be atomic: write the marker to a temporary
file (use mktemp or "$PENDING_REPORT.tmp"), fsync/ensure the write succeeded,
then mv the temp into place to replace $PENDING_REPORT and check the mv exit
status; if any of those steps fail, log the failure (use logger -t openwisp -p
daemon.err) and "fail closed" by removing/invalidating the stored checksum
(remove or unlink $PERSISTENT_CHECKSUM or otherwise revert the copy from
$CONFIGURATION_CHECKSUM) so the old checksum invalidation path will trigger,
then skip the retry_with_backoff report_status path (or treat it as a permanent
failure) instead of silently proceeding.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 24ff03ed-fee7-4ead-aa12-c4b63b1d3f1f
📒 Files selected for processing (1)
openwisp-config/files/openwisp.agent
📜 Review details
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-02-27T07:54:22.974Z
Learnt from: Viscous106
Repo: openwisp/openwisp-config PR: 248
File: openwisp-config/files/openwisp.init:95-96
Timestamp: 2026-02-27T07:54:22.974Z
Learning: In openwisp-config/files/openwisp.init, the wait_applying_config() function intentionally removes the CONTROL_FILE (/tmp/openwisp/applying_conf) after a 30-second timeout via `rm -f "$CONTROL_FILE"`. This timeout serves as an escape hatch to prevent permanent deadlock when the agent crashes and leaves a stale lock file behind. The design assumes configuration apply operations complete within 30 seconds under normal conditions.
Applied to files:
openwisp-config/files/openwisp.agent
…wisp#251 When report_status fails after a configuration apply (e.g. due to a transient HTTP 502 from the controller), the agent previously deleted the local checksum files to trigger a full re-download on the next cycle. That worked but wasted bandwidth and caused the controller to stay stuck on "modified" until the next real config change. Instead persist the unreported status in a pending_report marker file and retry just report_status on the next polling cycle. Key details addressed from the review: - Define retry_pending_report as a shell function so "local" is valid on BusyBox ash, where "local" outside a function is a parse error that silently crashes the script mid-loop. - Persist the marker BEFORE calling retry_with_backoff report_status so a crash or reboot between the retry attempt and the PENDING_REPORT write does not lose the status. - Validate the marker contents against the known-good status set (applied/error) and discard a malformed marker rather than retrying forever. - Only remove the marker after a successful report_status. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
6863fc6 to
d1eeb7f
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
openwisp-config/files/openwisp.agent (1)
829-848:⚠️ Potential issue | 🟠 MajorPersist the original error payload, not just
"error".Lines 843-848 only freeze the status token. On the
"error"path,report_status()rebuildserror_reasonfrom current logs on every call, so a deferred retry can send a different or empty payload after log rotation or later-cycle noise. Please persist the originalerror_reasonsnapshot together with the status and reuse that inretry_pending_report().🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@openwisp-config/files/openwisp.agent` around lines 829 - 848, When the apply path fails, capture the current error payload (the computed error_reason string that report_status would send) and persist it together with the status token instead of writing only "error" to PENDING_REPORT; update the failure branch that sets status_to_report="error" to also write the serialized error_reason snapshot to PENDING_REPORT (or a companion file) and ensure retry_pending_report() / report_status() read and reuse that stored snapshot rather than reconstructing error_reason from current logs; keep the existing behavior to remove the persisted marker only after report_status succeeds and ensure retry_with_backoff still invokes report_status with the preserved payload.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@CHANGELOG.rst`:
- Around line 10-12: The CHANGELOG.rst bullet "Fixed retry pending
``report_status`` instead of forcing re-download after a transient HTTP error."
is miswrapped and reads awkwardly; rewrap this entry to match the surrounding
RST bullet/link style and rephrase to clarify the action (for example: "Mark
retry as pending in ``report_status`` instead of forcing a re-download after
transient HTTP errors. `#251
<https://github.com/openwisp/openwisp-config/pull/251>`_"). Ensure line wrapping
follows the file's existing RST width/formatting and that the inline code token
``report_status`` and the PR link use the same style as other entries.
---
Outside diff comments:
In `@openwisp-config/files/openwisp.agent`:
- Around line 829-848: When the apply path fails, capture the current error
payload (the computed error_reason string that report_status would send) and
persist it together with the status token instead of writing only "error" to
PENDING_REPORT; update the failure branch that sets status_to_report="error" to
also write the serialized error_reason snapshot to PENDING_REPORT (or a
companion file) and ensure retry_pending_report() / report_status() read and
reuse that stored snapshot rather than reconstructing error_reason from current
logs; keep the existing behavior to remove the persisted marker only after
report_status succeeds and ensure retry_with_backoff still invokes report_status
with the preserved payload.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 490bde50-eb13-4e22-8086-bc3cf748c5ff
📒 Files selected for processing (2)
CHANGELOG.rstopenwisp-config/files/openwisp.agent
📜 Review details
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-02-27T07:54:22.974Z
Learnt from: Viscous106
Repo: openwisp/openwisp-config PR: 248
File: openwisp-config/files/openwisp.init:95-96
Timestamp: 2026-02-27T07:54:22.974Z
Learning: In openwisp-config/files/openwisp.init, the wait_applying_config() function intentionally removes the CONTROL_FILE (/tmp/openwisp/applying_conf) after a 30-second timeout via `rm -f "$CONTROL_FILE"`. This timeout serves as an escape hatch to prevent permanent deadlock when the agent crashes and leaves a stale lock file behind. The design assumes configuration apply operations complete within 30 seconds under normal conditions.
Applied to files:
openwisp-config/files/openwisp.agent
🪛 GitHub Actions: OpenWISP Config CI Build
CHANGELOG.rst
[error] 1-1: ReStructuredText check failed: File '/home/runner/work/openwisp-config/openwisp-config/CHANGELOG.rst' could be reformatted (1 out of 13 files could be reformatted).
🔇 Additional comments (3)
openwisp-config/files/openwisp.agent (3)
203-203: Good placement for transient retry state.Keeping
PENDING_REPORTunder$WORKING_DIRpreserves the existing/tmp/openwisplifecycle and avoids introducing new persistent state.
992-1019: Nice defensive handling of malformed retry markers.Validating
pending_statusand discarding bad content prevents the agent from spinning forever on a corrupt marker.
1031-1035: Good precedence in the polling loop.Only retrying
PENDING_REPORTwhen no new configuration is available keeps real config updates ahead of stale status cleanup.
…wisp#251 When report_status fails after a configuration apply (e.g. due to a transient HTTP 502 from the controller), the agent previously deleted the local checksum files to trigger a full re-download on the next cycle. That worked but wasted bandwidth and caused the controller to stay stuck on "modified" until the next real config change. Instead persist the unreported status in a pending_report marker file and retry just report_status on the next polling cycle. Key details addressed from the review: - Define retry_pending_report as a shell function so "local" is valid on BusyBox ash, where "local" outside a function is a parse error that silently crashes the script mid-loop. - Persist the marker BEFORE calling retry_with_backoff report_status so a crash or reboot between the retry attempt and the PENDING_REPORT write does not lose the status. - Validate the marker contents against the known-good status set (applied/error) and discard a malformed marker rather than retrying forever. - Only remove the marker after a successful report_status. Closes openwisp#251 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
d1eeb7f to
40564c7
Compare
…wisp#251 When report_status fails after a configuration apply (e.g. due to a transient HTTP 502 from the controller), the agent previously deleted the local checksum files to trigger a full re-download on the next cycle. That worked but wasted bandwidth and caused the controller to stay stuck on "modified" until the next real config change. Instead persist the unreported status in a pending_report marker file and retry just report_status on the next polling cycle. Key details addressed from the review: - Define retry_pending_report as a shell function so "local" is valid on BusyBox ash, where "local" outside a function is a parse error that silently crashes the script mid-loop. - Persist the marker BEFORE calling retry_with_backoff report_status so a crash or reboot between the retry attempt and the PENDING_REPORT write does not lose the status. - Validate the marker contents against the known-good status set (applied/error) and discard a malformed marker rather than retrying forever. - Only remove the marker after a successful report_status. Closes openwisp#251 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
40564c7 to
bb95a91
Compare
…wisp#251 When report_status fails after a configuration apply (e.g. due to a transient HTTP 502 from the controller), the agent previously deleted the local checksum files to trigger a full re-download on the next cycle. That worked but wasted bandwidth and caused the controller to stay stuck on "modified" until the next real config change. Instead persist the unreported status in a pending_report marker file and retry just report_status on the next polling cycle. Key details addressed from the review: - Define retry_pending_report as a shell function so "local" is valid on BusyBox ash, where "local" outside a function is a parse error that silently crashes the script mid-loop. - Persist the marker BEFORE calling retry_with_backoff report_status so a crash or reboot between the retry attempt and the PENDING_REPORT write does not lose the status. - Validate the marker contents against the known-good status set (applied/error) and discard a malformed marker rather than retrying forever. - Only remove the marker after a successful report_status. Closes openwisp#251 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
bb95a91 to
151b73c
Compare
…wisp#251 When report_status fails after a configuration apply (e.g. due to a transient HTTP 502 from the controller), the agent previously deleted the local checksum files to trigger a full re-download on the next cycle. That worked but wasted bandwidth and caused the controller to stay stuck on "modified" until the next real config change. Instead persist the unreported status in a pending_report marker file and retry just report_status on the next polling cycle. Key details addressed from the review: - Define retry_pending_report as a shell function so "local" is valid on BusyBox ash, where "local" outside a function is a parse error that silently crashes the script mid-loop. - Persist the marker BEFORE calling retry_with_backoff report_status so a crash or reboot between the retry attempt and the PENDING_REPORT write does not lose the status. - Validate the marker contents against the known-good status set (applied/error) and discard a malformed marker rather than retrying forever. - Only remove the marker after a successful report_status. Closes openwisp#251 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Write PENDING_REPORT atomically via tmp file + rename so a partial write (e.g. full disk) cannot leave a truncated marker behind. - Fall back to invalidating the persistent checksum when the pending marker cannot be saved, preserving the previous force re-download behaviour as a safety net. - Only retry a pending report after configuration_changed() returns 0 (confirmed no-change). Non-0/non-1 exit codes indicate checksum fetch failures and the cached applied/error status may no longer reflect the controller's current desired configuration.
151b73c to
f28f67f
Compare
…wisp#251 When report_status fails after a configuration apply (e.g. due to a transient HTTP 502 from the controller), the agent previously deleted the local checksum files to trigger a full re-download on the next cycle. That worked but wasted bandwidth and caused the controller to stay stuck on "modified" until the next real config change. Instead persist the unreported status in a pending_report marker file and retry just report_status on the next polling cycle. Key details addressed from the review: - Define retry_pending_report as a shell function so "local" is valid on BusyBox ash, where "local" outside a function is a parse error that silently crashes the script mid-loop. - Persist the marker BEFORE calling retry_with_backoff report_status so a crash or reboot between the retry attempt and the PENDING_REPORT write does not lose the status. - Validate the marker contents against the known-good status set (applied/error) and discard a malformed marker rather than retrying forever. - Only remove the marker after a successful report_status. Closes openwisp#251 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Write PENDING_REPORT atomically via tmp file + rename so a partial write (e.g. full disk) cannot leave a truncated marker behind. - Fall back to invalidating the persistent checksum when the pending marker cannot be saved, preserving the previous force re-download behaviour as a safety net. - Only retry a pending report after configuration_changed() returns 0 (confirmed no-change). Non-0/non-1 exit codes indicate checksum fetch failures and the cached applied/error status may no longer reflect the controller's current desired configuration.
f28f67f to
7ea134d
Compare
…wisp#251 When report_status fails after a configuration apply (e.g. due to a transient HTTP 502 from the controller), the agent previously deleted the local checksum files to trigger a full re-download on the next cycle. That worked but wasted bandwidth and caused the controller to stay stuck on "modified" until the next real config change. Instead persist the unreported status in a pending_report marker file and retry just report_status on the next polling cycle. Key details addressed from the review: - Define retry_pending_report as a shell function so "local" is valid on BusyBox ash, where "local" outside a function is a parse error that silently crashes the script mid-loop. - Persist the marker BEFORE calling retry_with_backoff report_status so a crash or reboot between the retry attempt and the PENDING_REPORT write does not lose the status. - Validate the marker contents against the known-good status set (applied/error) and discard a malformed marker rather than retrying forever. - Only remove the marker after a successful report_status. Closes openwisp#251 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Write PENDING_REPORT atomically via tmp file + rename so a partial write (e.g. full disk) cannot leave a truncated marker behind. - Fall back to invalidating the persistent checksum when the pending marker cannot be saved, preserving the previous force re-download behaviour as a safety net. - Only retry a pending report after configuration_changed() returns 0 (confirmed no-change). Non-0/non-1 exit codes indicate checksum fetch failures and the cached applied/error status may no longer reflect the controller's current desired configuration.
7ea134d to
089cdcd
Compare
…wisp#251 When report_status fails after a configuration apply (e.g. due to a transient HTTP 502 from the controller), the agent previously deleted the local checksum files to trigger a full re-download on the next cycle. That worked but wasted bandwidth and caused the controller to stay stuck on "modified" until the next real config change. Instead persist the unreported status in a pending_report marker file and retry just report_status on the next polling cycle. Key details: - Define retry_pending_report as a shell function so "local" is valid on BusyBox ash, where "local" outside a function is a parse error that silently crashes the script mid-loop. - Write the PENDING_REPORT marker atomically via a tmp file + rename so a partial write (e.g. full disk) cannot leave a truncated marker behind. - Persist the marker BEFORE calling retry_with_backoff report_status so a crash or reboot between the retry attempt and the PENDING_REPORT write does not lose the status. - Fall back to invalidating the persistent checksum when the pending marker cannot be saved, preserving the previous force re-download behaviour as a safety net. - Validate the marker contents against the known-good status set (applied/error) and discard a malformed marker rather than retrying forever. - Only retry a pending report after configuration_changed() returns 0 (confirmed no-change). Non-0/non-1 exit codes indicate checksum fetch failures and the cached applied/error status may no longer reflect the controller's current desired configuration. - Only remove the marker after a successful report_status. Closes openwisp#251
089cdcd to
34e35fa
Compare
|
@coderabbitai review All previously requested changes have been addressed in commits |
|
Only users with a collaborator, contributor, member, or owner role can interact with CodeRabbit. |
Summary
When
report_statusfails due to a transient server error (e.g., HTTP 502 during server maintenance), the agent deletes its local checksums, forcing a full re-download and re-apply on the next cycle. This is wasteful since the configuration was already applied successfully.Problem
report_status("applied")fails (server temporarily unavailable)Worse: if the server recovers but the agent has already re-downloaded and the checksums now match, it skips the report entirely, leaving the controller stuck in "modified" status indefinitely.
Fix
Instead of deleting checksums on report failure, save the pending status to a marker file (
/tmp/openwisp/pending_report). On the next polling cycle, if no configuration change is detected, retry the pending report without re-downloading.Backward Compatibility
Fully backward compatible:
report_statusand process it normally/tmp(cleared on reboot) — no persistent state changesupdate_configurationflow takes precedenceRelated Issues
Related #172 — The agent may be stopped and started while the configuration is applied