Skip to content

Commit 34e35fa

Browse files
MichaelUrayCodex
authored andcommitted
[fix] Retry pending report_status instead of forcing re-download #251
When report_status fails after a configuration apply (e.g. due to a transient HTTP 502 from the controller), the agent previously deleted the local checksum files to trigger a full re-download on the next cycle. That worked but wasted bandwidth and caused the controller to stay stuck on "modified" until the next real config change. Instead persist the unreported status in a pending_report marker file and retry just report_status on the next polling cycle. Key details: - Define retry_pending_report as a shell function so "local" is valid on BusyBox ash, where "local" outside a function is a parse error that silently crashes the script mid-loop. - Write the PENDING_REPORT marker atomically via a tmp file + rename so a partial write (e.g. full disk) cannot leave a truncated marker behind. - Persist the marker BEFORE calling retry_with_backoff report_status so a crash or reboot between the retry attempt and the PENDING_REPORT write does not lose the status. - Fall back to invalidating the persistent checksum when the pending marker cannot be saved, preserving the previous force re-download behaviour as a safety net. - Validate the marker contents against the known-good status set (applied/error) and discard a malformed marker rather than retrying forever. - Only retry a pending report after configuration_changed() returns 0 (confirmed no-change). Non-0/non-1 exit codes indicate checksum fetch failures and the cached applied/error status may no longer reflect the controller's current desired configuration. - Only remove the marker after a successful report_status. Closes #251
1 parent 1f8beaa commit 34e35fa

2 files changed

Lines changed: 77 additions & 7 deletions

File tree

CHANGELOG.rst

Lines changed: 6 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,12 @@ Change log
44
1.3.0 [unreleased]
55
------------------
66

7-
WIP.
7+
Bugfixes
8+
~~~~~~~~
9+
10+
- Fixed retrying pending ``report_status`` instead of forcing a
11+
re-download after a transient HTTP error `#251
12+
<https://github.com/openwisp/openwisp-config/pull/251>`_.
813

914
1.2.1 [2026-04-09]
1015
------------------

openwisp-config/files/openwisp.agent

Lines changed: 71 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -200,6 +200,7 @@ REGISTRATION_PARAMETERS="$WORKING_DIR/registration_parameters"
200200
TEST_CHECKSUM="$WORKING_DIR/test_checksum"
201201
UPDATE_INFO="$WORKING_DIR/update_info"
202202
STATUS_REPORT="$WORKING_DIR/status_report"
203+
PENDING_REPORT="$WORKING_DIR/pending_report"
203204
APPLYING_CONF="$WORKING_DIR/applying_conf"
204205
UPDATE_CONFIG_LOG="$WORKING_DIR/update-config.log"
205206
BOOTUP="$WORKING_DIR/bootup"
@@ -825,6 +826,7 @@ update_configuration() {
825826
result=$?
826827
fi
827828

829+
local status_to_report pending_marker_saved=0
828830
if [ "$result" -eq "0" ]; then
829831
logger "Configuration applied successfully" \
830832
-t openwisp \
@@ -833,14 +835,41 @@ update_configuration() {
833835
env -i ACTION="config-applied" /sbin/hotplug-call openwisp
834836
# store the new checksum as last known checksum
835837
cp "$CONFIGURATION_CHECKSUM" "$PERSISTENT_CHECKSUM"
836-
retry_with_backoff report_status "applied"
838+
status_to_report="applied"
837839
else
838-
retry_with_backoff report_status "error"
840+
status_to_report="error"
839841
fi
840-
# if reporting of the status fails, let it retry in the next cycle
842+
843+
# Persist the pending status BEFORE calling report_status so that a
844+
# crash or reboot between the retry attempt and success does not lose
845+
# the status. The marker is removed only after report_status succeeds.
846+
# Write atomically via a tmp file + rename so a partial write (e.g.
847+
# full disk) cannot leave a truncated marker behind.
848+
if printf '%s\n' "$status_to_report" >"${PENDING_REPORT}.tmp" \
849+
&& mv "${PENDING_REPORT}.tmp" "$PENDING_REPORT"; then
850+
pending_marker_saved=1
851+
else
852+
rm -f "${PENDING_REPORT}.tmp"
853+
logger -s "Failed to persist pending report marker, " \
854+
"falling back to checksum invalidation on failure" \
855+
-t openwisp \
856+
-p daemon.warning
857+
fi
858+
859+
retry_with_backoff report_status "$status_to_report"
841860
# shellcheck disable=SC2181
842-
if [ "$?" -ne "0" ]; then
843-
rm -f $CONFIGURATION_CHECKSUM $PERSISTENT_CHECKSUM
861+
if [ "$?" -eq "0" ]; then
862+
rm -f "$PENDING_REPORT"
863+
else
864+
logger -s "report_status failed, will retry in the next cycle" \
865+
-t openwisp \
866+
-p daemon.warning
867+
# If the marker could not be saved, invalidate the checksum so
868+
# the next cycle forces a re-download. This preserves the
869+
# previous behaviour whenever the retry marker is unavailable.
870+
if [ "$pending_marker_saved" -eq "0" ]; then
871+
rm -f "$PERSISTENT_CHECKSUM"
872+
fi
844873
fi
845874

846875
rm $APPLYING_CONF
@@ -977,16 +1006,52 @@ else
9771006
env -i ACTION="restart" /sbin/hotplug-call openwisp
9781007
fi
9791008

1009+
retry_pending_report() {
1010+
local pending_status
1011+
if [ ! -f "$PENDING_REPORT" ]; then
1012+
return 0
1013+
fi
1014+
# Trim whitespace to tolerate an occasional trailing newline.
1015+
pending_status=$(tr -d '[:space:]' <"$PENDING_REPORT")
1016+
# Validate against the same known-good set accepted by report_status.
1017+
# update_configuration() only ever writes "applied" or "error" today.
1018+
case "$pending_status" in
1019+
applied | error) ;;
1020+
*)
1021+
logger -s "Pending report marker has invalid status '$pending_status', discarding" \
1022+
-t openwisp \
1023+
-p daemon.err
1024+
rm -f "$PENDING_REPORT"
1025+
return 1
1026+
;;
1027+
esac
1028+
logger "Retrying pending report_status: $pending_status" \
1029+
-t openwisp \
1030+
-p daemon.info
1031+
retry_with_backoff report_status "$pending_status"
1032+
# shellcheck disable=SC2181
1033+
if [ "$?" -eq "0" ]; then
1034+
rm -f "$PENDING_REPORT"
1035+
fi
1036+
}
1037+
9801038
while true; do
9811039
# check management interface at each iteration
9821040
# (because the address may change due to
9831041
# configuration updates or manual reconfigurations)
9841042
discover_management_ip
9851043

9861044
configuration_changed
1045+
config_change_status=$?
9871046

988-
if [ "$?" -eq "1" ]; then
1047+
if [ "$config_change_status" -eq "1" ]; then
9891048
update_configuration
1049+
elif [ "$config_change_status" -eq "0" ] && [ -f "$PENDING_REPORT" ]; then
1050+
# Retry a previously failed report_status without re-downloading.
1051+
# Only do this after a confirmed no-change result (exit 0); other
1052+
# exit codes indicate checksum fetch failures and the pending
1053+
# applied/error status could no longer be accurate.
1054+
retry_pending_report
9901055
fi
9911056
env -i ACTION="end-of-cycle" /sbin/hotplug-call openwisp
9921057

0 commit comments

Comments
 (0)