Skip to content

fix negative frankenphp_ready_workers metrics#1491

Merged
dunglas merged 2 commits intophp:mainfrom
IndraGunawan:fix-negative-ready-workers-metrics
May 1, 2025
Merged

fix negative frankenphp_ready_workers metrics#1491
dunglas merged 2 commits intophp:mainfrom
IndraGunawan:fix-negative-ready-workers-metrics

Conversation

@IndraGunawan
Copy link
Copy Markdown
Contributor

@IndraGunawan IndraGunawan commented Apr 11, 2025

in Symfony runtime for FrankenPHP, the worker will be restarted after a certain number of requests. frankenphp_worker_restarts metric value will be increased, and frankenphp_ready_workers will be decreased, and the value never comes back to the initial state, even though requests can still be served.

failed tests (https://github.com/dunglas/frankenphp/actions/runs/14401821475/job/40389102007)

closes #1372
maybe also close this issue #1451, but I didn't have any information for it

@IndraGunawan IndraGunawan force-pushed the fix-negative-ready-workers-metrics branch from a827428 to 7bfd91a Compare April 11, 2025 11:26
Comment thread caddy/caddy_test.go

// Check metrics
expectedMetrics := `
# HELP frankenphp_ready_workers Running workers that have successfully called frankenphp_handle_request at least once
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AlliBalliBaba frankenphp_ready_workers already has value after initialization before handling a request. Is this expected, or does the HELP need to be updated?

Copy link
Copy Markdown
Contributor

@AlliBalliBaba AlliBalliBaba Apr 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think workers are only considered ready once they first reach the point of waiting for a request, so this line:

https://github.com/dunglas/frankenphp/blob/96dd739064f3fffd7b25010f918f6712681b026f/threadworker.go#L165

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure though what exactly the original intention behind this metric is. If a worker crashes due to a fatal error or just restarts, should the metric decrease until the worker reaches frankenphp_handle_request again?
Or should this metric return the actual number of workers currently waiting? Maybe @withinboredom remembers.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would even be enough to move the ready check from line 165 to here: https://github.com/dunglas/frankenphp/blob/96dd739064f3fffd7b25010f918f6712681b026f/threadworker.go#L154

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe @withinboredom remembers.

Originally, workers were only "booted" (this has a different meaning now) exactly once. Even if the thread died, it was restarted in-place. Thus this metric was initially intended to only be incremented once, and then never incremented again. The original purpose was being to determine whether the server was able to handle traffic (all threads are ready) and not meant to be monitored while running.

Currently, it still serves its original purpose, allowing you to determine if it is safe to send traffic to frankenphp when frankenphp is added to a load balancer or similar (k8s readiness probes, for example). However, after traffic is sent to frankenphp, the behavior becomes quite weird now (thread autoscaling, restarting, etc).

So, the ultimate question here is: what does this metric mean once the server has started and serving traffic? If we merely want to know how many threads are booting, we should probably change the metric from frankenphp_ready_workers to frankenphp_booting_workers as that is easier to calculate and more meaningful.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i agree with @withinboredom to rename frankenphp_ready_workers metrics to frankenphp_booting_workers if that was the case.

but i'm not sure if it is possible that the worker failed to reboot after a crash or restart

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's theoretically possible for workers to fail during startup if there's some kind of IO involved (not sure though if anyone does this).

If it's only about readiness, it might also make sense to just have an is_ready metric that flips to true after Init()

@dunglas
Copy link
Copy Markdown
Member

dunglas commented Apr 29, 2025

Should we merge this one then @withinboredom @AlliBalliBaba?

@AlliBalliBaba
Copy link
Copy Markdown
Contributor

I think it's fine to merge, it at least fixes the metric going negative for now

@dunglas dunglas merged commit a6e1d35 into php:main May 1, 2025
42 of 43 checks passed
@dunglas
Copy link
Copy Markdown
Member

dunglas commented May 1, 2025

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants