fix negative frankenphp_ready_workers metrics#1491
Conversation
a827428 to
7bfd91a
Compare
|
|
||
| // Check metrics | ||
| expectedMetrics := ` | ||
| # HELP frankenphp_ready_workers Running workers that have successfully called frankenphp_handle_request at least once |
There was a problem hiding this comment.
@AlliBalliBaba frankenphp_ready_workers already has value after initialization before handling a request. Is this expected, or does the HELP need to be updated?
There was a problem hiding this comment.
I think workers are only considered ready once they first reach the point of waiting for a request, so this line:
There was a problem hiding this comment.
I'm not sure though what exactly the original intention behind this metric is. If a worker crashes due to a fatal error or just restarts, should the metric decrease until the worker reaches frankenphp_handle_request again?
Or should this metric return the actual number of workers currently waiting? Maybe @withinboredom remembers.
There was a problem hiding this comment.
Maybe it would even be enough to move the ready check from line 165 to here: https://github.com/dunglas/frankenphp/blob/96dd739064f3fffd7b25010f918f6712681b026f/threadworker.go#L154
There was a problem hiding this comment.
Maybe @withinboredom remembers.
Originally, workers were only "booted" (this has a different meaning now) exactly once. Even if the thread died, it was restarted in-place. Thus this metric was initially intended to only be incremented once, and then never incremented again. The original purpose was being to determine whether the server was able to handle traffic (all threads are ready) and not meant to be monitored while running.
Currently, it still serves its original purpose, allowing you to determine if it is safe to send traffic to frankenphp when frankenphp is added to a load balancer or similar (k8s readiness probes, for example). However, after traffic is sent to frankenphp, the behavior becomes quite weird now (thread autoscaling, restarting, etc).
So, the ultimate question here is: what does this metric mean once the server has started and serving traffic? If we merely want to know how many threads are booting, we should probably change the metric from frankenphp_ready_workers to frankenphp_booting_workers as that is easier to calculate and more meaningful.
There was a problem hiding this comment.
i agree with @withinboredom to rename frankenphp_ready_workers metrics to frankenphp_booting_workers if that was the case.
but i'm not sure if it is possible that the worker failed to reboot after a crash or restart
There was a problem hiding this comment.
It's theoretically possible for workers to fail during startup if there's some kind of IO involved (not sure though if anyone does this).
If it's only about readiness, it might also make sense to just have an is_ready metric that flips to true after Init()
|
Should we merge this one then @withinboredom @AlliBalliBaba? |
|
I think it's fine to merge, it at least fixes the metric going negative for now |
|
Thanks! |
in Symfony runtime for FrankenPHP, the worker will be restarted after a certain number of requests.
frankenphp_worker_restartsmetric value will be increased, andfrankenphp_ready_workerswill be decreased, and the value never comes back to the initial state, even though requests can still be served.failed tests (https://github.com/dunglas/frankenphp/actions/runs/14401821475/job/40389102007)
closes #1372
maybe also close this issue #1451, but I didn't have any information for it