Skip to content

feat: Add now() to troubleshoot pipeline latencies#19386

Open
abhishekrb19 wants to merge 8 commits intoapache:masterfrom
abhishekrb19:now_func
Open

feat: Add now() to troubleshoot pipeline latencies#19386
abhishekrb19 wants to merge 8 commits intoapache:masterfrom
abhishekrb19:now_func

Conversation

@abhishekrb19
Copy link
Copy Markdown
Contributor

@abhishekrb19 abhishekrb19 commented Apr 28, 2026

Fixes #18208

Adds a now() expression macro to Druid's native expression. It returns the current system timestamp in milliseconds since epoch, evaluated per row at processing time. Intended primarily for ingestion-time troubleshooting - e.g.,
capturing now() - __time as an ingestion_lag_ms dimension to measure end-to-end pipeline delay. This expression is only allowed for non-__time columns.

Release note

Added now() expression function that returns the current system timestamp in milliseconds since epoch. Useful at ingestion time for troubleshooting pipeline delays (e.g., now() - __time). Note: now() is non-deterministic as it evaluates for every row, so it can break idempotency. This can only be added to columns that are not __time.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • been tested in a test Druid cluster.

Comment thread docs/querying/math-expr.md Outdated
@aho135
Copy link
Copy Markdown
Contributor

aho135 commented Apr 28, 2026

Fixes #18208

Co-authored-by: aho135 <andrewho135@gmail.com>
@aho135
Copy link
Copy Markdown
Contributor

aho135 commented Apr 28, 2026

@abhishekrb19 Curious if you have any thoughts on this old issue: #6513

Seems like the reservations were mainly for using now() for __time which you already mentioned should not be used. Maybe worth adding some validation/enforcement for this

`now()` is non-deterministic: replicated streaming tasks and task replays evaluate it to different
values, producing inconsistent results across replicas. Do not use `now()` to overwrite `__time`.
For Kafka, prefer [`kafka.timestamp`](../ingestion/data-formats.md#kafka) as the
`__time` source.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if it makes sense to mention Kafka here. The timestamps aren't equivalent as one is the time the event lands on Kafka and the other is the time the event is ingested in Druid

Comment thread docs/querying/math-expr.md Outdated
|----|-----------|
|now|now() returns the current system timestamp in milliseconds since epoch (1970-01-01 00:00:00 UTC). This function is evaluated for each row at processing time. It's recommended to use this only for troubleshooting issues - see [Using now() in ingestion](#using-now-in-ingestion).|
|timestamp|timestamp(expr[,format-string]) parses string expr into date then returns milliseconds from java epoch. without 'format-string' it's regarded as ISO datetime format |
|unix_timestamp|same with 'timestamp' function but returns seconds instead |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
|unix_timestamp|same with 'timestamp' function but returns seconds instead |
|unix_timestamp|unix_timestamp(expr[,format-string]) same with 'timestamp' function but returns seconds instead |

Just for consistency with other functions

Copy link
Copy Markdown
Member

@FrankChen021 FrankChen021 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Severity Findings
P0 0
P1 0
P2 1
P3 0
Total 1

This is an automated review by Codex GPT-5

{
// Return analysis indicating this is NOT constant
// by pretending we have a free variable
return new BindingAnalysis();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] Prevent now() from being planned as constant

Returning an empty BindingAnalysis makes ExpressionPlanner mark now() and expressions that only depend on now() as CONSTANT. Query-time expression selectors then build a ConstantExprEvalSelector and evaluate now() once when the selector is created, and expression filters/virtual columns also get stable cache keys, so native queries can reuse stale timestamps instead of evaluating at processing time. Add explicit non-deterministic handling, or otherwise keep now() out of constant selector/index/cache optimizations.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, good catch, this is fixed in the latest commit.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this fixes the ExpressionPlan constant trait, but I think the selector path can still fold now() once. NowExpression.analyzeInputs() now reports only the synthetic __dummy__ binding; for normal segments that column has no capabilities and its selector is nil, so ExpressionSelectors.createBindings drops it and returns InputBindings.nilBindings(). makeExprEvalSelector then treats those nil bindings as a constant expression and returns new ConstantExprEvalSelector(plan.getExpression().eval(bindings)), so now() is still evaluated when the selector is created instead of at row/query processing time. The new tests cover the planner only; could you add selector-level coverage and avoid using a missing dummy input as the non-determinism marker?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack, updated the code with your non-determinism suggestion

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, the selector constant-folding path I called out is fixed now: now() carries a non-deterministic analysis flag and ExpressionSelectors skips ConstantExprEvalSelector for it. I think two related constant/cache paths still need the same treatment, though. Expr.asBitmapColumnIndex still treats any expression with no required bindings as a constant bitmap index, so an expression filter around now() can still be evaluated once while building the bitmap. Also, ExpressionDimFilter and ExpressionVirtualColumn cache keys still serialize the stable expression cache key (now() stringifies the same way every time), so cached query results can be reused across wall-clock changes. Could you either make non-deterministic expressions non-cacheable for these query components, or include a non-stable cache key / disable the affected optimizations when analysis.isNonDeterministic() is true?

@abhishekrb19
Copy link
Copy Markdown
Contributor Author

@abhishekrb19 Curious if you have any thoughts on this old issue: #6513
Seems like the reservations were mainly for using now() for __time which you already mentioned should not be used. Maybe worth adding some validation/enforcement for this

@aho135 @FrankChen021 - thanks for the reviews and for pointing me to the previous issues. I’ve added validation to restrict now() usage for non-__time columns and to throw an error when it’s used with the __time column.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add possibility of having current date-time in Transformations (Druid ingestion)

3 participants