Commit 9eacd79

fix(tll): tighten shouldUseTLL regex to reduce false-positives (review #9)
The original gate was a single alternation regex that fired on any single occurrence of `first|last|before|after|then|later|track|...`. That over-fired on plain factual queries that incidentally contained one of those tokens (`what is my first name`, `the model used before GPT-4`, `track my spending`), pulling in unrelated TLL chain memories on the augmented retrieval path.

Replaced the gate with a two-tier check:

1. ORDERING_TERMS_RE: a curated set of single-token signals (first/last/before/after/then/later/earlier/previous/next/prior). Only fires TLL when TWO co-occur, e.g. "what aspects did I discuss BEFORE and AFTER X".
2. SEQUENCE_PATTERNS: phrase-level structural signals (`in (chronological/reverse/the) order`, `when did`, `since when`, `over time`, `evolution of`, `history|timeline of`, `originally`/`initially`, `progression of`, `how X evolved/shifted/changed`, `brought up`). Single phrase hit is enough.

Removed `track`, `sequence`, and bare `order` from the gate; they were the largest false-positive contributors.

Updated `src/services/__tests__/tll-retrieval.test.ts`:

- Positive list rewritten to canonical EO/MSR/TR shapes that hit one of the structural patterns or co-occurring ordering terms.
- Negative list now includes the false-positive shapes the loose regex used to match (the three reviewer-cited ones plus a handful of single-ordering-term factual queries).

41/41 unit tests pass against the updated gate.
1 parent 623ac96 commit 9eacd79

2 files changed

Lines changed: 81 additions & 18 deletions

src/services/__tests__/tll-retrieval.test.ts

Lines changed: 32 additions & 15 deletions
@@ -52,44 +52,61 @@ function makeTllRepo(chainResult: string[]): {
 }
 
 describe('shouldUseTLL', () => {
+  // Canonical EO/MSR/TR question shapes: each fires either via two
+  // ordering terms or via a single structural sequence phrase. The
+  // pre-tightened regex over-fired on single-token ordering hits like
+  // "what is my first name" / "the model used before GPT-4"; the
+  // updated gate trades a few rare single-word matches for sharper
+  // precision on these canonical shapes.
   const positiveQueries = [
-    'in what order did the events happen',
-    'what came first in the sequence',
-    'what was the last meeting about',
-    'what happened before the merger',
-    'what changed after the launch',
+    // SEQUENCE_PATTERNS hits
+    'in what order did I bring up X',
+    'in chronological order list the events',
     'when did the user move to Berlin',
+    'since when has X been deprecated',
+    'how preferences shifted over time',
     'show the evolution of the project',
-    'list events in chronological order',
-    'reconstruct the sequence',
-    'build me a timeline of changes',
     'what is the history of this codebase',
-    'how preferences shifted over time',
+    'build me a timeline of changes',
+    'when the topic was brought up',
     'what did the user originally say',
     'what did they initially mention',
-    'first this then that',
-    'and later they switched',
-    'when the topic was brought up',
-    'track the progression of opinion',
     'show progression of editor choice',
+    'how did the architecture evolve',
+    'how have my preferences shifted',
+    // Two ordering terms (co-occurrence)
+    'first this then that',
+    'what aspects did I discuss before vs after the launch',
+    'first the migration, then the rollback',
+    'what came earlier and what came later',
   ];
 
   it.each(positiveQueries)('returns true for ordering query: %s', (q) => {
     expect(shouldUseTLL(q)).toBe(true);
   });
 
   it('matches case-insensitively', () => {
-    expect(shouldUseTLL('What Is The HISTORY?')).toBe(true);
-    expect(shouldUseTLL('TIMELINE please')).toBe(true);
+    expect(shouldUseTLL('What Is The HISTORY OF this?')).toBe(true);
+    expect(shouldUseTLL('TIMELINE OF events please')).toBe(true);
   });
 
   const negativeQueries = [
+    // Non-temporal shapes (existing coverage)
     'what is X',
     'list all the entities',
     'explain why this is a tool',
     'who is the current owner',
     'tell me about the project',
     'summarize the discussion',
+    // False-positive shapes that the prior loose regex incorrectly
+    // matched (review #9). These must stay false under the new gate.
+    'what is my first name',
+    'the model used before GPT-4',
+    'track my spending',
+    'we then moved on to lunch',
+    'what is the next step',
+    'is this the previous version',
+    'what came last in the queue',
   ];
 
   it.each(negativeQueries)('returns false for non-temporal query: %s', (q) => {
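
The changes to the positive and negative lists reflect a precision/recall trade-off. It can be reproduced outside the test suite with a small side-by-side sketch. The regexes below are copied from the removed and added lines of `src/services/tll-retrieval.ts` in this commit; the names `oldGate`, `newGate`, `GateComparison`, and `compare` are hypothetical and exist only for this illustration.

```typescript
// Old gate: the single alternation regex removed by this commit.
const OLD_ORDERING_QUERY_RE =
  /\b(order|first|last|before|after|when did|evolution|chronological|sequence|timeline|history|over time|originally|initially|then|later|brought up|track|progression|how did .* evolve|in what order)\b/i;

// New gate: the two tiers added by this commit.
const ORDERING_TERMS_RE =
  /\b(first|last|before|after|then|later|earlier|previous|next|prior)\b/gi;
const SEQUENCE_PATTERNS: readonly RegExp[] = [
  /\bin (what |the )?(chronological |reverse )?order\b/i,
  /\b(when|after) did\b/i,
  /\bsince when\b/i,
  /\bover time\b/i,
  /\bevolution of\b/i,
  /\b(history|timeline) of\b/i,
  /\bbrought up\b/i,
  /\b(originally|initially)\b/i,
  /\bprogression of\b/i,
  /\bhow .{1,80}(evolved?|shifted?|changed)\b/i,
  /\bwhat .{1,80}(originally|initially)\b/i,
];

// Hypothetical wrapper names, used only in this sketch.
function oldGate(query: string): boolean {
  return OLD_ORDERING_QUERY_RE.test(query);
}

function newGate(query: string): boolean {
  const orderingMatches = (query.match(ORDERING_TERMS_RE) ?? []).length;
  if (orderingMatches >= 2) return true;
  return SEQUENCE_PATTERNS.some((re) => re.test(query));
}

interface GateComparison {
  query: string;
  before: boolean;
  after: boolean;
}

function compare(queries: readonly string[]): GateComparison[] {
  return queries.map((query) => ({
    query,
    before: oldGate(query),
    after: newGate(query),
  }));
}

// The three reviewer-cited false positives from the negative list.
const reviewerCited = compare([
  'what is my first name',
  'the model used before GPT-4',
  'track my spending',
]);

// Queries dropped from the old positive list: a single ordering term
// (or a removed token such as "sequence") no longer fires the gate.
const droppedPositives = compare([
  'what was the last meeting about',
  'what happened before the merger',
  'and later they switched',
  'reconstruct the sequence',
]);

for (const row of [...reviewerCited, ...droppedPositives]) {
  console.log(`${row.before} -> ${row.after}  ${row.query}`);
}
```

Every row prints `true -> false`. The first group is the false-positive class this commit removes; the second group is the recall given up in exchange, which is why those queries were removed from `positiveQueries` rather than moved to `negativeQueries`.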

src/services/tll-retrieval.ts

Lines changed: 49 additions & 3 deletions
@@ -29,11 +29,57 @@ import type { TllRepository } from '../db/repository-tll.js';
  */
 export const TLL_ENTITY_LOOKUP_SEED_LIMIT = 10;
 
-const ORDERING_QUERY_RE =
-  /\b(order|first|last|before|after|when did|evolution|chronological|sequence|timeline|history|over time|originally|initially|then|later|brought up|track|progression|how did .* evolve|in what order)\b/i;
+/**
+ * Single-token ordering signals. Matched in isolation these are too
+ * weak to gate TLL: "what is my FIRST name", "the model used BEFORE
+ * GPT-4", "we then moved on" all contain one of these but are not
+ * EO/MSR/TR queries. We require either two of them to co-occur, or
+ * one of the structural sequence patterns below, before firing.
+ */
+const ORDERING_TERMS_RE =
+  /\b(first|last|before|after|then|later|earlier|previous|next|prior)\b/gi;
 
+/**
+ * Structural sequence patterns. Each one is a phrase whose presence
+ * unambiguously indicates an ordering / temporal-reasoning question.
+ * Single-pattern hit is enough to gate TLL.
+ *
+ * Curated to keep precision high: "track my spending" and "what is my
+ * first name" must not match any pattern here. Add new patterns
+ * conservatively: a leak here will silently re-introduce the
+ * false-positive class this fix addresses.
+ */
+const SEQUENCE_PATTERNS: readonly RegExp[] = [
+  /\bin (what |the )?(chronological |reverse )?order\b/i,
+  /\b(when|after) did\b/i,
+  /\bsince when\b/i,
+  /\bover time\b/i,
+  /\bevolution of\b/i,
+  /\b(history|timeline) of\b/i,
+  /\bbrought up\b/i,
+  /\b(originally|initially)\b/i,
+  /\bprogression of\b/i,
+  /\bhow .{1,80}(evolved?|shifted?|changed)\b/i,
+  /\bwhat .{1,80}(originally|initially)\b/i,
+];
+
+/**
+ * Returns true if the query has the shape of an event-ordering / temporal
+ * question and should trigger TLL chain expansion. The gate is
+ * intentionally conservative: TLL augmentation is augmentation, not the
+ * primary retrieval path, so over-firing was producing irrelevant chain
+ * memories on plain-fact queries that happened to contain "first",
+ * "before", "track", etc.
+ *
+ * Two ordering terms co-occurring (e.g. "what did I discuss BEFORE and
+ * AFTER X") is a strong-enough signal on its own; one structural
+ * sequence phrase (e.g. "in what order", "evolution of", "since when")
+ * is also strong enough. Single ordering term + nothing else is not.
+ */
 export function shouldUseTLL(query: string): boolean {
-  return ORDERING_QUERY_RE.test(query);
+  const orderingMatches = (query.match(ORDERING_TERMS_RE) ?? []).length;
+  if (orderingMatches >= 2) return true;
+  return SEQUENCE_PATTERNS.some((re) => re.test(query));
 }
 
 /**
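
One detail of the new gate is worth spelling out: `ORDERING_TERMS_RE` carries the `g` flag so that `String.prototype.match` returns every hit, which is what makes the co-occurrence count possible. The sketch below illustrates why counting through `match` is safe to call repeatedly, while calling `.test()` on the same global regex would be stateful. The helper name `countOrderingTerms` is hypothetical and not part of the commit.

```typescript
// Same pattern as ORDERING_TERMS_RE in this commit (note the `g` flag).
const ORDERING_TERMS_RE =
  /\b(first|last|before|after|then|later|earlier|previous|next|prior)\b/gi;

// Hypothetical helper: counts ordering terms the way shouldUseTLL does.
function countOrderingTerms(query: string): number {
  // With a global regex, match() scans from index 0 and returns all
  // matches (or null when there are none).
  return (query.match(ORDERING_TERMS_RE) ?? []).length;
}

// Repeated calls are stable: match() resets lastIndex before scanning.
const firstCount = countOrderingTerms('first this then that');
const secondCount = countOrderingTerms('first this then that');
console.log(firstCount, secondCount); // 2 2

// Contrast: test() on a global regex resumes from lastIndex.
ORDERING_TERMS_RE.lastIndex = 0;
const t1 = ORDERING_TERMS_RE.test('first name'); // true, lastIndex is now 5
const t2 = ORDERING_TERMS_RE.test('first name'); // false, search resumed at 5
console.log(t1, t2); // true false
```

This is also a reason to keep the entries of `SEQUENCE_PATTERNS` free of the `g` flag, since they are evaluated with `.test()`.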
