Commit 85f216c

Flexible Categorization Engine
* Implemented flexible categorization
* Doc updates
* Better test data generation
* Added article to explain flexible categorization
* Formatting fixes
* Added more tests
* 📝 Add docstrings to `feature/GK/flexible-categorization`

Docstrings generation was requested by @kyurkchyan.

The following files were modified:

* `src/ServiceBusToolset.Application/DeadLetters/Common/CategorizationSchema.cs`
* `src/ServiceBusToolset.Application/DeadLetters/Common/CategoryMerger.cs`
* `src/ServiceBusToolset.Application/DeadLetters/Common/CategoryPropertyRef.cs`
* `src/ServiceBusToolset.Application/DeadLetters/Common/CategoryPropertyResolver.cs`
* `src/ServiceBusToolset.Application/DeadLetters/Common/CategorySelection.cs`
* `src/ServiceBusToolset.Application/DeadLetters/Common/DlqCategory.cs`
* `src/ServiceBusToolset.Application/DeadLetters/Common/DlqCategoryDisplay.cs`
* `src/ServiceBusToolset.Application/DeadLetters/Common/DlqCategoryKey.cs`
* `src/ServiceBusToolset.Application/DeadLetters/Common/DlqCategoryScanner.cs`
* `src/ServiceBusToolset.Application/DeadLetters/Common/DlqMessageService.cs`
* `src/ServiceBusToolset.Application/DeadLetters/Common/DlqScanSession.cs`
* `src/ServiceBusToolset.Application/DeadLetters/Common/StreamDlq.cs`
* `src/ServiceBusToolset.Application/DeadLetters/DiagnoseDlq/DiagnoseDlqCommandHandler.cs`
* `src/ServiceBusToolset.Application/DeadLetters/DumpDlq/DumpDlqMessagesCommandHandler.cs`
* `src/ServiceBusToolset.Application/DeadLetters/PurgeDlq/PurgeDlqMessagesCommandHandler.cs`
* `src/ServiceBusToolset.Application/DeadLetters/ResubmitDlq/ResubmitDlqMessagesCommandHandler.cs`
* `src/ServiceBusToolset.Application/DeadLetters/ResubmitDlq/StreamDlqCategories.cs`
* `src/ServiceBusToolset.CLI/DeadLetters/Common/DlqScanSessionExtensions.cs`
* `src/ServiceBusToolset.CLI/DeadLetters/DiagnoseDlq/DiagnoseDlqCommandHandler.cs`
* `src/ServiceBusToolset.CLI/DeadLetters/DumpDlq/DumpDlqCommandHandler.cs`
* `src/ServiceBusToolset.CLI/DeadLetters/PurgeDlq/PurgeDlqCommandHandler.cs`
* `src/ServiceBusToolset.CLI/DeadLetters/ResubmitDlq/ResubmitDlqCommandHandler.cs`
* `src/ServiceBusToolset.TestHarness/DeadLetters/GenerateDlq/DeadLetterMessageFactory.cs`
* `src/ServiceBusToolset.TestHarness/DeadLetters/GenerateDlq/GenerateDlqCommandHandler.cs`
* `tests/ServiceBusToolset.Application.Tests/DeadLetters/Common/CategorizationSchemaShould.cs`
* `tests/ServiceBusToolset.Application.Tests/DeadLetters/Common/CategoryPropertyRefShould.cs`
* `tests/ServiceBusToolset.Application.Tests/DeadLetters/Common/CategoryPropertyResolverShould.cs`
* `tests/ServiceBusToolset.Application.Tests/DeadLetters/Common/DlqCategoryKeyShould.cs`
* `tests/ServiceBusToolset.Application.Tests/DeadLetters/Common/DlqCategoryScannerShould.cs`
* `tests/ServiceBusToolset.Application.Tests/DeadLetters/Common/DlqCategoryShould.cs`

These files were kept as they were:

* `tests/ServiceBusToolset.Application.Tests/DeadLetters/Common/CategoryMergerShould.cs`
* `tests/ServiceBusToolset.Application.Tests/DeadLetters/Common/DlqCategoryDisplayShould.cs`
* `tests/ServiceBusToolset.Application.Tests/DeadLetters/Common/DlqMessageServiceShould.cs`

These file types are not supported:

* `.claude/agents/test-writer.md`
* `CLAUDE.md`
* `README.md`
* `docs/articles/flexible-categorization.md`
* `docs/articles/port-improvements-to-all-commands.md`
* `docs/articles/smart-category-merging.md`
* `docs/diagnose-dlq.md`
* `docs/dump-dlq.md`
* `docs/purge-dlq.md`
* `docs/resubmit-dlq.md`

* PR comment fixes
* Code cleanup

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
1 parent df2df7b commit 85f216c

53 files changed

Lines changed: 2750 additions & 252 deletions

.claude/agents/test-writer.md

Lines changed: 6 additions & 14 deletions
@@ -143,13 +143,9 @@ Always include `// Arrange`, `// Act`, `// Assert` comments:
 public async Task ReturnSuccess_WhenMessagesExist()
 {
     // Arrange
-    var command = new DumpDlqMessagesCommand(
-        "namespace.servicebus.windows.net",
-        EntityTarget.ForQueue("my-queue"),
-        "/output/messages.json",
-        null,
-        null,
-        null);
+    var command = new DumpDlqMessagesCommand("namespace.servicebus.windows.net",
+        EntityTarget.ForQueue("my-queue"),
+        "/output/messages.json");
 
     var mockClient = Substitute.For<ServiceBusClient>();
     _clientFactory.CreateClient(Arg.Any<string>()).Returns(mockClient);
@@ -176,13 +172,9 @@ public async Task HandleAsync_ShouldPassCorrectNamespace_WhenCalled()
     _clientFactory.CreateClient(Arg.Do<string>(ns => capturedNamespace = ns))
         .Returns(Substitute.For<ServiceBusClient>());
 
-    var command = new DumpDlqMessagesCommand(
-        "my-namespace.servicebus.windows.net",
-        EntityTarget.ForQueue("test-queue"),
-        "/output/test.json",
-        null,
-        null,
-        null);
+    var command = new DumpDlqMessagesCommand("my-namespace.servicebus.windows.net",
+        EntityTarget.ForQueue("test-queue"),
+        "/output/test.json");
 
     // Act
     await _handler.Handle(command, TestContext.Current.CancellationToken);

CLAUDE.md

Lines changed: 19 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -71,12 +71,30 @@ Cross-cutting concerns go in `Common/` folders at the appropriate level:
7171
- Add options type to `ParseArguments<...>()`
7272
- Add `MapResult` handler
7373

74+
### Categorization Engine
75+
76+
DLQ messages are categorized by configurable properties via `--categorize-by`. Default: `#Subject,#DeadLetterReason`.
77+
78+
- `#PropertyName` — system property on `ServiceBusReceivedMessage` (e.g., `#DeadLetterReason`, `#ContentType`).
79+
Unrecognized names fall through to `ApplicationProperties`.
80+
- `$PropertyName` — deserialized JSON body property with dot notation for nesting (e.g., `$ErrorCode`,
81+
`$Product.Category.Name`).
82+
- Unresolved properties resolve to `"(none)"`.
83+
84+
Key types in `DeadLetters/Common/`:
85+
86+
- `CategoryPropertyRef` — parsed `#`/`$` reference with `PropertySource` enum
87+
- `CategorizationSchema` — ordered list of property refs; `Default` = `#Subject,#DeadLetterReason`
88+
- `CategoryPropertyResolver` — resolves system/body properties with per-SequenceNumber body cache
89+
- `DlqCategoryKey` — N-dimensional key (`ImmutableArray<string>` + custom equality)
90+
- `DlqCategory` — N-dimensional category with `ToKey()`/`FromKey()` factories
91+
7492
### Key Services
7593

7694
**Application Layer:**
7795

7896
- `IServiceBusClientFactory` - Creates Service Bus clients
79-
- `DlqMessageService` - DLQ peek/filter operations
97+
- `DlqMessageService` - DLQ peek/filter operations (accepts optional `CategorizationSchema`)
8098

8199
**CLI Layer:**
82100
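
The `#`/`$` prefix convention added to CLAUDE.md above can be illustrated with a minimal parsing sketch. `PropertyRefSketch` and `ParseList` are hypothetical names, not the commit's `CategoryPropertyRef` or `CategorizationSchema.Parse`; the validation rules and the comma-separated input shape are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum PropertySource { System, Body }

// Hypothetical stand-in for CategoryPropertyRef; the committed Parse may validate differently.
public sealed record PropertyRefSketch(PropertySource Source, string PropertyPath)
{
    // Renders the reference back in its prefixed form, e.g. "#Subject" or "$error.code".
    public string DisplayName =>
        Source == PropertySource.System ? $"#{PropertyPath}" : $"${PropertyPath}";

    // Assumption: a reference needs a '#' or '$' prefix followed by a non-empty path.
    public static PropertyRefSketch Parse(string reference)
    {
        var text = (reference ?? string.Empty).Trim();
        if (text.Length < 2)
            throw new FormatException($"Expected '#Name' or '$Name', got '{reference}'.");

        return text[0] switch
        {
            '#' => new PropertyRefSketch(PropertySource.System, text[1..]),
            '$' => new PropertyRefSketch(PropertySource.Body, text[1..]),
            _ => throw new FormatException($"Expected '#Name' or '$Name', got '{reference}'.")
        };
    }

    // Assumption: the option arrives as one comma-separated string.
    // Null or blank input falls back to the documented default, #Subject,#DeadLetterReason.
    public static IReadOnlyList<PropertyRefSketch> ParseList(string? commaSeparated)
    {
        if (string.IsNullOrWhiteSpace(commaSeparated))
            return new[]
            {
                new PropertyRefSketch(PropertySource.System, "Subject"),
                new PropertyRefSketch(PropertySource.System, "DeadLetterReason")
            };

        return commaSeparated
            .Split(',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries)
            .Select(Parse)
            .ToList();
    }
}
```

Keeping the dot-separated path as a single string, as the documented `CategoryPropertyRef(PropertySource Source, string PropertyPath)` shape suggests, leaves path splitting to whatever resolves the value.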

README.md

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -67,6 +67,10 @@ sbtools dump-dlq -n mynamespace.servicebus.windows.net -q myqueue -o dlq-message
6767
# Interactive mode - select which message categories to dump
6868
sbtools dump-dlq -n mynamespace.servicebus.windows.net -q myqueue -o dlq-messages.json -i
6969

70+
# Categorize by custom properties (system #Prop, body $Prop)
71+
sbtools dump-dlq -n mynamespace.servicebus.windows.net -q myqueue -o dlq-messages.json -i \
72+
--categorize-by "#DeadLetterReason,$ErrorCode"
73+
7074
# Diagnose DLQ messages using Application Insights
7175
sbtools diagnose-dlq -n mynamespace.servicebus.windows.net -q myqueue \
7276
-a "/subscriptions/.../resourceGroups/.../providers/microsoft.insights/components/my-app-insights"

docs/articles/flexible-categorization.md

Lines changed: 226 additions & 0 deletions
@@ -0,0 +1,226 @@
+# Flexible Categorization Engine
+
+## The Problem
+
+All four DLQ commands (`dump-dlq`, `purge-dlq`, `resubmit-dlq`, `diagnose-dlq`) group messages into categories for
+interactive selection. Previously, categorization was hardcoded to two properties: `Subject` (Label) and
+`DeadLetterReason`. This was baked into every layer — from `DlqCategoryKey(string Label, string DeadLetterReason)`
+through the scan session, display table, merge algorithm, and message filtering.
+
+This worked for simple cases:
+
+```text
+╭───┬──────────────────┬──────────────────────────┬───────╮
+│ # │ Label            │ DeadLetterReason         │ Count │
+├───┼──────────────────┼──────────────────────────┼───────┤
+│ 1 │ OrderProcessor   │ MaxDeliveryCountExceeded │ 847   │
+│ 2 │ PaymentHandler   │ TTLExpiredException      │ 412   │
+╰───┴──────────────────┴──────────────────────────┴───────╯
+```
+
+But real-world Service Bus usage often demands different grouping strategies:
+
+- **Group by error code in the message body** — when `Subject` is generic but the JSON payload contains a structured
+  `errorCode` field.
+- **Group by deployment context** — when messages carry `environment` or `region` metadata in their body that matters
+  more than the dead-letter reason.
+- **Group by custom application headers** — when teams set application-specific properties like `TenantId` or
+  `ProcessorVersion` on messages.
+- **Single-dimension grouping** — sometimes only `DeadLetterReason` matters, and `Subject` just adds noise.
+
+With hardcoded categorization, none of this was possible without modifying source code.
+
+## The Solution: `--categorize-by`
+
+The `--categorize-by` option lets users define which properties form the category dimensions, using a prefix syntax:
+
+| Prefix          | Source                                         | Example                                         |
+|-----------------|------------------------------------------------|-------------------------------------------------|
+| `#PropertyName` | System property on `ServiceBusReceivedMessage` | `#Subject`, `#DeadLetterReason`, `#ContentType` |
+| `$PropertyName` | JSON body property (deserialized)              | `$errorCode`, `$tier`                           |
+| `$Nested.Path`  | Nested JSON body property via dot notation     | `$error.severity`, `$context.region`            |
+
+```bash
+# Default (backward-compatible, same as before)
+dump-dlq -n ns.servicebus.windows.net -q myqueue -i
+
+# Single dimension: just the dead-letter reason
+dump-dlq -n ns.servicebus.windows.net -q myqueue -i --categorize-by "#DeadLetterReason"
+
+# Mixed: system property + JSON body property
+dump-dlq -n ns.servicebus.windows.net -q myqueue -i --categorize-by '#DeadLetterReason,$errorCode'
+
+# Nested body property
+dump-dlq -n ns.servicebus.windows.net -q myqueue -i --categorize-by '#Subject,$error.severity'
+
+# Three dimensions
+dump-dlq -n ns.servicebus.windows.net -q myqueue -i --categorize-by '$tier,#Subject,#DeadLetterReason'
+```
+
+The table headers update dynamically:
+
+```text
+╭───┬──────────────────────────┬─────────────────┬───────╮
+│ # │ #DeadLetterReason        │ $error.severity │ Count │
+├───┼──────────────────────────┼─────────────────┼───────┤
+│ 1 │ MaxDeliveryCountExceeded │ critical        │ 312   │
+│ 2 │ MaxDeliveryCountExceeded │ warning         │ 198   │
+│ 3 │ TTLExpiredException      │ info            │ 47    │
+╰───┴──────────────────────────┴─────────────────┴───────╯
+```
+
+## Architecture
+
+### Core Types
+
+The engine is built on three new types in `Application/DeadLetters/Common/`:
+
+**`CategoryPropertyRef`** — A parsed reference to a single property. Knows whether it's a system or body property and
+carries the dot-separated path.
+
+```csharp
+public enum PropertySource { System, Body }
+
+public sealed record CategoryPropertyRef(PropertySource Source, string PropertyPath)
+{
+    public string DisplayName => Source == PropertySource.System
+        ? $"#{PropertyPath}" : $"${PropertyPath}";
+
+    public static CategoryPropertyRef Parse(string reference);
+    // "#Subject" → (System, "Subject")
+    // "$error.code" → (Body, "error.code")
+}
+```
+
+**`CategorizationSchema`** — An ordered list of property references that defines the categorization dimensions. Provides
+a static `Default` that preserves backward compatibility.
+
+```csharp
+public sealed class CategorizationSchema
+{
+    public static readonly CategorizationSchema Default = new([
+        new(PropertySource.System, "Subject"),
+        new(PropertySource.System, "DeadLetterReason")
+    ]);
+
+    public IReadOnlyList<CategoryPropertyRef> Properties { get; }
+    public int DimensionCount => Properties.Count;
+    public bool UsesBodyProperties { get; } // cached flag for optimization
+
+    public static CategorizationSchema Parse(IEnumerable<string>? references);
+    // null/empty → Default
+}
+```
+
+**`CategoryPropertyResolver`** — Resolves a `CategoryPropertyRef` against a `ServiceBusReceivedMessage` to produce a
+string value. Handles system property dispatch, JSON body deserialization with caching, and dot-path navigation.
+
+### From 2D to N-dimensional
+
+The key structural change was evolving `DlqCategoryKey` from a two-field record to an N-dimensional key:
+
+```text
+Before: sealed record DlqCategoryKey(string Label, string DeadLetterReason)
+After:  sealed class DlqCategoryKey(ImmutableArray<string> Values) + IEquatable
+```
+
+The same transformation applied to `DlqCategory`. Both types retain backward-compatible `Label` and `DeadLetterReason`
+convenience properties that index into `Values[0]` and `Values[1]`, so existing code that only uses the default schema
+continues to work unchanged.
+
+Custom `IEquatable<DlqCategoryKey>` and `GetHashCode()` implementations were necessary because `ImmutableArray<T>` does
+not provide structural equality — the default record equality would compare by reference, breaking dictionary lookups
+and grouping.
+
+### Property Resolution
+
+System properties resolve via a switch expression over known `ServiceBusReceivedMessage` property names:
+
+```text
+Subject, DeadLetterReason, ContentType, CorrelationId,
+MessageId, SessionId, ReplyTo, To, DeadLetterErrorDescription
+```
+
+Unrecognized names fall through to `message.ApplicationProperties`, enabling categorization by custom headers without a
+special syntax. If nothing matches, the value is `"(none)"`.
+
+Body properties use the existing `MessageBodyDecoder.Decode()` to get a `JsonNode`, then navigate the dot-separated path
+segment by segment. A `ConcurrentDictionary<long, JsonNode?>` keyed by `SequenceNumber` caches decoded bodies —
+important because the reactive scanning architecture rebuilds category snapshots every second from the same cached
+messages.
+
+### Integration with Existing Features
+
+**`--merge-similar`** — The LCS-based category merger was generalized from 2 hardcoded dimensions (label frame + reason
+frame) to N dimensions. The `TokenizedCategory` type changed from `(string[] LabelTokens, string[] ReasonTokens)` to
+`string[][] DimensionTokens`. Scoring computes per-dimension LCS scores and requires all to meet the 0.5 threshold. The
+core LCS, scoring, and template rendering algorithms are unchanged.
+
+**Reactive scanning** — The `StreamDlq` command, `DlqScanSession`, and `DlqCategoryScanner` all accept optional
+`CategorizationSchema` and `CategoryPropertyResolver` parameters. When omitted, they fall back to the default schema.
+The resolver's body cache integrates naturally with the reactive architecture — bodies are decoded once and reused
+across snapshot rebuilds.
+
+**Interactive display** — `DlqCategoryDisplay.GenerateTableData()` generates column headers dynamically from
+`schema.Properties.Select(p => p.DisplayName)` instead of hardcoded `"Label"` / `"DeadLetterReason"` strings. The table
+adapts to any number of dimensions.
+
+## Data Flow
+
+```text
+CLI option                    Application layer                             Display
+────────────                  ─────────────────                             ───────
+
+--categorize-by               CategorizationSchema.Parse()
+"#DeadLetterReason,$tier"  →  Schema { Properties: [                     →  Table headers:
+                                (System, "DeadLetterReason"),               "#DeadLetterReason", "$tier"
+                                (Body, "tier")
+                              ]}
+
+
+                              DlqCategoryKey.FromMessage()
+                                resolver.ResolveProperty(msg, prop)
+
+
+                              DlqCategoryKey(["MaxDelivery..", "1"])
+
+
+                              GroupBy key → DlqCategory(values, count)
+
+
+                              CategoryMerger.Merge() (if --merge-similar)
+
+
+                              Interactive selection → ExpandKeys → Filter
+```
+
+## Design Decisions
+
+**Sealed class over record for `DlqCategoryKey`** — Records generate equality based on field values, but
+`ImmutableArray<T>` has reference equality semantics. A sealed class with explicit `IEquatable` implementation gives
+correct structural equality for dictionary keys and LINQ grouping.
+
+**ApplicationProperties fallback for `#` syntax** — Rather than requiring a separate prefix for custom headers,
+unrecognized `#PropertyName` values fall through to `message.ApplicationProperties`. This means `#Diagnostic-Id`
+resolves a custom header, while `#Subject` resolves the built-in property. One syntax covers both.
+
+**Body cache keyed by SequenceNumber** — Each message in a Service Bus peek has a unique, stable sequence number. Using
+this as the cache key (rather than MessageId) avoids issues with duplicate message IDs and aligns with how the reactive
+cache identifies messages.
+
+**`"(none)"` for unresolved values** — When a property doesn't exist on a message (wrong path, binary body, null value),
+the resolver returns `"(none)"` rather than throwing. This groups all unresolvable messages together in one category,
+which is the most useful behavior for interactive exploration.
+
+**Default schema for backward compatibility** — When `--categorize-by` is not specified,
+`CategorizationSchema.Parse(null)` returns `CategorizationSchema.Default` (`#Subject,#DeadLetterReason`). Every code
+path that previously hardcoded these two properties now passes `schema ?? CategorizationSchema.Default`, producing
+identical behavior.
+
+## Files
+
+3 new files in `Application/DeadLetters/Common/` (`CategoryPropertyRef`, `CategorizationSchema`,
+`CategoryPropertyResolver`), 6 modified core types (`DlqCategoryKey`, `DlqCategory`, `DlqCategorySnapshot`,
+`DlqCategoryScanner`, `DlqCategoryDisplay`, `CategoryMerger`), 6 modified infrastructure files (`DlqScanSession`,
+`DlqMessageService`, `StreamDlq`, `CategorySelection`, `DlqScanSessionExtensions`, `StreamDlqCategories`), and 8 CLI
+files across all 4 commands (CLI option + handler parse call each).
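
The dot-path navigation and `"(none)"` fallback described in the article above can be sketched as follows. This is an illustration under stated assumptions, not the commit's `CategoryPropertyResolver`: `BodyPathSketch` is a hypothetical name, the body is taken as a plain JSON string rather than going through `MessageBodyDecoder.Decode()`, property names are matched case-sensitively, and the per-`SequenceNumber` cache is left out.

```csharp
using System.Text.Json;
using System.Text.Json.Nodes;

// Hypothetical helper: walks a dot-separated path through a JSON body.
public static class BodyPathSketch
{
    public const string None = "(none)";

    public static string Resolve(string json, string dotPath)
    {
        JsonNode? node;
        try
        {
            node = JsonNode.Parse(json);
        }
        catch (JsonException)
        {
            // Non-JSON (for example binary) bodies land in the shared "(none)" bucket.
            return None;
        }

        foreach (var segment in dotPath.Split('.'))
        {
            // Stop as soon as the current node is not an object or lacks the segment.
            if (node is not JsonObject obj || !obj.TryGetPropertyValue(segment, out node))
                return None;
        }

        // An explicit JSON null also counts as unresolved.
        return node is null ? None : node.ToString();
    }
}
```

Returning a sentinel instead of throwing keeps every message groupable, which matches the article's reasoning for collecting unresolvable messages in one category.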

docs/articles/port-improvements-to-all-commands.md

Lines changed: 10 additions & 4 deletions
@@ -30,6 +30,8 @@ public class DlqScanSession : IDisposable
 {
     public ReactiveMessageCache<ServiceBusReceivedMessage, long> Cache { get; }
     public IObservable<DlqCategorySnapshot> CategoryStream { get; }
+    public CategorizationSchema Schema { get; }
+    public CategoryPropertyResolver Resolver { get; }
     public TaskCompletionSource ScanCompletion { get; }
     public long TotalDlqCount { get; set; }
     public Exception? Error { get; set; }
@@ -43,11 +45,11 @@ public class DlqScanSession : IDisposable
 }
 ```
 
-The `virtual MatchesFilter` is the extension point. `DlqResubmitSession` overrides it to exclude already-resubmitted messages via `ResubmitTracker`. The other three commands use the base implementation which accepts everything.
+The session carries a `CategorizationSchema` (default: `#Subject,#DeadLetterReason`) and a `CategoryPropertyResolver` for schema-aware categorization. The `virtual MatchesFilter` is the extension point. `DlqResubmitSession` overrides it to exclude already-resubmitted messages via `ResubmitTracker`. The other three commands use the base implementation which accepts everything.
 
 **`DlqCategoryScanner`** — static class with the two core operations extracted from `StreamDlqCategoriesCommandHandler`:
 
-- `BuildCategorySnapshot(cache, mergeSimilar)` — groups cache contents by Subject + DeadLetterReason, optionally runs `CategoryMerger.Merge`, returns a `DlqCategorySnapshot`.
+- `BuildCategorySnapshot(cache, mergeSimilar, schema?, resolver?)` — groups cache contents by the configured `CategorizationSchema` properties (default: Subject + DeadLetterReason), optionally runs `CategoryMerger.Merge`, returns a `DlqCategorySnapshot`.
 - `FeedCacheAsync(clientFactory, namespace, target, cache, session, messageFilter?, ct)` — background pagination loop. The key generalization: `messageFilter` is now an optional `Func<ServiceBusReceivedMessage, bool>?` instead of a hardcoded `ResubmitTracker` check. Resubmit passes `m => !tracker.WasResubmitted(m.MessageId)`; everyone else passes `null`.
 
 **`StreamDlqCommand` / `StreamDlqCommandHandler`** — a new shared Mediator command that creates a plain `DlqScanSession`, starts the background feed via `Task.Run`, and returns the session immediately. The existing `StreamDlqCategoriesCommand` still exists for resubmit, creating a `DlqResubmitSession` with its tracker-aware filter.
@@ -59,7 +61,8 @@ public sealed record DlqCategorySnapshot(
     IReadOnlyList<DlqCategory> Categories,
     int TotalMessageCount,
     bool IsComplete,
-    CategoryMergeResult? MergeResult = null);
+    CategoryMergeResult? MergeResult = null,
+    CategorizationSchema? Schema = null);
 ```
 
 ### CLI Layer: Shared Interactive Flow
@@ -267,9 +270,12 @@ The `infra/test/` directory contains Bicep templates and PowerShell scripts for
 ```
 Application Layer
 ├── DeadLetters/Common/
+│   ├── CategoryPropertyRef.cs       ← shared: #system / $body property reference
+│   ├── CategorizationSchema.cs      ← shared: configurable categorization dimensions
+│   ├── CategoryPropertyResolver.cs  ← shared: resolves properties from messages with body cache
 │   ├── DlqCategoryScanner.cs        ← shared: BuildCategorySnapshot + FeedCacheAsync
 │   ├── DlqCategorySnapshot.cs       ← shared: snapshot record
-│   ├── DlqScanSession.cs            ← shared: base session (cache + stream + signals)
+│   ├── DlqScanSession.cs            ← shared: base session (cache + stream + signals + schema)
 │   └── StreamDlq.cs                 ← shared: Mediator command for dump/purge/diagnose
 
 ├── DeadLetters/DumpDlq/
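
The schema-driven grouping that `BuildCategorySnapshot` is described as performing depends on a key with structural equality. A minimal sketch of that idea follows; `CategoryKeySketch` and `GroupingSketch` are hypothetical names, ordinal string comparison is an assumption, and the input is taken as already-resolved value rows rather than `ServiceBusReceivedMessage` instances.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Linq;

// Hypothetical N-dimensional key: equal when all values match in order.
public sealed class CategoryKeySketch : IEquatable<CategoryKeySketch>
{
    public CategoryKeySketch(ImmutableArray<string> values) => Values = values;

    public ImmutableArray<string> Values { get; }

    // Element-wise comparison instead of comparing the underlying array reference.
    public bool Equals(CategoryKeySketch? other) =>
        other is not null && Values.SequenceEqual(other.Values, StringComparer.Ordinal);

    public override bool Equals(object? obj) => Equals(obj as CategoryKeySketch);

    // Hash every element so equal keys land in the same dictionary bucket.
    public override int GetHashCode()
    {
        var hash = new HashCode();
        foreach (var value in Values)
            hash.Add(value, StringComparer.Ordinal);
        return hash.ToHashCode();
    }
}

public static class GroupingSketch
{
    // Each row holds one resolved value per schema dimension.
    public static IReadOnlyDictionary<CategoryKeySketch, int> CountByKey(
        IEnumerable<IReadOnlyList<string>> resolvedRows) =>
        resolvedRows
            .GroupBy(row => new CategoryKeySketch(row.ToImmutableArray()))
            .ToDictionary(group => group.Key, group => group.Count());
}
```

Without the `Equals` and `GetHashCode` overrides, `GroupBy` would treat every key instance as distinct, so each message would end up in its own category.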
