| Tokenizer Behavior | Azure OpenAI uses subword tokenization (e.g., `tiktoken`) | Complex or rare words may consume more tokens than expected |
| Verbose or Tangential Output | High temperature settings cause longer, less focused completions | May exceed token limits and truncate output mid-thought |
<details>
<summary><b> Structural Complexity </b> (Click to expand)</summary>
> Documents with **conditional logic**, **nested clauses**, or **sparse named entities** are structurally complex. These patterns inflate token counts because they lack clear semantic anchors (such as names or dates) and often involve long, interdependent clauses.
> E.g: `If the system fails to initialize, and the fallback protocol is not triggered unless the override is active, then the watchdog timer must be reset manually.`
> This sentence, while not long, contains multiple conditions and dependencies. Tokenizers break it into many subword units, inflating the token count.
> **Why It Matters**
- You may hit token limits even with seemingly short documents.
- Truncation may occur mid-sentence or mid-logic, leading to incomplete or incoherent outputs.
- Azure OpenAI’s tokenizer (`tiktoken`) breaks text into subword units, so structurally dense content can consume more tokens than expected.
- Complex documents often lack named entities (e.g., people, places, dates), which are helpful for grounding and compressing meaning efficiently.
> **How to Address**
- Use **semantic chunking** to isolate logical units (e.g., one condition per chunk). In Azure, this can be implemented using:
  - **Azure AI Search’s Document Layout skill** to chunk by paragraphs, headings, or sections.
  - **Text Split skill** to define chunk size and overlap, preserving context across boundaries.
- Preprocess documents to simplify or flatten nested logic where possible:
  - Use Azure Functions or Logic Apps to transform complex conditionals into simpler declarative statements or bullet points.
  - Example transformation:
    - Original: `If A and B, unless C, then D.`
    - Flattened:
      - Condition 1: A is true
      - Condition 2: B is true
      - Exception: C is false
      - Action: Perform D
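For sentences that follow a known shape, the flattening step can be automated. The sketch below is a toy rule-based flattener for the exact `If A and B, unless C, then D.` pattern; the regex and function name are illustrative only (not part of any Azure SDK), and real documents would need proper parsing, e.g. inside an Azure Function:

```python
import re

# Toy sketch: flatten a conditional of the exact shape
# "If A and B, unless C, then D." into declarative bullet points.
# Illustrative only; real documents need more robust parsing.
PATTERN = re.compile(
    r"^If (?P<a>.+?) and (?P<b>.+?), unless (?P<c>.+?), then (?P<d>.+?)\.$"
)

def flatten_conditional(sentence: str) -> list[str]:
    m = PATTERN.match(sentence)
    if not m:
        return [sentence]  # leave unmatched sentences untouched
    return [
        f"Condition 1: {m['a']} is true",
        f"Condition 2: {m['b']} is true",
        f"Exception: {m['c']} is false",
        f"Action: Perform {m['d']}",
    ]

print(flatten_conditional("If A and B, unless C, then D."))
```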
- Use **token-aware chunking** before sending content to Azure OpenAI:
  - Deploy a preprocessing step using `tiktoken` in an Azure Function to:
    - Count tokens per clause or paragraph
    - Split content into ≤3000-token chunks
    - Return token-safe chunks to Azure OpenAI for inference
  - This ensures that each chunk respects token limits and avoids mid-logic truncation.
- Monitor token usage and truncation patterns using **Azure Monitor** and **Log Analytics**:
  - Track metrics like `tokens_used`, `completion_tokens`, and `prompt_tokens`.
  - Set alerts for high token usage or frequent truncation errors.
</details>
<details>
<summary><b> Tokenizer Behavior </b> (Click to expand)</summary>
> Azure OpenAI uses the same tokenizer as OpenAI, typically `tiktoken`. This tokenizer breaks text into **subword tokens**, not full words. For example:
> Complex syntax, rare words, or compound identifiers (like in code, legal, or scientific text) often result in more tokens per word than expected. This is especially common in enterprise documents with domain-specific terminology, acronyms, or camelCase identifiers.
> **Why It Matters**
- Token count can balloon unexpectedly, even in short or medium-length documents.
- This can lead to:
  - Premature truncation of outputs.
  - Rejection of prompts that exceed model limits (e.g., 128k for GPT-4-128k).
  - Increased latency and cost due to inefficient token usage.
- Token inflation is especially problematic in Azure OpenAI when using models in high-throughput or stateless scenarios, where every token counts toward performance and billing.
> **How to Address**
- Use the `tiktoken` library to **pre-calculate token usage** before sending prompts.
- Normalize or simplify text during preprocessing (e.g., split compound words).