File: 0_Azure/3_AzureAI/AIFoundry/demos/4_TruncationHandling.md

Last updated: 2025-03-03
<details>
<summary><b>List of References</b> (Click to expand)</summary>

- [Chunk large documents for vector search solutions in Azure AI Search](https://learn.microsoft.com/en-us/azure/search/vector-search-how-to-chunk-documents)
- [What is Azure OpenAI in Azure AI Foundry Models?](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/overview)
- [Troubleshooting and best practices for Azure OpenAI On Your Data](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/on-your-data-best-practices)

</details>
<details>
<summary><b>Table of Contents</b> (Click to expand)</summary>

- [Overview](#overview)
- [How to resolve truncation issues](#how-to-resolve-truncation-issues)

</details>
| Strategy | Azure Implementation | Benefit |
| --- | --- | --- |
| Token Budgeting | Use Azure Functions with `tiktoken` to pre-calculate token usage before inference | Prevents exceeding token limits and enables chunk-aware document processing |
| Semantic Chunking | Use Azure AI Search's Document Layout or Text Split skillset for structure-aware chunking | Preserves logical boundaries and improves embedding and retrieval quality |
| Temperature Control | Configure temperature and `top_p` in Azure OpenAI deployment settings | Reduces verbosity and keeps completions within token budget |
| Output Constraints | Use `max_tokens`, `stop` sequences, and `top_p` in Azure OpenAI API calls | Ensures clean, bounded outputs and avoids mid-sentence truncation |
| Monitoring & Scaling | Use Azure Monitor, Log Analytics, and PTUs for throughput and cost control | Enables observability and resilience at enterprise scale |
<details>
<summary><b>Token Budgeting in Azure</b> (Click to expand)</summary>

> Azure OpenAI models like GPT-4-128k enforce strict token limits. Complex documents with nested logic or rare terms can tokenize inefficiently, leading to unexpected truncation. Use an Azure Function or Logic App with the `tiktoken` library to analyze and split documents into token-aware chunks before sending them to Azure OpenAI.

**How to Apply in Azure:**

- Deploy a lightweight Azure Function that:
  - Accepts document input
  - Uses `tiktoken` to count tokens
  - Splits content into ≤3000-token chunks
  - Returns chunks to Power Automate or Azure OpenAI for inference

**Monitoring:**

- Use Azure Monitor and Log Analytics to track:
  - `tokens_used`
  - `flowRunId`
  - `request_uri`
- Visualize trends in Power BI to detect spikes or anomalies

</details>
<details>
<summary><b>Semantic Chunking with Azure AI Search</b> (Click to expand)</summary>

> Azure AI Search supports semantic chunking via built-in skills such as Document Layout and Text Split. These skills preserve logical structure and improve retrieval quality for RAG pipelines. Chunking is not just about staying under token limits; it also improves embedding quality and relevance scoring. Read more in [Chunk large documents for vector search solutions in Azure AI Search](https://learn.microsoft.com/en-us/azure/search/vector-search-how-to-chunk-documents).

</details>
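In Azure the chunking itself is performed by the Text Split or Document Layout skill. Purely as a local illustration of the structure-aware idea, here is a hypothetical helper that packs whole paragraphs into chunks rather than cutting mid-sentence (paragraph boundaries stand in for the layout-detected sections the skills would use):

```python
def chunk_by_paragraph(text: str, max_chars: int = 2000) -> list[str]:
    """Pack whole paragraphs into chunks of roughly max_chars (illustrative only)."""
    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # An oversized single paragraph is kept whole rather than split
            current = para
    if current:
        chunks.append(current)
    return chunks
```

Because chunk boundaries fall only between paragraphs, each chunk stays semantically coherent, which is the property the Azure AI Search skills preserve at a richer structural level.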
<details>
<summary><b>Temperature & Output Control in Azure OpenAI</b> (Click to expand)</summary>

> High temperature values (e.g., 0.8–1.0) increase creativity but also verbosity, which can lead to token overflow. Lower values (e.g., 0.2–0.4) yield more concise, deterministic outputs. Combine temperature control with `top_p`, `stop` sequences, and `max_tokens` in your Azure OpenAI deployment or API call. Read more in [What is Azure OpenAI in Azure AI Foundry Models?](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/overview).

**How to Apply in Azure:**

- In Azure OpenAI Studio or the API:

```json
{
  "temperature": 0.3,
  "top_p": 0.9,
  "max_tokens": 1500,
  "stop": ["\n\n", "###", "END"]
}
```

- For stateless, high-throughput scenarios:
  - Use Provisioned Throughput Units (PTUs) for predictable performance
  - Monitor latency and token usage with Azure Monitor

</details>
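A practical companion to `max_tokens` and `stop`: chat completion responses set `finish_reason` to `"length"` when the output was cut off by the token limit, so a caller can detect mid-sentence truncation and retry with a larger budget. A minimal, hypothetical check (the helper name is ours, not an SDK function):

```python
def is_truncated(response: dict) -> bool:
    """True when any choice stopped because it hit the max_tokens limit."""
    return any(c.get("finish_reason") == "length" for c in response.get("choices", []))
```

On a `True` result, a pipeline might resubmit the request with a higher `max_tokens` value or feed the partial output back in a continuation prompt.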