# Truncation Handling for Complex Documents - Overview

Costa Rica

[brown9804](https://github.com/brown9804)

Last updated: 2025-03-03

------------------------------------------

> Truncation often results from exceeding token limits or poor chunking strategies. Complex documents (those with conditional logic, sparse entities, or nested structures) can tokenize inefficiently.

<details>
<summary><b>List of References</b> (Click to expand)</summary>

</details>

<details>
<summary><b>Table of Contents</b> (Click to expand)</summary>

</details>

## Overview

> Why does truncation happen, even in shorter, complex documents? <br/>
> `Truncation happens when the total token count for your prompt and its output goes over the model’s limit. But here’s the catch: it’s not just about how long your prompt is; how complex it is can also bump up the token usage.`

| Cause | Description | Why It Matters |
|-----------------------------|-----------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------|
| Structural Complexity | Conditional logic, nested clauses, or sparse named entities lead to inefficient tokenization | Increases token count unexpectedly, risking mid-sentence truncation |
| Tokenizer Behavior | Azure OpenAI uses subword tokenization (e.g., `tiktoken`) | Complex or rare words may consume more tokens than expected |
| Verbose or Tangential Output| High temperature settings cause longer, less focused completions | May exceed token limits and truncate output mid-thought |

<details>
<summary><b> Structural Complexity </b> (Click to expand)</summary>

> Documents with **conditional logic**, **nested clauses**, or **sparse named entities** are structurally complex. These patterns confuse tokenizers because they lack clear semantic anchors (like names or dates) and often involve long, interdependent clauses.

> E.g.: `If the system fails to initialize, and the fallback protocol is not triggered unless the override is active, then the watchdog timer must be reset manually.`
> This sentence, while not long, contains multiple conditions and dependencies. Tokenizers break it into many subword units, inflating the token count.

> **Why It Matters**

- You may hit token limits even with seemingly short documents.
- Truncation may occur mid-sentence or mid-logic, leading to incomplete or incoherent outputs.

> **How to Address**

- Use **semantic chunking** to isolate logical units (e.g., one condition per chunk).
- Preprocess documents to simplify or flatten nested logic where possible.
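As a rough illustration, semantic chunking can start as simply as splitting on sentence boundaries, then breaking long conditional sentences at clause connectives so each chunk carries one logical unit. This is a minimal sketch, not a prescribed implementation; the regex patterns and the character budget are assumptions you would tune for your documents:

```python
import re

def semantic_chunks(text: str, max_chars: int = 200) -> list[str]:
    """Split text into rough logical units: sentences first, then
    clause boundaries ("and", "then", "unless", "or") for long sentences."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks = []
    for sentence in sentences:
        if len(sentence) <= max_chars:
            chunks.append(sentence)
        else:
            # Break long conditional sentences at clause connectives.
            parts = re.split(r",\s+(?=(?:and|then|unless|or)\b)", sentence)
            chunks.extend(p.strip() for p in parts)
    return [c for c in chunks if c]

doc = ("If the system fails to initialize, and the fallback protocol is not "
       "triggered unless the override is active, then the watchdog timer must "
       "be reset manually.")
print(semantic_chunks(doc, max_chars=80))
```

Each resulting chunk is now a single condition that can be sent, summarized, or embedded on its own instead of one long interdependent sentence.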

</details>

<details>
<summary><b> Tokenizer Behavior </b> (Click to expand)</summary>

> Azure OpenAI uses the same tokenizer as OpenAI, typically `tiktoken`. This tokenizer breaks text into **subword tokens**, not full words. For example:
> - “Initialization” → `["Initial", "ization"]`
> - “FallbackProtocol” → `["Fallback", "Protocol"]`

> Complex syntax, rare words, or compound identifiers (like in code or legal text) often result in more tokens per word than expected.

> **Why It Matters**

- Token count can balloon unexpectedly, even in short or medium-length documents.
- This can lead to premature truncation or rejection of prompts that exceed model limits.

> **How to Address**

- Use the `tiktoken` library to **pre-calculate token usage** before sending prompts.
- Normalize or simplify text during preprocessing (e.g., split compound words).
- Avoid overly technical phrasing unless necessary.

</details>

<details>
<summary><b> Verbose or Tangential Output </b> (Click to expand)</summary>

> The `temperature` parameter controls randomness in model output:
> - **High temperature (0.8–1.0)** → creative, verbose, tangential
> - **Low temperature (0.2–0.4)** → focused, deterministic, concise

> High temperature can cause the model to “ramble”, using more tokens than necessary and increasing the risk of hitting token limits.

> **Why It Matters**

- Verbose completions may exceed token budgets, especially in stateless or high-throughput scenarios.
- Truncation may occur mid-sentence or mid-thought, degrading output quality.

> **How to Address**

- For structured tasks (e.g., summarization, extraction), set:
  ```json
  {
    "temperature": 0.2,
    "top_p": 0.9
  }
  ```
- Use `max_tokens` to cap output length.
- Define `stop` sequences to cut off output at logical boundaries (e.g., `["\n\n", "###"]`).
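Taken together, these controls map directly onto the request parameters of the chat completions API. A minimal Python sketch, where the deployment name and the messages are placeholders for your own values:

```python
# Request settings that keep structured-task output focused and bounded.
# "my-gpt4-deployment" is a placeholder for your Azure OpenAI deployment name.
request_params = {
    "model": "my-gpt4-deployment",
    "messages": [
        {"role": "system", "content": "Summarize the document in three bullet points."},
        {"role": "user", "content": "<document text here>"},
    ],
    "temperature": 0.2,       # focused, deterministic output
    "top_p": 0.9,
    "max_tokens": 512,        # hard cap on completion length
    "stop": ["\n\n", "###"],  # cut off at logical boundaries
}

# With the openai SDK, this dictionary would be passed as:
#   client.chat.completions.create(**request_params)
print(request_params["max_tokens"], request_params["stop"])
```

Capping `max_tokens` and defining `stop` sequences together means a verbose completion ends at a clean boundary rather than mid-sentence when the budget runs out.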

</details>



<div align="center">
  <h3 style="color: #4CAF50;">Total Visitors</h3>
  <img src="https://profile-counter.glitch.me/brown9804/count.svg" alt="Visitor Count" style="border: 2px solid #4CAF50; border-radius: 5px; padding: 5px;"/>
</div>