
Commit b0caca9

Rename demo markdown file to README.md
1 parent eb233bd commit b0caca9

1 file changed

Lines changed: 43 additions & 4 deletions

File tree

  • 0_Azure/3_AzureAI/AIFoundry/demos/10_PTU_pricing_estimate

0_Azure/3_AzureAI/AIFoundry/demos/10_PTU_pricing_estimate.md renamed to 0_Azure/3_AzureAI/AIFoundry/demos/10_PTU_pricing_estimate/README.md

@@ -59,23 +59,62 @@ For example: [Azure Pricing Calculator](https://azure.microsoft.com/en-us/pricin
> If the specific model (LLM) is not yet available in the [Azure Pricing Calculator](https://azure.microsoft.com/en-us/pricing/calculator/) dropdown, we can use a rough estimate.

We need to calculate `Tokens per minute (TPM)`:

<img width="1897" height="721" alt="image" src="https://github.com/user-attachments/assets/996fe8b8-259d-494c-ab03-c24d156652b8" />

- **TPM (Tokens per Minute)**: Total tokens processed per minute.
- **PTU (Provisioned Throughput Unit)**: A unit of capacity that governs how much TPM your deployment can handle.
- **Token Weighting**: Output tokens are weighted more heavily than input tokens, based on model-specific ratios.

> [!IMPORTANT]
> Weighting ratios are model-specific; see [How much throughput per PTU you get for each model](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/provisioned-throughput-onboarding?view=foundry-classic#how-much-throughput-per-ptu-you-get-for-each-model).

<img width="600" alt="image" src="https://github.com/user-attachments/assets/e875d9fd-170c-4082-988f-362f067c6663" />

| Model     | Output Token Weight | Meaning                                 |
|-----------|---------------------|-----------------------------------------|
| GPT-5     | 1 output = 8 input  | Output tokens count 8× more than input  |
| GPT-4.1   | 1 output = 4 input  | Output tokens count 4× more than input  |
| Older GPT | Varies              | Legacy ratios may differ                |
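The weighting table above can be sketched as a simple lookup; the function name and dictionary are ours, and the ratios shown are only the examples from the table (confirm per-model values on the Microsoft Learn page linked above):

```python
# Illustrative output-token weighting table (values from the table above;
# verify current per-model ratios in the Microsoft Learn documentation).
OUTPUT_TOKEN_WEIGHT = {
    "gpt-5": 8,    # 1 output token counts as 8 input tokens
    "gpt-4.1": 4,  # 1 output token counts as 4 input tokens
}

def weighted_tokens(input_tokens: int, output_tokens: int, model: str) -> int:
    """Return input-equivalent tokens after applying the model's output weight."""
    return input_tokens + output_tokens * OUTPUT_TOKEN_WEIGHT[model]
```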

> For example:

- `Calls per minute = 10,000`
- `Prompt tokens = 600`
- `Response tokens = 100`
- `Output token weight = 8` (`GPT-5`)

> PTU Utilization Formula (`Input TPM per PTU`):

- **Step 1 – Input TPM**: `Input TPM = 10,000 × 600 = 6,000,000`
- **Step 2 – Output TPM**: `Output TPM = 10,000 × 100 × 8 = 8,000,000`
- **Step 3 – Effective TPM**: `Effective TPM = 6,000,000 + 8,000,000 = 14,000,000`

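Steps 1–3 can be sketched in a few lines, using the example numbers above (the function and variable names are ours):

```python
def effective_tpm(calls_per_minute: int, prompt_tokens: int,
                  response_tokens: int, output_weight: int) -> int:
    """Effective tokens per minute: input TPM plus weighted output TPM."""
    input_tpm = calls_per_minute * prompt_tokens                     # Step 1
    output_tpm = calls_per_minute * response_tokens * output_weight  # Step 2
    return input_tpm + output_tpm                                    # Step 3

# Example from the text: 10,000 calls/min, 600 prompt tokens,
# 100 response tokens, weight 8 → 6,000,000 + 8,000,000 = 14,000,000
print(effective_tpm(10_000, 600, 100, 8))
```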
> [!NOTE]
> Please review [this chart](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/provisioned-throughput-onboarding?view=foundry-classic#latest-azure-openai-models) as it provides a more accurate representation of the `Input TPM per PTU`:

<div align="center">
<img width="600" alt="image" src="https://github.com/user-attachments/assets/d771343c-2821-4834-a9ec-82c262cfbbdd" style="border: 2px solid #4CAF50; border-radius: 5px; padding: 5px;"/>
</div>

- **Step 4 – Divide by Model Rate (tokens/min per PTU)**
  - Model rate for GPT‑5 = `23,750 tokens/min per PTU`
  - Calculation: `Required PTUs = 14,000,000 ÷ 23,750 ≈ 589.47`
- **Step 5 – Round up to the next whole PTU**: `Required PTUs ≈ 590`
- **Step 6 – Apply Model Minimum Requirement**
  - Check whether the deployment requires a minimum of `X PTUs`; workloads are provisioned in whole PTUs, and rounding up is standard practice.

<img width="600" alt="image" src="https://github.com/user-attachments/assets/2efef880-4089-48b8-96eb-8e861ee09c61" />

- For example, final provisioning: `600 PTUs` (rounded up from 589.47 to meet the model minimum and leave a capacity buffer)
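Steps 4–6 can be sketched as follows. The `23,750` rate comes from the example above; the `min_ptus` and `increment` parameters are our own hypothetical knobs for the model-specific minimum and purchase increment (check the chart above for real values):

```python
import math

def required_ptus(effective_tpm: float, tpm_per_ptu: float,
                  min_ptus: int = 0, increment: int = 1) -> int:
    """Convert effective TPM to whole PTUs, then enforce the model's
    minimum and purchase increment (both model-specific; illustrative here)."""
    raw = effective_tpm / tpm_per_ptu               # Step 4: divide by model rate
    ptus = math.ceil(raw)                           # Step 5: round up to whole PTUs
    ptus = max(ptus, min_ptus)                      # Step 6: apply model minimum
    return math.ceil(ptus / increment) * increment  # Step 6: apply increment

# Example from the text: 14,000,000 ÷ 23,750 ≈ 589.47 → 590 PTUs.
# With a hypothetical purchase increment of 50, this rounds up to 600.
print(required_ptus(14_000_000, 23_750, increment=50))
```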

## Provisioned Capacity Calculator

> Improve the accuracy of your estimate by adding multiple workloads to your PTU calculation. Each workload is calculated and displayed individually, along with the aggregate total when they run against your deployment at the same time.

https://github.com/user-attachments/assets/31b5e2db-79dd-432f-a250-46227d551fcc
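The calculator's aggregate behavior can be approximated by summing each workload's effective TPM before converting to PTUs. A hedged sketch, assuming both workloads run concurrently on the same GPT-5 deployment (all names and the second workload's numbers are illustrative):

```python
import math

# Each workload: (calls_per_minute, prompt_tokens, response_tokens, output_weight)
workloads = [
    (10_000, 600, 100, 8),   # the chat-style example from the text (GPT-5)
    (2_000, 1_500, 300, 8),  # a hypothetical second workload, e.g. summarization
]

TPM_PER_PTU = 23_750  # example GPT-5 rate from the text

# Sum input TPM plus weighted output TPM across all concurrent workloads.
total_tpm = sum(c * p + c * r * w for c, p, r, w in workloads)
print(math.ceil(total_tpm / TPM_PER_PTU))  # aggregate PTUs if both run at once
```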

<!-- START BADGE -->
<div align="center">
<img src="https://img.shields.io/badge/Total%20views-1633-limegreen" alt="Total views">
