File: 0_Azure/3_AzureAI/AIFoundry/demos/10_PTU_pricing_estimate/README.md
Lines changed: 43 additions & 4 deletions
@@ -59,23 +59,62 @@ For example: [Azure Pricing Calculator](https://azure.microsoft.com/en-us/pricin
> If the specific model (LLM) is not yet available in the [Azure Pricing Calculator](https://azure.microsoft.com/en-us/pricing/calculator/) dropdown, we can rely on a rough estimate.
-> [!NOTE]
-> Please review this chart as it provides a more accurate representation of the `Input TPM per PTU`:
+- **TPM (Tokens per Minute)**: Total tokens processed per minute.
+- **PTU (Provisioned Throughput Unit)**: A unit of capacity that governs how much TPM your deployment can handle.
+- **Token Weighting**: Output tokens are weighted more heavily than input tokens, based on model-specific ratios.
+
+> [!IMPORTANT]
+> Model-specific weighting ratios: see [How much throughput per PTU you get for each model](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/provisioned-throughput-onboarding?view=foundry-classic#how-much-throughput-per-ptu-you-get-for-each-model)
+> Please review [this chart](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/provisioned-throughput-onboarding?view=foundry-classic#latest-azure-openai-models) as it provides a more accurate representation of the `Input TPM per PTU`:
- For example, final provisioning: `600 PTUs` (rounded up from 589.47 to meet the minimum and allow a capacity buffer)
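
The weighting and rounding described above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual logic: the function name `estimate_ptus`, the output weight of 4, the 1,900 input TPM per PTU, the minimum of 50 PTUs, and the increment of 50 are all hypothetical values, chosen only so that the raw result lands on the 589.47 figure quoted above. Treating the "capacity buffer" as rounding up to the next purchase increment is also an assumption. Take real ratios, minimums, and increments from the Microsoft Learn tables linked above.

```python
import math


def estimate_ptus(input_tpm, output_tpm, output_weight,
                  input_tpm_per_ptu, min_ptus, increment):
    """Return (raw_ptus, provisioned_ptus) for one workload.

    All parameters are illustrative; the real values are model-specific.
    """
    # Output tokens count more heavily than input tokens.
    weighted_tpm = input_tpm + output_tpm * output_weight
    # Raw (fractional) PTU requirement.
    raw_ptus = weighted_tpm / input_tpm_per_ptu
    # Round up to the next purchasable increment, then enforce the minimum.
    rounded = math.ceil(raw_ptus / increment) * increment
    return raw_ptus, max(min_ptus, rounded)


# Hypothetical workload: 800k input TPM, 80k output TPM.
raw, final = estimate_ptus(
    input_tpm=800_000,
    output_tpm=80_000,
    output_weight=4,          # assumed output-to-input weighting ratio
    input_tpm_per_ptu=1_900,  # assumed throughput per PTU
    min_ptus=50,              # assumed minimum deployment size
    increment=50,             # assumed purchase increment
)
print(f"raw estimate: {raw:.2f} PTUs, provisioned: {final} PTUs")
```

With these assumed inputs the weighted load is 1,120,000 TPM, which gives a raw estimate of about 589.47 PTUs and a provisioned size of 600 PTUs.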
## Provisioned Capacity Calculator
> Improve the accuracy of your estimate by adding multiple workloads to your PTU calculation. Each workload is calculated and displayed separately, along with the aggregate total if all workloads run against your deployment at the same time.
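
As a rough sketch of the multi-workload idea, the snippet below computes a PTU estimate per workload and an aggregate for the case where all workloads run concurrently. Everything here is an assumption for illustration: the `Workload` fields, the two sample workloads, the output weight of 4, and the 2,000 input TPM per PTU are hypothetical, and summing the per-workload estimates is a simplification rather than the calculator's documented behavior.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    calls_per_min: float   # peak requests per minute
    prompt_tokens: int     # input tokens per request
    response_tokens: int   # output tokens per request


def weighted_tpm(w, output_weight):
    # Output tokens are weighted more heavily than input tokens.
    return w.calls_per_min * (w.prompt_tokens + output_weight * w.response_tokens)


def ptu_breakdown(workloads, output_weight, input_tpm_per_ptu):
    """Return ({name: raw_ptus}, aggregate_raw_ptus)."""
    per_workload = {
        w.name: weighted_tpm(w, output_weight) / input_tpm_per_ptu
        for w in workloads
    }
    # Aggregate assumes every workload peaks at the same time.
    return per_workload, sum(per_workload.values())


# Hypothetical workloads and ratios, for illustration only.
workloads = [
    Workload("chat", calls_per_min=100, prompt_tokens=1000, response_tokens=200),
    Workload("summarize", calls_per_min=20, prompt_tokens=4000, response_tokens=500),
]
per_workload, aggregate = ptu_breakdown(
    workloads, output_weight=4, input_tpm_per_ptu=2000
)
for name, ptus in per_workload.items():
    print(f"{name}: {ptus:.1f} PTUs")
print(f"aggregate: {aggregate:.1f} PTUs")
```

The aggregate is a raw figure; it would still need to be rounded up to whatever minimum and increment apply to the chosen model and deployment type.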