
Azure OpenAI: Tokenization & Cost Analysis

Costa Rica

GitHub brown9804

Last updated: 2025-04-22



Cost Analysis

Azure OpenAI Service provides several ways to track and analyze costs, including model usage and cost distribution.

| Key Point | Details |
| --- | --- |
| Cost Management and Analysis | Use Cost Management + Billing in the Azure portal to:<br>• Track costs associated with Azure OpenAI Service.<br>• Analyze cost patterns and identify spending trends.<br>• Set budgets and create alerts for overspending. |
| Cost Analysis for Azure OpenAI Service | In the Cost analysis section of the Azure portal, you can:<br>• Group costs by various attributes, such as resource group, location, or meter.<br>• Filter costs to focus on specific resources or services.<br>• Visualize costs over time to understand spending patterns. |
| Pricing Models | Azure OpenAI Service supports different pricing models:<br>• Standard (On-Demand): pay only for the tokens processed.<br>• Provisioned Throughput Units (PTUs): consistent throughput and minimal latency variance for scalable solutions. |
| Detailed Pricing Information | Pricing details are available on the Azure OpenAI Service pricing page, including costs per 1,000,000 tokens for various models and the rates for PTUs. |
| Monitoring Usage | You can monitor the usage of different models and their associated costs. Azure provides metrics and logs that help you understand the usage patterns of your deployed models. |
| Exporting Cost Data | For more advanced analysis, export your cost data to a storage account and use tools like Excel or Power BI for custom reporting and analysis. |
| Budgeting and Alerts | Create budgets in Azure Cost Management and set alerts to notify you when spending exceeds predefined thresholds. |

Deployment Types

| Deployment Type | Best Suited For | How It Works | Cost | Tokenization |
| --- | --- | --- | --- | --- |
| Global-Batch | Offline scoring and non-latency-sensitive workloads | Asynchronous groups of requests with separate quota and a 24-hour target turnaround | Least expensive | Tokens calculated from the input file and the generated output |
| Global-Standard | Recommended starting place for customers | Traffic may be routed anywhere in the world; access to all new models with larger quota allocations | Global pricing | Tokens calculated per request, including both input and output tokens |
| Global-Provisioned | Real-time scoring for large, consistent volume | Traffic may be routed anywhere in the world; cost savings for consistent usage | Regional pricing | Tokens calculated per request, including both input and output tokens |
| Standard | Low-to-medium volume workloads | Regional deployment; requests are processed within the selected region | Pay-per-call flexibility | Tokens calculated per request, including both input and output tokens |
| Provisioned | Use cases with data residency requirements | Regional access with very high and predictable throughput | Hourly billing, with optional monthly or yearly reservations | Tokens calculated per request, including both input and output tokens |

Billing Models

| Billing Model | Description | Cost Calculation | Use Cases |
| --- | --- | --- | --- |
| Pay-As-You-Go | Charged based on the number of tokens processed; suitable for variable or unpredictable usage patterns | Cost per unit of tokens, with different rates for different model series; includes both input and output tokens | Applications with variable or unpredictable usage patterns; flexibility in usage |
| Provisioned Throughput Units (PTUs) | Reserved processing capacity for deployments, ensuring predictable performance and cost | Hourly rate based on the number of PTUs deployed, regardless of the number of tokens processed | Well-defined, predictable throughput requirements; consistent traffic; real-time or latency-sensitive applications |

PTU considerations:

  • Reserved Capacity: PTUs provide reserved processing capacity for your deployments, ensuring predictable performance and cost.
  • Capacity Planning: Estimate the PTUs required for your workload to balance performance and cost. The token totals are calculated using the following equation:

$$\text{Total Tokens} = \text{Peak calls per minute} \times (\text{Tokens in prompt call} + \text{Tokens in model response})$$

  • Cost Predictability: PTUs offer more predictable costs than the pay-as-you-go model, which varies with usage.
  • Performance Guarantees: PTUs provide guaranteed throughput and latency constraints, which is beneficial for real-time or latency-sensitive applications.
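The capacity-planning equation above can be sketched in Python. The workload numbers below are made-up assumptions for illustration only:

```python
def total_tokens_per_minute(peak_calls_per_minute: int,
                            prompt_tokens: int,
                            response_tokens: int) -> int:
    """Token throughput needed at peak, per the capacity-planning equation."""
    return peak_calls_per_minute * (prompt_tokens + response_tokens)

# Hypothetical workload: 300 calls/min, 500-token prompts, 200-token responses.
needed = total_tokens_per_minute(300, 500, 200)
print(needed)  # 210000 tokens per minute at peak
```

You would then compare this peak tokens-per-minute figure against the published per-PTU throughput for your model to size the deployment.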

Tokenization

Tokenization is the process of breaking text into smaller units called tokens. In Azure OpenAI Service, tokenization determines the number of tokens processed, which directly drives cost. Tokens are pieces of words: when processing text, the service breaks the input into tokens that can be as short as one character or as long as one word, and the model uses these tokens to understand and generate text.

Process:

  1. Text Splitting: The input text is split into tokens, which can be as short as one character or as long as one word. This splitting is done using a method called Byte-Pair Encoding (BPE).
  2. Byte-Pair Encoding (BPE): BPE is a tokenization method that merges the most frequently occurring pairs of characters into a single token. This approach is particularly effective in handling rare words and subwords, as it allows the model to break down complex or unseen words into more manageable pieces.
  3. Token Limits: Each model in Azure OpenAI has a maximum token limit per request. It's important to be aware of these limits when designing your prompts and handling responses.
  4. Tokenization Example: To illustrate, the word "hamburger" might be tokenized into ["ham", "bur", "ger"], while a common word like "pear" would remain as a single token ["pear"]. Many tokens also start with a whitespace, for example, " hello" and " bye".
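To make the BPE idea in step 2 concrete, here is a deliberately simplified sketch of a single merge step. Real tokenizers (such as the ones behind Azure OpenAI models) use large, pre-trained merge tables; this toy version just merges the most frequent adjacent pair once:

```python
from collections import Counter

def most_frequent_pair(tokens: list[str]) -> tuple[str, str]:
    """Count adjacent token pairs and return the most common one."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return max(pairs, key=pairs.get)

def merge_pair(tokens: list[str], pair: tuple[str, str]) -> list[str]:
    """Replace every occurrence of `pair` with its concatenation."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

# Start from single characters and apply one merge step.
tokens = list("banana")            # ['b', 'a', 'n', 'a', 'n', 'a']
pair = most_frequent_pair(tokens)  # ('a', 'n') occurs twice
tokens = merge_pair(tokens, pair)
print(tokens)  # ['b', 'an', 'an', 'a']
```

Repeating such merges with a learned merge table is how BPE turns rare words into a few subword tokens instead of many single characters.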
| Process | Description |
| --- | --- |
| Input Tokenization | The input text is tokenized into individual tokens. For example, the sentence "Hello, world!" would be tokenized into ["Hello", ",", " world", "!"]. |
| Output Tokenization | The model's response is also tokenized. For example, if the model generates the text "Hi there!", it might be tokenized into ["Hi", " there", "!"]. |
| Total Tokens Processed | The total number of tokens processed is the sum of the input tokens and the output tokens. |
| Cost Calculation | The cost is calculated from the total number of tokens processed, using the pricing information provided by Azure. |

The cost for using Azure OpenAI can be calculated using the following general formula:

$$\text{Cost} = \left( \frac{\text{Number of Tokens}}{N} \right) \times \text{Price per N Tokens}$$

Where:

  • Number of Tokens is the total number of tokens processed (input tokens + output tokens).
  • N is the number of tokens for which the price is specified (e.g., 1,000,000 tokens).
  • Price per N Tokens is the cost rate for processing N tokens, which varies based on the model and region.
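As a sketch, the formula can be applied like this in Python. The $2.50-per-1,000,000-token rate is a made-up placeholder; check the Azure OpenAI Service pricing page for real rates:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  price_per_n_tokens: float, n: int = 1_000_000) -> float:
    """Cost = (total tokens / N) * price per N tokens."""
    total_tokens = input_tokens + output_tokens
    return (total_tokens / n) * price_per_n_tokens

# Hypothetical: 120,000 input + 30,000 output tokens at $2.50 per 1M tokens.
cost = estimate_cost(120_000, 30_000, price_per_n_tokens=2.50)
print(f"${cost:.4f}")  # $0.3750
```

Note that input and output tokens are often billed at different rates, in which case you would apply the formula separately to each and sum the results.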

Optimization: Best Practices Overview

| Optimization Aspect | Technical Details | Best Practices |
| --- | --- | --- |
| Input Optimization | Token Efficiency: reducing the number of input tokens can significantly lower costs.<br>Prompt Engineering: crafting prompts that achieve the desired result with fewer tokens.<br>Context Management: including only the necessary context in the input to minimize token usage. | Concise Prompts: use the shortest prompts that still convey the necessary information.<br>Avoid Redundancy: remove repetitive or unnecessary words from prompts.<br>Template Design: design prompt templates that are efficient in terms of token usage. |
| Output Optimization | Response Length Control: limiting the length of the model's response helps manage and predict costs.<br>Stop Sequences: using stop sequences to control where the model should stop generating further tokens.<br>Max Tokens Parameter: setting an appropriate limit on the number of tokens in the response. | Set Max Tokens: use the max_tokens parameter to limit the length of the model's response.<br>Use Stop Sequences: define stop sequences to control the verbosity of the output.<br>Quality Check: regularly review the model's responses to ensure they are within the expected length and quality. |
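A minimal sketch of the output-side controls as a chat-completions request body. The stop sequence and message contents are illustrative; this snippet only builds the payload, nothing is sent:

```python
import json

# max_tokens caps the response length; stop ends generation early at the
# given sequence. Both bound the billable output tokens.
payload = {
    "messages": [
        {"role": "system", "content": "Answer in one short paragraph."},
        {"role": "user", "content": "Summarize tokenization."},
    ],
    "max_tokens": 200,   # hard cap on output tokens
    "stop": ["\n\n"],    # stop at the first blank line
    "temperature": 0.7,
}
print(json.dumps(payload, indent=2))
```

This payload shape matches the chat-completions examples later in this document; only the two highlighted parameters change the output-length behavior.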

Cost Tracking with API calls

Bruno: an open-source API client designed to simplify API testing and exploration. Bruno is known for its speed and Git-friendly approach, storing API collections directly on the filesystem in a plain-text markup language called Bru.

You can also use the Azure API Management Developer Portal (API Playground) to track costs with the Azure Cost Management API.

Register an Application in Azure AD

  1. Go to the Azure portal.

  2. Navigate to Microsoft Entra ID > App registrations > New registration.

  3. Fill in the required details and register the application.

  4. Note down the clientId, clientSecret, and tenantId.


Grant API Permissions

  1. In the Azure portal, go to Microsoft Entra ID > App registrations > Your App > API permissions.

  2. Add the necessary permissions, such as Consumption Billing.

  3. Click on Grant admin consent if required.


Get an Access Token

If you don't have the tenant ID yet, search for it in Microsoft Entra ID.

Get your client secret, or create one if needed.
  1. Use the OAuth 2.0 client credentials flow to get an access token.

    POST https://login.microsoftonline.com/{tenantId}/oauth2/token
    Content-Type: application/x-www-form-urlencoded
    
    grant_type=client_credentials
    &client_id={clientId}
    &client_secret={clientSecret}
    &resource=https://management.azure.com/

    If you want to export the code template, you can use the code option:

    curl --request POST \
      --url 'https://login.microsoftonline.com/{tenantId}/oauth2/token' \
      --header 'Content-Type: application/x-www-form-urlencoded' \
      --data 'grant_type=client_credentials' \
      --data 'client_id={clientId}' \
      --data 'client_secret={clientSecret}' \
      --data 'resource=https://management.azure.com/'
  2. Use the Access Token in your API calls:

    Authorization: Bearer {accessToken}
  3. Make the API call:

Before using the access token, grant the required permissions by assigning a role to the app you created; you can find it by name:

Retrieve the data, and consider applying filters; see the following section for more information:
GET https://management.azure.com/subscriptions/{subscriptionId}/providers/Microsoft.Consumption/usageDetails?api-version=2019-10-01
Authorization: Bearer {accessToken}
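The token request and the usage-details call above can be sketched in Python. Nothing is sent here; the snippet only assembles the URL and form body, and every braced value is a placeholder for your own tenant and app values:

```python
from urllib.parse import urlencode

tenant_id = "{tenantId}"  # placeholder: your directory (tenant) ID

# v1 token endpoint; the client-credentials body uses `resource`,
# matching the POST request shown above.
token_url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
token_body = urlencode({
    "grant_type": "client_credentials",
    "client_id": "{clientId}",          # placeholder
    "client_secret": "{clientSecret}",  # placeholder
    "resource": "https://management.azure.com/",
})

# The usage-details call then carries the returned token.
usage_url = ("https://management.azure.com/subscriptions/{subscriptionId}"
             "/providers/Microsoft.Consumption/usageDetails"
             "?api-version=2019-10-01")
headers = {"Authorization": "Bearer {accessToken}"}  # placeholder token
print(token_url)
print(usage_url)
```

With real values substituted, any HTTP client can POST `token_body` to `token_url` and then GET `usage_url` with the `Authorization` header.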

Adding Filters

Note

General API Call Structure: Azure Cost Management APIs typically follow a RESTful structure. Here's a basic example of an API call to retrieve cost data:

GET https://management.azure.com/{scope}/providers/Microsoft.Consumption/usageDetails?api-version=2019-10-01

Key Components:

  1. Base URL: https://management.azure.com/
  2. Scope: This defines the level at which you want to retrieve cost data. It can be a subscription, resource group, or a specific resource. For example:
    • Subscription: /subscriptions/{subscriptionId}
    • Resource Group: /subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}
    • Resource: /subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/{resourceProviderNamespace}/{resourceType}/{resourceName}
  3. Resource Path: /providers/Microsoft.Consumption/usageDetails
  4. API Version: api-version=2019-10-01

Example API Call:
To get usage details for a specific subscription, your API call might look like this:

GET https://management.azure.com/subscriptions/{subscriptionId}/providers/Microsoft.Consumption/usageDetails?api-version=2019-10-01

You can add various filters to narrow down the data. Here’s a table of common filters:

| Filter | Description | Example |
| --- | --- | --- |
| Date Range | Filter by a specific date range | $filter=properties/usageStart ge '2023-01-01' and properties/usageEnd le '2023-01-31' |
| Resource Group | Filter by resource group name | $filter=properties/resourceGroup eq 'yourResourceGroupName' |
| Resource Type | Filter by resource type | $filter=properties/resourceType eq 'Microsoft.Compute/virtualMachines' |
| Meter Category | Filter by meter category | $filter=properties/meterCategory eq 'Virtual Machines' |
| Tag | Filter by tag name and value | $filter=tags/yourTagName eq 'yourTagValue' |
| Location | Filter by resource location | $filter=properties/location eq 'eastus' |
| Charge Type | Filter by type of charge (e.g., usage, purchase) | $filter=properties/chargeType eq 'Usage' |
| Invoice ID | Filter by a specific invoice ID | $filter=properties/invoiceId eq 'yourInvoiceId' |

Example API Calls with Filters

GET https://management.azure.com/subscriptions/{subscriptionId}/providers/Microsoft.Consumption/usageDetails?$filter=properties/usageStart ge '2023-01-01' and properties/usageEnd le '2023-01-31'&api-version=2019-10-01
GET https://management.azure.com/subscriptions/{subscriptionId}/providers/Microsoft.Consumption/usageDetails?$filter=properties/usageStart ge '2023-01-01' and properties/usageEnd le '2023-01-31' and properties/resourceGroup eq 'yourResourceGroupName' and tags/yourTagName eq 'yourTagValue'&api-version=2019-10-01
Authorization: Bearer {accessToken}
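The filter expressions above contain spaces and quotes, which must be URL-encoded on the wire. A small Python sketch of building an encoded call (placeholder subscription ID, illustrative dates):

```python
from urllib.parse import urlencode, quote

subscription_id = "{subscriptionId}"  # placeholder
base = (f"https://management.azure.com/subscriptions/{subscription_id}"
        "/providers/Microsoft.Consumption/usageDetails")

params = {
    "api-version": "2019-10-01",
    "$filter": ("properties/usageStart ge '2023-01-01' "
                "and properties/usageEnd le '2023-01-31' "
                "and properties/meterCategory eq 'Cognitive Services'"),
}
# quote_via=quote percent-encodes spaces as %20 and quotes as %27.
url = f"{base}?{urlencode(params, quote_via=quote)}"
print(url)
```

Most HTTP clients (curl with --data-urlencode, Python requests with a params dict) do this encoding for you; the sketch just makes the transformation visible.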

Example using a Cognitive Services filter:

curl --request GET -G \
  --url 'https://management.azure.com/subscriptions/{subscriptionId}/providers/Microsoft.Consumption/usageDetails' \
  --data-urlencode 'api-version=2019-10-01' \
  --data-urlencode "\$filter=properties/meterCategory eq 'Cognitive Services'" \
  --header 'Authorization: Bearer {accessToken}'

Tagging Resources Demo

Tagging the Azure OpenAI Resource

  1. Log in to the Azure Portal: Go to portal.azure.com and log in with your Azure account.

  2. Navigate to the Azure OpenAI Resource:

    • In the left-hand menu, select Resource groups.

    • Select the resource group that contains your Azure OpenAI resource.

    • Click on the Azure OpenAI resource.

  3. Add Tags to the Resource:

    • In the left-hand menu of the resource, select Tags.

    • Add tags to the resource. For example, you can add tags like Department-Marketing, Department-Sales, etc.

    • Click Apply.


Using Tags in API Calls

Assign the required roles to be able to call the model (Cognitive Services User, Cognitive Services OpenAI User):


To ensure that API calls from different departments are tagged correctly, you can include the tags in the API requests. Here’s an example of how to do this:

  • Set Up the API Call:
    • Use the Azure OpenAI API to make requests.
    • Include the tags in the request headers or body as needed.

Important

To learn more about the available tags and request parameters, see the Azure OpenAI Service documentation.

Example of general model call:

curl --request POST \
  --url 'https://azureopenaibrowntest.openai.azure.com/openai/deployments/gpt-4/chat/completions?api-version=2024-08-01-preview' \
  --header 'Content-Type: application/json' \
  --header 'api-key: {key_value}' \
  --data '{
  "temperature": 0.7,
  "max_tokens": 200,
  "seed": 42,
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Tell me a story about how the universe began."
    }
  ]
}'

Note

To attribute usage costs to different departments or areas, you can use the user parameter in your API requests. This parameter assigns a unique identifier representing your end user, which helps monitor and detect usage patterns across departments.

Example API Call with Tags:

curl --request POST \
  --url 'https://azureopenaibrowntest.openai.azure.com/openai/deployments/gpt-4/chat/completions?api-version=2024-08-01-preview' \
  --header 'Content-Type: application/json' \
  --header 'api-key: {api_value_key}' \
  --data '{
  "temperature": 0.7,
  "max_tokens": 200,
  "seed": 42,
  "user": "department-hr",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Tell me a story about how the universe began."
    }
  ]
}'

Generating Billing Reports Based on Tags

  1. Navigate to Cost Management + Billing: In the Azure portal, go to Cost Management + Billing.
  2. Cost Analysis:
    • Select Cost Analysis.
    • Use the Add filter option to filter costs by tags.
    • For example, filter by department-hr to see the costs associated with the HR department.
  3. Group by Tags:
    • Use the Group by option to group costs by tags.
    • This will allow you to see a breakdown of costs by tags.

Automating Tagging with Azure Policy

To ensure that all resources are tagged consistently, you can use Azure Policy to enforce tagging.

  1. Create a Tagging Policy:
    • Go to Azure Policy in the Azure portal.
    • Click on Definitions and then + Policy definition.
    • Create a policy definition that requires tags on resources.
  2. Assign the Policy:
    • Assign the policy to the subscription or resource group.
    • This will ensure that all new resources are tagged according to the policy.
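As a sketch, the rule portion of such a policy definition might look like the following (the Department tag name is illustrative; Azure's built-in "Require a tag on resources" definition covers the same scenario):

```json
{
  "mode": "Indexed",
  "policyRule": {
    "if": {
      "field": "tags['Department']",
      "exists": "false"
    },
    "then": {
      "effect": "deny"
    }
  }
}
```

With effect set to deny, resource creation without the tag fails; a modify effect with a tag value could instead add the tag automatically.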
