Costa Rica
Last updated: 2025-04-22
Table of References
- Azure OpenAI Service - Pricing
- Plan to manage costs for Azure OpenAI Service
- Chapter 9 - Cost Management and Optimization
- Pricing Update: Token Based Billing for Fine Tuning Training
- Tokenizer - tool from OpenAI
- Customize views in cost analysis
- Group and filter options in Cost analysis and budgets
- Manage costs with automation
- Group and allocate costs using tag inheritance
- Tags - List
- Azure OpenAI Service REST API preview reference, with available parameters
Azure OpenAI Service provides several ways to track and analyze costs, including model usage and cost distribution.
| Key Point | Details |
|---|---|
| Cost Management and Analysis | Use Cost Management + Billing in the Azure portal to track, analyze, and set controls on your Azure OpenAI spending. |
| Cost Analysis for Azure OpenAI Service | In the Cost analysis section of the Azure portal, you can view accumulated costs, filter by resource, and break down spending over time. |
| Pricing Models | Azure OpenAI Service supports different pricing models, including token-based pay-as-you-go and Provisioned Throughput Units (PTUs). |
| Detailed Pricing Information | The pricing details for Azure OpenAI Service are available on the Azure OpenAI Service pricing page. This includes costs per 1,000,000 tokens for various models and the rates for PTUs. |
| Monitoring Usage | You can monitor the usage of different models and their associated costs. Azure provides metrics and logs that can help you understand the usage patterns of your deployed models. |
| Exporting Cost Data | For more advanced analysis, you can export your cost data to a storage account. This allows you to use tools like Excel or Power BI for custom reporting and analysis. |
| Budgeting and Alerts | You can create budgets in Azure Cost Management and set alerts to notify you when spending exceeds predefined thresholds. |
| Deployment Type | Best Suited For | How It Works | Cost | Tokenization |
|---|---|---|---|---|
| Global-Batch | Offline scoring, non-latency sensitive workloads | Asynchronous groups of requests with separate quota, 24-hour target turnaround | Least expensive | Tokens calculated based on input file and generated output |
| Global-Standard | Recommended starting place for customers | Traffic may be routed anywhere in the world | Access to all new models with larger quota allocations, Global pricing | Tokens calculated per request, including both input and output tokens |
| Global-Provisioned | Real-time scoring for large consistent volume | Traffic may be routed anywhere in the world | Cost savings for consistent usage, Regional pricing | Tokens calculated per request, including both input and output tokens |
| Standard | Optimized for low to medium volume | Regional deployment; traffic is processed within the resource's Azure region | Pay-per-call flexibility | Tokens calculated per request, including both input and output tokens |
| Provisioned | Use cases with data residency requirements | Regional access with very high & predictable throughput | Hourly billing with optional purchase of monthly or yearly reservations | Tokens calculated per request, including both input and output tokens |
| Billing Model | Description | Cost Calculation | Use Cases |
|---|---|---|---|
| Pay-As-You-Go | - Charged based on the number of tokens processed - Suitable for variable or unpredictable usage patterns | - Cost per unit of tokens, with different rates for different model series - Includes both input and output tokens | - Applications with variable or unpredictable usage patterns - Flexibility in usage |
| Provisioned Throughput Units (PTUs) | - Reserved processing capacity for deployments, ensuring predictable performance and cost - Capacity planning: estimate the PTUs required for your workload to balance performance and cost | - Hourly rate based on the number of PTUs deployed, regardless of the number of tokens processed | - Well-defined, predictable throughput requirements with consistent traffic - Real-time or latency-sensitive applications, which benefit from guaranteed throughput and more predictable costs than pay-as-you-go |
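To make the two billing models concrete, here is a minimal Python sketch of each cost calculation; the rates, PTU count, and token volume below are hypothetical placeholders, not published prices:

```python
def payg_cost(total_tokens, price_per_million):
    # Pay-as-you-go: billed per token processed (input + output).
    return total_tokens / 1_000_000 * price_per_million

def ptu_cost(ptus, hourly_rate, hours):
    # Provisioned: billed per PTU-hour, regardless of tokens processed.
    return ptus * hourly_rate * hours

# Hypothetical rates for illustration only.
monthly_tokens = 2_000_000_000           # 2B tokens in a month
print(payg_cost(monthly_tokens, 5.00))   # token-based bill
print(ptu_cost(50, 1.00, 730))           # 50 PTUs for a 730-hour month
```

Comparing the two numbers for your own expected volume is the core of the capacity-planning exercise described above.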
Tokenization is the process of breaking down text into smaller units called tokens. In Azure OpenAI Service, tokenization is used to calculate the number of tokens processed, which directly impacts the cost.
Tokens are pieces of words. When processing text, Azure OpenAI Service breaks the input down into tokens, which can be as short as one character or as long as one word. These tokens are used by the model to understand and generate text.
Process:
- Text Splitting: The input text is split into tokens, which can be as short as one character or as long as one word. This splitting is done using a method called Byte-Pair Encoding (BPE).
- Byte-Pair Encoding (BPE): BPE is a tokenization method that merges the most frequently occurring pairs of characters into a single token. This approach is particularly effective in handling rare words and subwords, as it allows the model to break down complex or unseen words into more manageable pieces.
- Token Limits: Each model in Azure OpenAI has a maximum token limit per request. It's important to be aware of these limits when designing your prompts and handling responses.
- Tokenization Example: To illustrate, the word "hamburger" might be tokenized into ["ham", "bur", "ger"], while a common word like "pear" would remain as a single token ["pear"]. Many tokens also start with a whitespace, for example, " hello" and " bye".
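To illustrate the BPE idea, here is a toy Python sketch (not the actual tokenizer Azure OpenAI uses) that repeatedly merges the most frequent adjacent pair of symbols into a single token:

```python
from collections import Counter

def most_frequent_pair(tokens):
    # Count adjacent pairs and return the most common one (None if no pairs).
    pairs = Counter(zip(tokens, tokens[1:]))
    return max(pairs, key=pairs.get) if pairs else None

def bpe_merge(tokens, num_merges):
    # Repeatedly merge the most frequent adjacent pair into one token.
    for _ in range(num_merges):
        pair = most_frequent_pair(tokens)
        if pair is None:
            break
        merged, i = [], 0
        while i < len(tokens):
            if i < len(tokens) - 1 and (tokens[i], tokens[i + 1]) == pair:
                merged.append(tokens[i] + tokens[i + 1])
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens

# Starting from single characters, frequent pairs fuse into subwords.
print(bpe_merge(list("hamburger hamburger"), 8))
# → ['hamburger', ' ', 'hamburger']
```

Real tokenizers learn their merge rules from a large corpus, which is why common words end up as single tokens while rare words split into subword pieces.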
| Process | Description |
|---|---|
| Input Tokenization | The input text is tokenized into individual tokens. For example, the sentence "Hello, world!" would be tokenized into ["Hello", ",", " world", "!"]. |
| Output Tokenization | The model’s response is also tokenized. For example, if the model generates the text "Hi there!", it might be tokenized into ["Hi", " there", "!"]. |
| Total Tokens Processed | The total number of tokens processed is the sum of the input tokens and the output tokens. |
| Cost Calculation | The cost is calculated based on the total number of tokens processed, using the pricing information provided by Azure. |
The cost for using Azure OpenAI can be calculated using the following general formula:

Cost = (Number of Tokens / N) × Price per N Tokens

Where:
- Number of Tokens is the total number of tokens processed (input tokens + output tokens).
- N is the number of tokens for which the price is specified (e.g., 100,000 tokens).
- Price per N Tokens is the cost rate for processing N tokens, which varies based on the model and region.
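The formula above can be expressed as a small Python helper; the $5.00-per-1M-tokens rate in the example is hypothetical, not a published price:

```python
def estimate_cost(input_tokens, output_tokens, price_per_n, n=1_000_000):
    # Cost = (total tokens / N) * price per N tokens.
    total_tokens = input_tokens + output_tokens
    return total_tokens / n * price_per_n

# Hypothetical rate: $5.00 per 1,000,000 tokens.
print(estimate_cost(800_000, 200_000, 5.00))  # → 5.0
```

Look up the actual per-model rate on the Azure OpenAI Service pricing page before using this for budgeting.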
| Optimization Aspect | Technical Details | Best Practices |
|---|---|---|
| Input Optimization | - Token Efficiency: Reducing the number of tokens in the input can significantly lower costs. - Prompt Engineering: Crafting prompts that achieve the desired result with fewer tokens. - Context Management: Including only the necessary context in the input to minimize token usage. | - Concise Prompts: Use the shortest possible prompts that still convey the necessary information. - Avoid Redundancy: Remove any repetitive or unnecessary words from the prompts. - Template Design: Design prompt templates that are efficient in terms of token usage. |
| Output Optimization | - Response Length Control: Limiting the length of the model's response can help manage and predict costs. - Stop Sequences: Using stop sequences to control where the model should stop generating further tokens. - Max Tokens Parameter: Setting an appropriate limit on the number of tokens in the response. | - Set Max Tokens: Use the max_tokens parameter to limit the length of the model's response. - Use Stop Sequences: Define stop sequences to control the verbosity of the output. - Quality Check: Regularly review the model's responses to ensure they are within the expected length and quality. |
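The output-side controls map directly onto request parameters. Below is a sketch of a chat completions payload that uses them; the specific values and stop sequence are illustrative choices, not recommendations:

```python
import json

# max_tokens caps how many output tokens can be generated (and billed);
# a stop sequence ends generation early when it appears in the output.
payload = {
    "messages": [{"role": "user", "content": "List three fruits."}],
    "max_tokens": 100,
    "stop": ["\n\n"],
    "temperature": 0.2,
}
print(json.dumps(payload, indent=2))
```

Tuning `max_tokens` per use case (short for classification, longer for summaries) keeps output costs predictable without truncating useful responses.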
Bruno: an open-source API client designed to simplify API testing and exploration. Bruno is known for its speed and Git-friendly approach, storing API collections directly on the filesystem in a plain-text markup language called Bru; see the Bruno download page to install it.
You can use a client like Bruno, or the Azure API Management Developer Portal (API Playground), to track costs with the Azure Cost Management API.
Register an application:

1. Go to the Azure portal.
2. Navigate to Microsoft Entra ID > App registrations > New registration.
3. Fill in the required details and register the application.
4. Note down the `clientId`, `clientSecret`, and `tenantId`.

Grant API permissions:

1. In the Azure portal, go to Microsoft Entra ID > App registrations > Your App > API permissions.
2. Add the necessary permissions, such as `Consumption Billing`.
3. Click Grant admin consent if required.

Search for the tenant id in Microsoft Entra ID, if you don't have it yet.

Get your `client secret`, or create one if needed.
Use the OAuth 2.0 client credentials flow to get an access token:

```http
POST https://login.microsoftonline.com/{tenantId}/oauth2/token
Content-Type: application/x-www-form-urlencoded

grant_type=client_credentials
&client_id={clientId}
&client_secret={clientSecret}
&resource=https://management.azure.com/
```

If you want to export the code template, you can use the code option:

```shell
curl --request POST \
  --url 'https://login.microsoftonline.com/{tenantId}/oauth2/token' \
  --header 'Content-Type: application/x-www-form-urlencoded' \
  --data grant_type=client_credentials \
  --data client_id=tbd \
  --data client_secret=tbd \
  --data resource=https://management.azure.com/
```
Use the access token in your API calls:

```http
Authorization: Bearer {accessToken}
```

Make the API call. Before using the access token, grant permission by assigning a role to the app you created (you can find it by name). Retrieve the data, and consider applying filters; check the following section for more information:

```http
GET https://management.azure.com/subscriptions/{subscriptionId}/providers/Microsoft.Consumption/usageDetails?api-version=2019-10-01
Authorization: Bearer {accessToken}
```

Note
General API Call Structure: Azure Cost Management APIs typically follow a RESTful structure. Here's a basic example of an API call to retrieve cost data:

```http
GET https://management.azure.com/{scope}/providers/Microsoft.Consumption/usageDetails?api-version=2019-10-01
```

Key Components:

- Base URL: `https://management.azure.com/`
- Scope: the level at which you want to retrieve cost data. It can be a subscription, resource group, or a specific resource. For example:
  - Subscription: `/subscriptions/{subscriptionId}`
  - Resource Group: `/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}`
  - Resource: `/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/{resourceProviderNamespace}/{resourceType}/{resourceName}`
- Resource Path: `/providers/Microsoft.Consumption/usageDetails`
- API Version: `api-version=2019-10-01`
Example API Call:
To get usage details for a specific subscription, your API call might look like this:
```http
GET https://management.azure.com/subscriptions/{subscriptionId}/providers/Microsoft.Consumption/usageDetails?api-version=2019-10-01
```

You can add various filters to narrow down the data. Here’s a table of common filters:
| Filter | Description | Example |
|---|---|---|
| Date Range | Filter by a specific date range | $filter=properties/usageStart ge '2023-01-01' and properties/usageEnd le '2023-01-31' |
| Resource Group | Filter by resource group name | $filter=properties/resourceGroup eq 'yourResourceGroupName' |
| Resource Type | Filter by resource type | $filter=properties/resourceType eq 'Microsoft.Compute/virtualMachines' |
| Meter Category | Filter by meter category | $filter=properties/meterCategory eq 'Virtual Machines' |
| Tag | Filter by tag name and value | $filter=tags/yourTagName eq 'yourTagValue' |
| Location | Filter by resource location | $filter=properties/location eq 'eastus' |
| Charge Type | Filter by type of charge (e.g., usage, purchase) | $filter=properties/chargeType eq 'Usage' |
| Invoice ID | Filter by specific invoice ID | $filter=properties/invoiceId eq 'yourInvoiceId' |
Example API calls with filters:

```http
GET https://management.azure.com/subscriptions/{subscriptionId}/providers/Microsoft.Consumption/usageDetails?$filter=properties/usageStart ge '2023-01-01' and properties/usageEnd le '2023-01-31'&api-version=2019-10-01
```

```http
GET https://management.azure.com/subscriptions/{subscriptionId}/providers/Microsoft.Consumption/usageDetails?$filter=properties/usageStart ge '2023-01-01' and properties/usageEnd le '2023-01-31' and properties/resourceGroup eq 'yourResourceGroupName' and tags/yourTagName eq 'yourTagValue'&api-version=2019-10-01
Authorization: Bearer {accessToken}
```

Example using the Cognitive Services filter:
```shell
curl --request GET \
  --url 'https://management.azure.com/subscriptions/{subscription_id}/providers/Microsoft.Consumption/usageDetails?api-version=2019-10-01&%24filter=properties%2FmeterCategory%20eq%20%27Cognitive%20Services%27' \
  --header 'Authorization: Bearer tbd'
```
1. Log in to the Azure portal: go to portal.azure.com and sign in with your Azure account.
2. Navigate to the Azure OpenAI resource.
3. Add tags to the resource.

Give the required roles to be able to call the model (`Cognitive Services User`, `Cognitive Services OpenAI User`):
To ensure that API calls from different departments are tagged correctly, you can include the tags in the API requests. Here’s an example of how to do this:
- Set Up the API Call:
- Use the Azure OpenAI API to make requests.
- Include the tags in the request headers or body as needed.
Important
To understand more about the available tags, see the Tags - List reference.
Example of a general model call:

```shell
curl --request POST \
  --url 'https://azureopenaibrowntest.openai.azure.com/openai/deployments/gpt-4/chat/completions?api-version=2024-08-01-preview' \
  --header 'Content-Type: application/json' \
  --header 'api-key: {key_value}' \
  --data '{
    "temperature": 0.7,
    "max_tokens": 200,
    "seed": 42,
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Tell me a story about the universe."
      }
    ]
  }'
```

Note
To tag usage costs from different departments or areas, you can use the user parameter in your API requests. This parameter lets you assign a unique identifier representing your end user, which helps you monitor and detect usage patterns across different departments.
Example API call with tags:

```shell
curl --request POST \
  --url 'https://azureopenaibrowntest.openai.azure.com/openai/deployments/gpt-4/chat/completions?api-version=2024-08-01-preview' \
  --header 'Content-Type: application/json' \
  --header 'api-key: {api_value_key}' \
  --data '{
    "temperature": 0.7,
    "max_tokens": 200,
    "seed": 42,
    "user": "department-hr",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Tell me a story about the universe."
      }
    ]
  }'
```

- Navigate to Cost Management + Billing: In the Azure portal, go to Cost Management + Billing.
- Cost Analysis:
- Select Cost Analysis.
- Use the Add filter option to filter costs by tags.
- For example, filter by `department-hr` to see the costs associated with the HR department.
- Group by Tags:
- Use the Group by option to group costs by tags.
- This will allow you to see a breakdown of costs by tags.
To ensure that all resources are tagged consistently, you can use Azure Policy to enforce tagging.
- Create a Tagging Policy:
- Go to Azure Policy in the Azure portal.
- Click on Definitions and then + Policy definition.
- Create a policy definition that requires tags on resources.
- Assign the Policy:
- Assign the policy to the subscription or resource group.
- This will ensure that all new resources are tagged according to the policy.
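As an illustration of such a policy definition, here is a minimal rule modeled on the built-in "Require a tag on resources" policy; treat it as a sketch rather than a drop-in definition:

```json
{
  "policyRule": {
    "if": {
      "field": "[concat('tags[', parameters('tagName'), ']')]",
      "exists": "false"
    },
    "then": { "effect": "deny" }
  },
  "parameters": {
    "tagName": {
      "type": "String",
      "metadata": { "displayName": "Tag Name" }
    }
  }
}
```

With `deny` as the effect, untagged resource creation fails outright; using `audit` instead only flags non-compliant resources.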

