
Azure OpenAI: Tokenization & Cost Analysis

Costa Rica

GitHub brown9804

Last updated: 2025-04-22



Cost Analysis

Azure OpenAI Service provides several ways to track and analyze costs, including model usage and cost distribution.

| Key Point | Details |
| --- | --- |
| Cost Management and Analysis | Use Cost Management + Billing in the Azure portal to:<br>• Track costs associated with Azure OpenAI Service.<br>• Analyze cost patterns and identify spending trends.<br>• Set budgets and create alerts for overspending. |
| Cost Analysis for Azure OpenAI Service | In the Cost analysis section of the Azure portal, you can:<br>• Group costs by various attributes, such as resource group, location, or meter.<br>• Filter costs to focus on specific resources or services.<br>• Visualize costs over time to understand spending patterns. |
| Pricing Models | Azure OpenAI Service supports different pricing models:<br>• Standard (On-Demand): pay only for the tokens processed.<br>• Provisioned Throughput Units (PTUs): consistent throughput and minimal latency variance for scalable solutions. |
| Detailed Pricing Information | Pricing details are available on the Azure OpenAI Service pricing page, including costs per 1,000,000 tokens for various models and the rates for PTUs. |
| Monitoring Usage | You can monitor the usage of different models and their associated costs. Azure provides metrics and logs that help you understand the usage patterns of your deployed models. |
| Exporting Cost Data | For more advanced analysis, export your cost data to a storage account and use tools like Excel or Power BI for custom reporting and analysis. |
| Budgeting and Alerts | Create budgets in Azure Cost Management and set alerts to notify you when spending exceeds predefined thresholds. |

Deployment Types

| Deployment Type | Best Suited For | How It Works | Cost | Tokenization |
| --- | --- | --- | --- | --- |
| Global-Batch | Offline scoring and non-latency-sensitive workloads | Asynchronous groups of requests with separate quota and a 24-hour target turnaround | Least expensive | Tokens calculated from the input file and the generated output |
| Global-Standard | Recommended starting place for customers | Traffic may be routed anywhere in the world; access to all new models with larger quota allocations | Global pricing | Tokens calculated per request, including both input and output tokens |
| Global-Provisioned | Real-time scoring for large, consistent volume | Traffic may be routed anywhere in the world; cost savings for consistent usage | Regional pricing | Tokens calculated per request, including both input and output tokens |
| Standard | Low-to-medium volume workloads | Regional deployment; requests are processed within the selected region | Pay-per-call flexibility | Tokens calculated per request, including both input and output tokens |
| Provisioned | Use cases with data residency requirements | Regional access with very high and predictable throughput | Hourly billing, with optional monthly or yearly reservations | Tokens calculated per request, including both input and output tokens |

Billing Models

| Billing Model | Description | Cost Calculation | Use Cases |
| --- | --- | --- | --- |
| Pay-As-You-Go | Charged based on the number of tokens processed; suitable for variable or unpredictable usage patterns | Cost per unit of tokens, with different rates for different model series; includes both input and output tokens | Applications with variable or unpredictable usage patterns; flexibility in usage |
| Provisioned Throughput Units (PTUs) | Reserved processing capacity for deployments, ensuring predictable performance and cost | Hourly rate based on the number of PTUs deployed, regardless of the number of tokens processed | Well-defined, predictable throughput requirements; consistent traffic; real-time or latency-sensitive applications |

PTU considerations:

  • Reserved Capacity: PTUs provide reserved processing capacity for your deployments, ensuring predictable performance and cost.
  • Capacity Planning: Estimate the PTUs required for your workload to balance performance and cost. The token totals are calculated using the following equation:

$$\text{Total Tokens} = \text{Peak calls per minute} \times (\text{Tokens in prompt call} + \text{Tokens in model response})$$

  • Cost Predictability: PTUs offer more predictable costs than the pay-as-you-go model, which varies with usage.
  • Performance Guarantees: PTUs provide guaranteed throughput and latency constraints, which is beneficial for real-time or latency-sensitive applications.
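The capacity-planning equation above can be sketched in Python. The workload numbers below are made-up assumptions for illustration only:

```python
def total_tokens_per_minute(peak_calls_per_minute: int,
                            prompt_tokens: int,
                            response_tokens: int) -> int:
    """Token throughput needed at peak, per the capacity-planning equation."""
    return peak_calls_per_minute * (prompt_tokens + response_tokens)

# Hypothetical workload: 300 calls/min, 500-token prompts, 200-token responses.
needed = total_tokens_per_minute(300, 500, 200)
print(needed)  # 210000 tokens per minute at peak
```

You would then compare this peak tokens-per-minute figure against the published per-PTU throughput for your model to size the deployment.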

Tokenization

Tokenization is the process of breaking text into smaller units called tokens. In Azure OpenAI Service, tokenization determines the number of tokens processed, which directly drives cost. Tokens are pieces of words: when processing text, the service breaks the input into tokens that can be as short as one character or as long as one word, and the model uses these tokens to understand and generate text.

Process:

  1. Text Splitting: The input text is split into tokens, which can be as short as one character or as long as one word. This splitting is done using a method called Byte-Pair Encoding (BPE).
  2. Byte-Pair Encoding (BPE): BPE is a tokenization method that merges the most frequently occurring pairs of characters into a single token. This approach is particularly effective in handling rare words and subwords, as it allows the model to break down complex or unseen words into more manageable pieces.
  3. Token Limits: Each model in Azure OpenAI has a maximum token limit per request. It's important to be aware of these limits when designing your prompts and handling responses.
  4. Tokenization Example: To illustrate, the word "hamburger" might be tokenized into ["ham", "bur", "ger"], while a common word like "pear" would remain as a single token ["pear"]. Many tokens also start with a whitespace, for example, " hello" and " bye".
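To make the BPE idea in step 2 concrete, here is a deliberately simplified sketch of a single merge step. Real tokenizers (such as the ones behind Azure OpenAI models) use large, pre-trained merge tables; this toy version just merges the most frequent adjacent pair once:

```python
from collections import Counter

def most_frequent_pair(tokens: list[str]) -> tuple[str, str]:
    """Count adjacent token pairs and return the most common one."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return max(pairs, key=pairs.get)

def merge_pair(tokens: list[str], pair: tuple[str, str]) -> list[str]:
    """Replace every occurrence of `pair` with its concatenation."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

# Start from single characters and apply one merge step.
tokens = list("banana")            # ['b', 'a', 'n', 'a', 'n', 'a']
pair = most_frequent_pair(tokens)  # ('a', 'n') occurs twice
tokens = merge_pair(tokens, pair)
print(tokens)  # ['b', 'an', 'an', 'a']
```

Repeating such merges with a learned merge table is how BPE turns rare words into a few subword tokens instead of many single characters.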
| Process | Description |
| --- | --- |
| Input Tokenization | The input text is tokenized into individual tokens. For example, the sentence "Hello, world!" would be tokenized into ["Hello", ",", " world", "!"]. |
| Output Tokenization | The model's response is also tokenized. For example, if the model generates the text "Hi there!", it might be tokenized into ["Hi", " there", "!"]. |
| Total Tokens Processed | The total number of tokens processed is the sum of the input tokens and the output tokens. |
| Cost Calculation | The cost is calculated from the total number of tokens processed, using the pricing information provided by Azure. |

The cost for using Azure OpenAI can be calculated using the following general formula:

$$\text{Cost} = \left( \frac{\text{Number of Tokens}}{N} \right) \times \text{Price per N Tokens}$$

Where:

  • Number of Tokens is the total number of tokens processed (input tokens + output tokens).
  • N is the number of tokens for which the price is specified (e.g., 1,000,000 tokens).
  • Price per N Tokens is the cost rate for processing N tokens, which varies based on the model and region.
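As a sketch, the formula can be applied like this in Python. The $2.50-per-1,000,000-token rate is a made-up placeholder; check the Azure OpenAI Service pricing page for real rates:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  price_per_n_tokens: float, n: int = 1_000_000) -> float:
    """Cost = (total tokens / N) * price per N tokens."""
    total_tokens = input_tokens + output_tokens
    return (total_tokens / n) * price_per_n_tokens

# Hypothetical: 120,000 input + 30,000 output tokens at $2.50 per 1M tokens.
cost = estimate_cost(120_000, 30_000, price_per_n_tokens=2.50)
print(f"${cost:.4f}")  # $0.3750
```

Note that input and output tokens are often billed at different rates, in which case you would apply the formula separately to each and sum the results.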

Optimization: Best Practices Overview

| Optimization Aspect | Technical Details | Best Practices |
| --- | --- | --- |
| Input Optimization | Token Efficiency: reducing the number of input tokens can significantly lower costs.<br>Prompt Engineering: crafting prompts that achieve the desired result with fewer tokens.<br>Context Management: including only the necessary context in the input to minimize token usage. | Concise Prompts: use the shortest prompts that still convey the necessary information.<br>Avoid Redundancy: remove repetitive or unnecessary words from prompts.<br>Template Design: design prompt templates that are efficient in terms of token usage. |
| Output Optimization | Response Length Control: limiting the length of the model's response helps manage and predict costs.<br>Stop Sequences: using stop sequences to control where the model should stop generating further tokens.<br>Max Tokens Parameter: setting an appropriate limit on the number of tokens in the response. | Set Max Tokens: use the max_tokens parameter to limit the length of the model's response.<br>Use Stop Sequences: define stop sequences to control the verbosity of the output.<br>Quality Check: regularly review the model's responses to ensure they are within the expected length and quality. |
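A minimal sketch of the output-side controls as a chat-completions request body. The stop sequence and message contents are illustrative; this snippet only builds the payload, nothing is sent:

```python
import json

# max_tokens caps the response length; stop ends generation early at the
# given sequence. Both bound the billable output tokens.
payload = {
    "messages": [
        {"role": "system", "content": "Answer in one short paragraph."},
        {"role": "user", "content": "Summarize tokenization."},
    ],
    "max_tokens": 200,   # hard cap on output tokens
    "stop": ["\n\n"],    # stop at the first blank line
    "temperature": 0.7,
}
print(json.dumps(payload, indent=2))
```

This payload shape matches the chat-completions examples later in this document; only the two highlighted parameters change the output-length behavior.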

Cost Tracking with API calls

Bruno: an open-source API client designed to simplify API testing and exploration. Bruno is known for its speed and Git-friendly approach, storing API collections directly on the filesystem in a plain-text markup language called Bru.

You can also use the Azure API Management Developer Portal (API Playground) to track costs with the Azure Cost Management API.

Register an Application in Azure AD

  1. Go to the Azure portal.

  2. Navigate to Microsoft Entra ID > App registrations > New registration.

  3. Fill in the required details and register the application.

  4. Note down the clientId, clientSecret, and tenantId.


Grant API Permissions

  1. In the Azure portal, go to Microsoft Entra ID > App registrations > Your App > API permissions.

  2. Add the necessary permissions, such as Consumption Billing.

  3. Click on Grant admin consent if required.


Get an Access Token

If you don't have the tenant ID yet, search for it in Microsoft Entra ID.

Get your client secret, or create one if needed.
  1. Use the OAuth 2.0 client credentials flow to get an access token.

    POST https://login.microsoftonline.com/{tenantId}/oauth2/token
    Content-Type: application/x-www-form-urlencoded
    
    grant_type=client_credentials
    &client_id={clientId}
    &client_secret={clientSecret}
    &resource=https://management.azure.com/

    If you want to export the code template, you can use the code option:

    curl --request POST \
      --url 'https://login.microsoftonline.com/{tenantId}/oauth2/token' \
      --header 'Content-Type: application/x-www-form-urlencoded' \
      --data 'grant_type=client_credentials' \
      --data 'client_id={clientId}' \
      --data 'client_secret={clientSecret}' \
      --data 'resource=https://management.azure.com/'
  2. Use the Access Token in your API calls:

    Authorization: Bearer {accessToken}
  3. Make the API call:

Before using the access token, grant the required permissions by assigning a role to the app you created; you can find it by name:

Retrieve the data, and consider applying filters; see the following section for more information:
GET https://management.azure.com/subscriptions/{subscriptionId}/providers/Microsoft.Consumption/usageDetails?api-version=2019-10-01
Authorization: Bearer {accessToken}
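The token request and the usage-details call above can be sketched in Python. Nothing is sent here; the snippet only assembles the URL and form body, and every braced value is a placeholder for your own tenant and app values:

```python
from urllib.parse import urlencode

tenant_id = "{tenantId}"  # placeholder: your directory (tenant) ID

# v1 token endpoint; the client-credentials body uses `resource`,
# matching the POST request shown above.
token_url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
token_body = urlencode({
    "grant_type": "client_credentials",
    "client_id": "{clientId}",          # placeholder
    "client_secret": "{clientSecret}",  # placeholder
    "resource": "https://management.azure.com/",
})

# The usage-details call then carries the returned token.
usage_url = ("https://management.azure.com/subscriptions/{subscriptionId}"
             "/providers/Microsoft.Consumption/usageDetails"
             "?api-version=2019-10-01")
headers = {"Authorization": "Bearer {accessToken}"}  # placeholder token
print(token_url)
print(usage_url)
```

With real values substituted, any HTTP client can POST `token_body` to `token_url` and then GET `usage_url` with the `Authorization` header.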

Adding Filters

Note

General API Call Structure: Azure Cost Management APIs typically follow a RESTful structure. Here's a basic example of an API call to retrieve cost data:

GET https://management.azure.com/{scope}/providers/Microsoft.Consumption/usageDetails?api-version=2019-10-01

Key Components:

  1. Base URL: https://management.azure.com/
  2. Scope: This defines the level at which you want to retrieve cost data. It can be a subscription, resource group, or a specific resource. For example:
    • Subscription: /subscriptions/{subscriptionId}
    • Resource Group: /subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}
    • Resource: /subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/{resourceProviderNamespace}/{resourceType}/{resourceName}
  3. Resource Path: /providers/Microsoft.Consumption/usageDetails
  4. API Version: api-version=2019-10-01

Example API Call:
To get usage details for a specific subscription, your API call might look like this:

GET https://management.azure.com/subscriptions/{subscriptionId}/providers/Microsoft.Consumption/usageDetails?api-version=2019-10-01

You can add various filters to narrow down the data. Here’s a table of common filters:

| Filter | Description | Example |
| --- | --- | --- |
| Date Range | Filter by a specific date range | $filter=properties/usageStart ge '2023-01-01' and properties/usageEnd le '2023-01-31' |
| Resource Group | Filter by resource group name | $filter=properties/resourceGroup eq 'yourResourceGroupName' |
| Resource Type | Filter by resource type | $filter=properties/resourceType eq 'Microsoft.Compute/virtualMachines' |
| Meter Category | Filter by meter category | $filter=properties/meterCategory eq 'Virtual Machines' |
| Tag | Filter by tag name and value | $filter=tags/yourTagName eq 'yourTagValue' |
| Location | Filter by resource location | $filter=properties/location eq 'eastus' |
| Charge Type | Filter by type of charge (e.g., usage, purchase) | $filter=properties/chargeType eq 'Usage' |
| Invoice ID | Filter by a specific invoice ID | $filter=properties/invoiceId eq 'yourInvoiceId' |

Example API Calls with Filters

GET https://management.azure.com/subscriptions/{subscriptionId}/providers/Microsoft.Consumption/usageDetails?$filter=properties/usageStart ge '2023-01-01' and properties/usageEnd le '2023-01-31'&api-version=2019-10-01
GET https://management.azure.com/subscriptions/{subscriptionId}/providers/Microsoft.Consumption/usageDetails?$filter=properties/usageStart ge '2023-01-01' and properties/usageEnd le '2023-01-31' and properties/resourceGroup eq 'yourResourceGroupName' and tags/yourTagName eq 'yourTagValue'&api-version=2019-10-01
Authorization: Bearer {accessToken}
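The filter expressions above contain spaces and quotes, which must be URL-encoded on the wire. A small Python sketch of building an encoded call (placeholder subscription ID, illustrative dates):

```python
from urllib.parse import urlencode, quote

subscription_id = "{subscriptionId}"  # placeholder
base = (f"https://management.azure.com/subscriptions/{subscription_id}"
        "/providers/Microsoft.Consumption/usageDetails")

params = {
    "api-version": "2019-10-01",
    "$filter": ("properties/usageStart ge '2023-01-01' "
                "and properties/usageEnd le '2023-01-31' "
                "and properties/meterCategory eq 'Cognitive Services'"),
}
# quote_via=quote percent-encodes spaces as %20 and quotes as %27.
url = f"{base}?{urlencode(params, quote_via=quote)}"
print(url)
```

Most HTTP clients (curl with --data-urlencode, Python requests with a params dict) do this encoding for you; the sketch just makes the transformation visible.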

Example using a Cognitive Services filter:

curl --request GET -G \
  --url 'https://management.azure.com/subscriptions/{subscriptionId}/providers/Microsoft.Consumption/usageDetails' \
  --data-urlencode 'api-version=2019-10-01' \
  --data-urlencode "\$filter=properties/meterCategory eq 'Cognitive Services'" \
  --header 'Authorization: Bearer {accessToken}'

Tagging Resources Demo

Tagging the Azure OpenAI Resource

  1. Log in to the Azure Portal: Go to portal.azure.com and log in with your Azure account.

  2. Navigate to the Azure OpenAI Resource:

    • In the left-hand menu, select Resource groups.

    • Select the resource group that contains your Azure OpenAI resource.

    • Click on the Azure OpenAI resource.

  3. Add Tags to the Resource:

    • In the left-hand menu of the resource, select Tags.

    • Add tags to the resource. For example, you can add tags like Department-Marketing, Department-Sales, etc.

    • Click Apply.


Using Tags in API Calls

Assign the required roles to be able to call the model (Cognitive Services User, Cognitive Services OpenAI User):


To ensure that API calls from different departments are tagged correctly, you can include the tags in the API requests. Here’s an example of how to do this:

  • Set Up the API Call:
    • Use the Azure OpenAI API to make requests.
    • Include the tags in the request headers or body as needed.

Important

To learn more about the available tags and request parameters, see the Azure OpenAI Service documentation.

Example of general model call:

curl --request POST \
  --url 'https://azureopenaibrowntest.openai.azure.com/openai/deployments/gpt-4/chat/completions?api-version=2024-08-01-preview' \
  --header 'Content-Type: application/json' \
  --header 'api-key: {key_value}' \
  --data '{
  "temperature": 0.7,
  "max_tokens": 200,
  "seed": 42,
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Tell me a story about how the universe began."
    }
  ]
}'

Note

To attribute usage costs to different departments or areas, you can use the user parameter in your API requests. This parameter assigns a unique identifier representing your end user, which helps monitor and detect usage patterns across departments.

Example API Call with Tags:

curl --request POST \
  --url 'https://azureopenaibrowntest.openai.azure.com/openai/deployments/gpt-4/chat/completions?api-version=2024-08-01-preview' \
  --header 'Content-Type: application/json' \
  --header 'api-key: {api_value_key}' \
  --data '{
  "temperature": 0.7,
  "max_tokens": 200,
  "seed": 42,
  "user": "department-hr",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Tell me a story about how the universe began."
    }
  ]
}'

Generating Billing Reports Based on Tags

  1. Navigate to Cost Management + Billing: In the Azure portal, go to Cost Management + Billing.
  2. Cost Analysis:
    • Select Cost Analysis.
    • Use the Add filter option to filter costs by tags.
    • For example, filter by department-hr to see the costs associated with the HR department.
  3. Group by Tags:
    • Use the Group by option to group costs by tags.
    • This will allow you to see a breakdown of costs by tags.

Automating Tagging with Azure Policy

To ensure that all resources are tagged consistently, you can use Azure Policy to enforce tagging.

  1. Create a Tagging Policy:
    • Go to Azure Policy in the Azure portal.
    • Click on Definitions and then + Policy definition.
    • Create a policy definition that requires tags on resources.
  2. Assign the Policy:
    • Assign the policy to the subscription or resource group.
    • This will ensure that all new resources are tagged according to the policy.
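As a sketch, the rule portion of such a policy definition might look like the following (the Department tag name is illustrative; Azure's built-in "Require a tag on resources" definition covers the same scenario):

```json
{
  "mode": "Indexed",
  "policyRule": {
    "if": {
      "field": "tags['Department']",
      "exists": "false"
    },
    "then": {
      "effect": "deny"
    }
  }
}
```

With effect set to deny, resource creation without the tag fails; a modify effect with a tag value could instead add the tag automatically.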
