Commit 374ef73

Add overview for multi-region load balancing architecture
This document outlines a multi-region load balancing architecture for MSFT Foundry, detailing frontend and backend layers, APIM roles, and routing models.
1 parent 5ee82da commit 374ef73

1 file changed

Lines changed: 130 additions & 0 deletions

@@ -0,0 +1,130 @@
1+
# Multi-Region Load Balancing Architecture for MSFT Foundry - Overview
2+
3+
Costa Rica
4+
5+
[![GitHub](https://badgen.net/badge/icon/github?icon=github&label)](https://github.com)
6+
[![GitHub](https://img.shields.io/badge/--181717?logo=github&logoColor=ffffff)](https://github.com/)
7+
[brown9804](https://github.com/brown9804)
8+
9+
Last updated: 2026-01-22
10+
11+
----------
12+
13+
> [!TIP]
14+
> This architecture uses Azure API Management (APIM) as a unified gateway to orchestrate requests across multiple US regions where MSFT Foundry workloads are deployed.
15+
> By distributing traffic through APIM, customers can mitigate TPM (tokens per minute) limitations and protect against isolated regional outages,
16+
> while maintaining a consistent API surface for developers.
17+
18+
<details>
19+
<summary><b>List of References</b> (Click to expand)</summary>
20+
21+
- [What is Azure API Management?](https://learn.microsoft.com/en-us/azure/api-management/api-management-key-concepts)
22+
- [What is Azure Front Door?](https://learn.microsoft.com/en-us/azure/frontdoor/front-door-overview)
23+
- [Comparison between Azure Front Door and Azure CDN services](https://learn.microsoft.com/en-us/azure/frontdoor/front-door-cdn-comparison)
24+
- [GPT-RAG Solution Accelerator](https://github.com/Azure/GPT-RAG) - AI Factory
25+
26+
</details>
27+
28+
29+
> To improve resiliency and scalability, the solution can be split into frontend and backend layers. `This design supports high availability, performance optimization, and simplified management, and it prepares teams for new model rollouts that may initially be limited to specific regions.`
30+
31+
1. Frontend Layer:
32+
   - `Azure Front Door` provides global entry points, latency-based routing, and DDoS protection.
33+
   - `Microsoft Entra ID` secures authentication and authorization for user access.
34+
2. Backend Layer:
35+
   - `Regional APIM instances and load balancers` distribute workloads across MSFT Foundry deployments.
36+
   - Architectures can follow either a `hub-and-spoke model` (centralized control with spokes in each region) or a `hub-to-hub model` (interconnected hubs for regional autonomy).
37+
38+
| Approach | Structure | Strengths | Trade-offs |
39+
|----------|-----------|-----------|------------|
40+
| **Hub-to-Hub** | Multiple hubs interconnected | High resiliency, regional autonomy | More complex networking, higher cost |
41+
| **Hub-and-Spoke** | One central hub, multiple spokes | Easier to manage, centralized policies | Hub becomes a critical dependency |
42+
43+
## Unified Gateway with APIM
44+
45+
`Applications only call APIM endpoints, not individual Foundry instances. This simplifies SDKs and client logic.`
46+
47+
- Role of APIM: APIM sits at the edge of each region, exposing a consistent API surface to developers. It abstracts away the complexity of multiple Foundry deployments.
48+
- Policies: Developers can configure APIM policies for:
49+
  - Rate limiting: Prevent exceeding TPM quotas per region.
50+
  - Conditional routing: Direct traffic based on headers, tokens, or quota status.
51+
  - Transformation: Normalize requests/responses across regions.
52+
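The rate-limiting and conditional-routing policies above can be illustrated with a small simulation. This is a minimal sketch in plain Python, not an APIM policy definition: the `RegionQuota` class, the `pick_region` function, the region names, and the TPM limits are all hypothetical, and a 60-second sliding window is assumed as the way token usage is counted.

```python
import time
from collections import deque
from typing import Optional


class RegionQuota:
    """Tracks tokens consumed in a sliding 60-second window for one region."""

    def __init__(self, name: str, tpm_limit: int) -> None:
        self.name = name
        self.tpm_limit = tpm_limit
        self._events = deque()  # (timestamp, tokens) pairs, oldest first
        self._used = 0

    def _expire(self, now: float) -> None:
        # Drop usage records that are 60 seconds old or older.
        while self._events and now - self._events[0][0] >= 60.0:
            _, tokens = self._events.popleft()
            self._used -= tokens

    def remaining(self, now: float) -> int:
        self._expire(now)
        return self.tpm_limit - self._used

    def record(self, tokens: int, now: float) -> None:
        self._events.append((now, tokens))
        self._used += tokens


def pick_region(
    regions: list, tokens: int, now: Optional[float] = None
) -> Optional[str]:
    """Return the first region (in preference order) with enough TPM headroom."""
    now = time.monotonic() if now is None else now
    for region in regions:
        if region.remaining(now) >= tokens:
            region.record(tokens, now)
            return region.name
    return None  # every region is at quota; a gateway would answer HTTP 429
```

A caller would build the list once, for example `[RegionQuota("eastus", 1000), RegionQuota("westus", 1000)]`, and call `pick_region(regions, estimated_tokens)` per request. Trying regions in preference order keeps traffic local until the local quota is nearly used up, which matches the "quota status" routing condition described above.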
53+
<img width="1620" height="1098" alt="image" src="https://github.com/user-attachments/assets/58f8f967-0334-4174-8ea4-1b6e6b5b6cf8" />
54+
55+
From [What is Azure API Management?](https://learn.microsoft.com/en-us/azure/api-management/api-management-key-concepts)
56+
57+
> API Management components:
58+
59+
<img width="783" height="400" alt="image" src="https://github.com/user-attachments/assets/e11b03c9-e36c-4502-bc34-272b2b549026" />
60+
61+
From [What is Azure API Management?](https://learn.microsoft.com/en-us/azure/api-management/api-management-key-concepts)
62+
63+
> E.g., from [GPT-RAG Solution Accelerator](https://github.com/Azure/GPT-RAG)
64+
65+
<img width="1407" height="860" alt="image" src="https://github.com/user-attachments/assets/0ba7a045-b7e1-4297-b690-d3e87d74532d" />
66+
67+
## Frontend Layer
68+
69+
`The frontend ensures that user traffic is secured and optimized before it reaches backend workloads. You don’t need to hardcode region logic in the client; Front Door handles it.`
70+
71+
- Azure Front Door:
72+
  - Provides global entry points with latency‑based routing.
73+
  - Uses health probes to detect regional failures and reroute traffic.
74+
  - Supports caching for static responses, reducing load on Foundry.
75+
- Microsoft Entra ID:
76+
  - Handles OAuth2/OpenID Connect flows for secure access.
77+
  - Issues tokens that APIM validates before forwarding requests.
78+
  - Developers integrate Entra ID into client apps (web/mobile) for seamless authentication.
79+
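The combination of latency-based routing and health probes described above can be sketched as a simple selection rule: among the origins whose last probe succeeded, pick the one with the lowest measured latency. This is a deliberately simplified model; the real Front Door service also takes configured priorities and weights into account. The `Origin` type, the region names, and the latency figures are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Origin:
    name: str          # hypothetical regional APIM endpoint label
    latency_ms: float  # latest measured latency from the edge to this origin
    healthy: bool = True  # result of the most recent health probe


def choose_origin(origins: list) -> Optional[Origin]:
    """Pick the lowest-latency origin among those that passed their health probe."""
    healthy = [o for o in origins if o.healthy]
    if not healthy:
        return None  # no healthy origin; the edge would return an error
    return min(healthy, key=lambda o: o.latency_ms)
```

When a probe marks the closest origin as unhealthy, the same call returns the next-closest one, so the client never needs region-specific logic.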
80+
<img width="500" height="530" alt="image" src="https://github.com/user-attachments/assets/33e45a0e-ea29-4f16-b53e-d93586c61007" />
81+
82+
From [What is Azure Front Door?](https://learn.microsoft.com/en-us/azure/frontdoor/front-door-overview)
83+
84+
<img width="1200" height="581" alt="image" src="https://github.com/user-attachments/assets/bcf03970-950f-4a6a-b945-f7c6d99d3820" />
85+
86+
From [Comparison between Azure Front Door and Azure CDN services](https://learn.microsoft.com/en-us/azure/frontdoor/front-door-cdn-comparison)
87+
88+
> E.g., Front Door + Private Endpoint (PE):
89+
90+
<img width="1518" height="698" alt="image" src="https://github.com/user-attachments/assets/a4c822b1-8bc1-4346-beb8-613141fb8f84" />
91+
92+
## Backend Layer
93+
94+
> [!TIP]
95+
> `Azure Monitor + Application Insights provide real‑time metrics on TPM consumption, latency, and error rates.`
96+
> - APIM policies track TPM usage.
97+
> - When nearing limits, traffic is rerouted to another region.
98+
> - Developers implement retry logic with exponential backoff to handle throttling gracefully.
99+
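The retry guidance in the tip above can be sketched on the client side as follows. This is a minimal example under stated assumptions: `ThrottledError` is a hypothetical exception that the client raises when the gateway responds with HTTP 429, `call_with_backoff` is an illustrative helper rather than part of any SDK, and the delay values are arbitrary. Random jitter is added so that many clients throttled at the same moment do not retry in lockstep.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical error raised when the gateway answers HTTP 429."""


def call_with_backoff(
    request: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`, retrying on throttling with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return request()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the throttling error
            # Double the delay ceiling on each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: wait a random time between 0 and the ceiling.
            sleep(random.uniform(0, ceiling))
    raise RuntimeError("max_attempts must be at least 1")
```

The `sleep` parameter exists only so the waiting behavior can be replaced in tests; production callers would leave it at the default.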
100+
- Regional APIM Instances: Each region has its own APIM connected to local Foundry deployments. Developers can configure per‑region policies (e.g., TPM quotas, logging).
101+
- Load Balancers:
102+
  - Azure Load Balancer or Application Gateway distributes traffic across multiple Foundry instances in a region.
103+
  - Health probes detect unhealthy instances and remove them from rotation.
104+
- Routing Models:
105+
  - Hub‑and‑Spoke: Central hub routes traffic to spokes (regional APIM + Foundry). Easier to manage, but the hub is a dependency.
106+
  - Hub‑to‑Hub: Each hub can route to others, providing regional autonomy. More resilient, but the networking is more complex.
107+
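The "remove them from rotation" behavior of the in-region load balancer can be modeled with a round-robin pool that skips instances whose last probe failed. This is an illustrative sketch only; `RoundRobinPool` and the instance names are hypothetical, and the actual distribution algorithms of Azure Load Balancer and Application Gateway are not reproduced here.

```python
from typing import Optional


class RoundRobinPool:
    """Round-robin over instances, skipping any that failed their last probe."""

    def __init__(self, instances: list) -> None:
        self._instances = list(instances)
        self._healthy = set(instances)
        self._next = 0

    def report_probe(self, instance: str, ok: bool) -> None:
        # A failed probe takes the instance out of rotation; a passing one restores it.
        if ok:
            self._healthy.add(instance)
        else:
            self._healthy.discard(instance)

    def next_instance(self) -> Optional[str]:
        # Look at each instance at most once, starting from the rotation pointer.
        for _ in range(len(self._instances)):
            candidate = self._instances[self._next]
            self._next = (self._next + 1) % len(self._instances)
            if candidate in self._healthy:
                return candidate
        return None  # no healthy instance left in this region
```

A `None` result is the signal that the regional pool is exhausted, which is the point at which a cross-region reroute would take over.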
108+
| Dimension | **Hub‑and‑Spoke** | **Hub‑to‑Hub** |
109+
|-----------|-------------------|----------------|
110+
| **Topology** | One **central hub VNet** peered with multiple **spoke VNets**. All traffic flows through the hub before reaching regional APIM + Foundry. | Multiple **regional hubs**, each with its own APIM + Foundry. Hubs are interconnected via VNet peering or Azure Virtual WAN, allowing direct routing between hubs. |
111+
| **Traffic Flow** | User → Front Door → Hub APIM → Spoke APIM/Foundry → Response. Centralized routing logic. | User → Front Door → Nearest Hub APIM → Local Foundry, or rerouted to another hub if local capacity is exceeded or the local hub has an outage. |
112+
| **Control Plane** | Centralized: policies, quotas, and routing rules defined at the hub APIM. Developers manage one control point. | Distributed: each hub maintains its own APIM policies, quotas, and routing rules. Developers must coordinate across hubs. |
113+
| **Resiliency** | Hub is a **single point of dependency**. If hub fails, spokes are unreachable unless fallback is designed. | No single dependency. Each hub can operate independently. Outages in one hub don’t affect others. |
114+
| **Complexity** | Easier to design and manage. Clear separation of responsibilities. | Higher complexity: requires consistent policy replication, routing synchronization, and monitoring across hubs. |
115+
| **Latency** | Potentially higher latency: traffic always traverses hub before reaching spoke. | Lower latency: traffic can terminate at nearest hub without detouring through a central hub. |
116+
| **Networking Requirements** | Hub VNet must be sized for aggregate traffic. Requires hub‑spoke peering and routing tables. | Requires **full mesh peering** or Virtual WAN. More complex routing tables and potential challenges with overlapping IP address ranges. |
117+
| **Security** | Centralized firewall and NSGs at hub enforce security. Easier to audit and control. | Security distributed across hubs. Each hub must replicate firewall rules and NSGs consistently. |
118+
| **Quota Management (TPM)** | Hub monitors TPM usage across spokes. Easier to implement quota‑aware routing logic. | Each hub monitors its own TPM usage. Requires cross‑hub coordination to balance workloads. |
119+
| **Failover Strategy** | Failover logic must be implemented at hub level. If hub is down, fallback requires secondary hub or alternate entry point. | Failover is local to each hub. Front Door can reroute traffic to another hub automatically. |
120+
| **Observability** | Centralized logging and monitoring at hub. Easier to correlate traffic flows. | Distributed logging. Requires aggregation across hubs for full visibility. |
121+
| **DevOps Impact** | Single CI/CD pipeline for hub APIM policies. Spokes are simpler (mostly Foundry + load balancers). | Multiple CI/CD pipelines for each hub APIM. Requires automation to ensure consistency. |
122+
| **Cost Considerations** | Lower operational cost: fewer APIM instances, centralized infrastructure. | Higher cost: multiple hubs with APIM, monitoring, and networking overhead. |
123+
| **Best Use Case** | Organizations prioritizing **simplicity and centralized control**. Suitable for small/medium deployments. | Organizations prioritizing **resiliency, autonomy, and low latency**. Suitable for large, distributed deployments. |
124+
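The resiliency and failover rows of the table can be made concrete with a toy reachability model. In this sketch the two functions, the region names, and the boolean health maps are all hypothetical. It only captures the structural difference between the topologies, and it assumes no secondary hub or alternate entry point has been designed for the hub-and-spoke case.

```python
from typing import Optional


def hub_and_spoke_route(hub_up: bool, spokes: dict, preferred: str) -> Optional[str]:
    """All traffic crosses the central hub, so a hub outage blocks every spoke."""
    if not hub_up:
        return None
    if spokes.get(preferred):
        return preferred
    # The hub's centralized routing logic falls back to any healthy spoke.
    return next((name for name, up in spokes.items() if up), None)


def hub_to_hub_route(hubs: dict, nearest: str) -> Optional[str]:
    """Each hub serves locally; if the nearest hub is down, use a healthy peer."""
    if hubs.get(nearest):
        return nearest
    return next((name for name, up in hubs.items() if up), None)
```

With both regions healthy, either function returns the preferred region. The difference shows up on failure: `hub_and_spoke_route(False, ...)` returns `None` even when every spoke is healthy, while `hub_to_hub_route` returns `None` only when every hub is down.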
125+
<!-- START BADGE -->
126+
<div align="center">
127+
<img src="https://img.shields.io/badge/Total%20views-1497-limegreen" alt="Total views">
128+
<p>Refresh Date: 2026-01-05</p>
129+
</div>
130+
<!-- END BADGE -->
