Skip to content

Commit 98e1064

Browse files
authored
Enhance AKS documentation with traffic patterns and best practices
Added section on modern traffic patterns for AKS, detailing architecture and best practices for traffic management, observability, and failover automation.
1 parent ba6a0fe commit 98e1064

1 file changed

Lines changed: 57 additions & 1 deletion

File tree

0_Azure/8_AzureApps/demos/1_Compute/0_fromMulti-containerWebApp_toAKS.md

Lines changed: 57 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -251,6 +251,63 @@ From [Disk type comparison](https://learn.microsoft.com/en-us/azure/virtual-mach
251251
| **Infrastructure Components** | - **Container Image**: Reuse from Azure Container Registry (ACR); configure image pull secrets.<br>- **Networking**: Plan VNet, node pool subnet, Service CIDR, Pod CIDR.<br>- **Ingress / Routing**: Deploy Ingress Controller (NGINX or Azure Application Gateway), configure DNS and TLS.<br>- **Scaling**: Set up Horizontal Pod Autoscaler (HPA) or install KEDA manually.<br>- **Monitoring**: Enable Azure Monitor for Containers and Log Analytics.<br>- **Secrets Management**: Create Kubernetes Secrets for sensitive data.<br>- **Persistent Storage**: Define Persistent Volumes (PV), Persistent Volume Claims (PVC), and StatefulSets for stateful workloads.<br>- **Governance**: Apply Azure Policy and RBAC for cluster compliance. |
252252
| **Application Components** | - **App Code**: No major changes if already containerized; validate readiness for Kubernetes (health probes, resource limits).<br>- **Environment Variables**: Move to ConfigMaps (non-sensitive) and Secrets (sensitive).<br>- **Ingress Rules**: Create Kubernetes Ingress YAML for routing.<br>- **Autoscaling Policies**: Configure resource requests/limits and HPA/KEDA triggers.<br>- **Advanced Features**: Install Dapr for service invocation or event-driven patterns; configure GPU node pools for AI workloads.<br>- **Stateful Logic**: Update app to use persistent storage paths if needed.<br>- **Observability Hooks**: Ensure app exposes metrics for Prometheus/Azure Monitor integration. |
253253

254+
## Modern Traffic Patterns for AKS
255+
256+
> Strengthening API, Routing, and Global Traffic Strategy: `AKS introduces more moving parts, more flexibility, and more responsibility for traffic management, resiliency, and observability.`
257+
> - You now own the orchestration layer.
258+
> - You need stronger traffic management.
259+
> - You need more robust observability.
260+
> - You need better failover automation.
261+
> - You need global routing if you’re serving AI workloads.
262+
263+
264+
Your architecture evolves like this:
265+
266+
```
267+
┌──────────────────────┐ ┌──────────────────────────┐ ┌────────────────────────────┐ ┌──────────────────────────┐ ┌────────────────────────────────┐
268+
│ Azure Front Door │ → │ Azure API Management │ → │ AKS Ingress Controller │ → │ Kubernetes Services │ → │ Pods / Microservices / AI Models │
269+
│ - Global routing │ │ - Auth & rate limits │ │ - NGINX / AGIC / Traefik │ │ - Stable virtual IPs │ │ - Application containers │
270+
│ - WAF / DDoS │ │ - API abstraction layer │ │ - Path/host routing │ │ - Internal load balancing │ │ - AI model runtimes │
271+
└──────────────────────┘ └──────────────────────────┘ └────────────────────────────┘ └──────────────────────────┘ └────────────────────────────────┘
272+
```
273+
274+
<details>
275+
<summary><b>Click here to read more about it</b></summary>
276+
277+
1. **Azure Front Door (the global entry point):**
278+
- Everything starts at Azure Front Door. This is our global edge layer.
279+
- It gives us global routing, latency‑based load balancing, WAF, and DDoS protection.
280+
- Before traffic even reaches our APIs or our cluster, it’s already secured and optimized.
281+
2. **Azure API Management (the API control plane):**
282+
- After Front Door, traffic flows into APIM.
283+
- This is where we enforce authentication, rate limits, transformations, and API governance.
284+
- APIM abstracts the backend, so clients never know whether the workload runs on App Service, AKS, or something else.
285+
3. **AKS Ingress Controller (the cluster’s entry point):**
286+
- APIM forwards the request to the AKS Ingress Controller.
287+
- This component handles routing rules such as hostnames, paths, and TLS.
288+
- It decides which Kubernetes Service inside the cluster should receive the request.
289+
5. **Kubernetes Services (stable internal endpoints):**
290+
- Behind the Ingress, Kubernetes Services provide stable virtual IPs.
291+
- Even if pods scale up, down, or move between nodes, the Service endpoint stays the same.
292+
- This decouples traffic routing from the actual container instances.
293+
6. **Pods / Microservices / AI Models (the workloads):**
294+
- Finally, the request reaches the actual workloads running inside AKS.
295+
- These can be microservices, APIs, background workers, or AI model runtimes.
296+
- This is where the business logic executes and the response is generated.
297+
298+
</details>
299+
300+
301+
| Architecture Area | Detailed Explanation | Why It Still Applies with AKS | How It Integrates with AKS | Key Benefits | Risks If Ignored |
302+
|-------------------|----------------------|-------------------------------|-----------------------------|--------------|------------------|
303+
| **APIM as the Orchestrator** | - Acts as unified API gateway<br/>- Centralizes auth, rate limiting, transformations, versioning<br/>- Abstracts backend infrastructure from clients<br/>- Provides consistent API governance | - APIM hides backend changes (App Service → AKS)<br/>- Prevents exposing AKS directly<br/>- Maintains stable API contracts<br/>- Ensures consistent policies across services | - APIM forwards traffic to AKS via Ingress Controller<br/>- Backend URLs simply point to AKS Ingress endpoints<br/>- No client‑side changes required | - Stable API surface<br/>- Centralized governance<br/>- Security boundary<br/>- Backend abstraction<br/>- Developer‑friendly onboarding | - Clients tightly coupled to backend<br/>- AKS exposed directly<br/>- No centralized throttling or auth<br/>- Harder API lifecycle management |
304+
| **Azure Front Door** | - Global entry point for all external traffic<br/>- Provides WAF, DDoS protection, TLS termination<br/>- Offers global load balancing and latency‑based routing | - AKS is regional; Front Door provides global reach<br/>- Adds edge security before traffic hits APIM or AKS<br/>- Improves performance and resiliency | - Typical flow: Users → Front Door → APIM → AKS<br/>- Can also route directly to AKS Ingress if needed | - Global routing<br/>- Edge security<br/>- Faster TLS termination<br/>- Multi‑region failover<br/>- Performance acceleration | - No global resiliency<br/>- Higher latency for global users<br/>- No edge protection<br/>- Regional outages impact customers |
305+
| **Routing Model: <br/> Hub‑and‑Spoke** | - Central APIM hub<br/>- Regional AKS clusters as spokes<br/>- Front Door routes to APIM hub, APIM routes to regions | - Simplifies governance<br/>- Aligns with enterprise landing zones<br/>- Reduces operational overhead | - APIM hub forwards requests to regional AKS Ingress endpoints<br/>- Centralized policy enforcement | - Easy management<br/>- Centralized governance<br/>- Predictable routing<br/>- Lower operational complexity | - APIM hub becomes a bottleneck<br/>- Slightly higher latency for distant regions |
306+
| **Routing Model: <br/> Hub‑to‑Hub** | - Multiple APIM instances globally<br/>- Each APIM can route to others or local AKS clusters<br/>- Supports active‑active deployments | - AKS supports multi‑region active‑active patterns<br/>- Improves resiliency and performance for global users | - Each APIM instance routes to its nearest AKS cluster<br/>- Cross‑region APIM routing for failover | - High resiliency<br/>- Low latency for global users<br/>- True active‑active architecture | - Higher complexity<br/>- More governance overhead<br/>- Requires strong observability |
307+
| **Observability** | - Centralized logs and metrics<br/>- Tracks cluster health, pods, deployments<br/>- Monitors ingress traffic and service mesh telemetry<br/>- Captures application and container logs | - AKS adds more moving parts than App Service<br/>- Requires deeper visibility into cluster internals<br/>- Ensures reliability and performance | - Azure Monitor for Containers<br/>- Application Insights for app telemetry<br/>- Optional: Prometheus, Grafana, OpenTelemetry | - Full visibility<br/>- Faster troubleshooting<br/>- Better performance tuning<br/>- Stronger SRE practices | - Blind spots in cluster health<br/>- Harder debugging<br/>- Increased downtime risk<br/>- No insight into traffic patterns |
308+
| **Failover Automation** | - Automated detection and rerouting during outages<br/>- Uses health probes and multi‑region logic<br/>- Works across Front Door, APIM, and Kubernetes | - AKS gives more control over failover<br/>- Multi‑region clusters need automated routing<br/>- Ensures high availability | - Front Door handles global failover<br/>- APIM can route to alternate backends<br/>- Kubernetes uses readiness/liveness probes + HPA/KEDA | - High availability<br/>- Seamless failover<br/>- Reduced downtime<br/>- Better user experience | - Outages impact customers<br/>- Manual failover required<br/>- No resilience to regional failures |
309+
| **Future‑Proofing (AI & GPU Workloads)** | - Pre‑provision capacity in regions where new models appear first<br/>- GPU node pools take time to scale<br/>- AI workloads require specific SKUs (A100, H100, MI300X) | - AKS scaling is slower for GPU nodes<br/>- AI workloads need predictable capacity<br/>- Multi‑region deployments reduce risk | - Pre‑warm GPU node pools<br/>- Deploy clusters in multiple regions<br/>- Use Front Door to route to nearest available model | - Predictable performance<br/>- Faster model adoption<br/>- Reduced capacity shortages<br/>- Better global coverage | - GPU shortages<br/>- Slow scaling<br/>- Regional capacity constraints<br/>- Inability to deploy new AI models quickly |
310+
254311
## Best practices
255312

256313
- Cluster Size & Node Pools
@@ -270,7 +327,6 @@ From [Disk type comparison](https://learn.microsoft.com/en-us/azure/virtual-mach
270327

271328
> The application code rarely needs modification unless it relies on App Service‑specific features.
272329
273-
274330
2. How do I migrate my docker-compose setup to Kubernetes? `R/ Typical mapping:`
275331
- `services:` → Kubernetes Deployments + Services
276332
- `volumes:` → PersistentVolumeClaims

0 commit comments

Comments
 (0)