Enhance AKS documentation with traffic patterns and best practices
Added section on modern traffic patterns for AKS, detailing architecture and best practices for traffic management, observability, and failover automation.
File: 0_Azure/8_AzureApps/demos/1_Compute/0_fromMulti-containerWebApp_toAKS.md
Lines changed: 57 additions & 1 deletion
@@ -251,6 +251,63 @@ From [Disk type comparison](https://learn.microsoft.com/en-us/azure/virtual-mach
|**Infrastructure Components**| - **Container Image**: Reuse from Azure Container Registry (ACR); configure image pull secrets.<br>- **Networking**: Plan VNet, node pool subnet, Service CIDR, Pod CIDR.<br>- **Ingress / Routing**: Deploy Ingress Controller (NGINX or Azure Application Gateway), configure DNS and TLS.<br>- **Scaling**: Set up Horizontal Pod Autoscaler (HPA) or install KEDA manually.<br>- **Monitoring**: Enable Azure Monitor for Containers and Log Analytics.<br>- **Secrets Management**: Create Kubernetes Secrets for sensitive data.<br>- **Persistent Storage**: Define Persistent Volumes (PV), Persistent Volume Claims (PVC), and StatefulSets for stateful workloads.<br>- **Governance**: Apply Azure Policy and RBAC for cluster compliance. |
|**Application Components**| - **App Code**: No major changes if already containerized; validate readiness for Kubernetes (health probes, resource limits).<br>- **Environment Variables**: Move to ConfigMaps (non-sensitive) and Secrets (sensitive).<br>- **Ingress Rules**: Create Kubernetes Ingress YAML for routing.<br>- **Autoscaling Policies**: Configure resource requests/limits and HPA/KEDA triggers.<br>- **Advanced Features**: Install Dapr for service invocation or event-driven patterns; configure GPU node pools for AI workloads.<br>- **Stateful Logic**: Update app to use persistent storage paths if needed.<br>- **Observability Hooks**: Ensure app exposes metrics for Prometheus/Azure Monitor integration. |
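
The autoscaling entries above name HPA without showing how it arrives at a replica count. As a rough sketch, the rule documented for the Horizontal Pod Autoscaler is `desired = ceil(current * currentMetric / targetMetric)`, with no change while the ratio stays within a tolerance (0.1 by default). The function name, parameter names, and the min/max defaults below are illustrative assumptions and are not part of any Kubernetes client library:

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,      # assumed bounds, normally set on the HPA object
    max_replicas: int = 10,
    tolerance: float = 0.1,     # documented HPA default tolerance
) -> int:
    """Approximate the HPA replica calculation for a single metric."""
    ratio = current_metric / target_metric
    # Within the tolerance band the autoscaler leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured minimum and maximum replica counts.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 pods averaging 90% CPU against a 60% target.
    print(desired_replicas(4, current_metric=90, target_metric=60))
```

This ignores stabilization windows, pod readiness, and multiple metrics, so treat it only as a way to sanity-check the resource requests and targets you plan to configure.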
## Modern Traffic Patterns for AKS
> Strengthening API, Routing, and Global Traffic Strategy: `AKS introduces more moving parts, more flexibility, and more responsibility for traffic management, resiliency, and observability.`
> - You now own the orchestration layer.
> - You need stronger traffic management.
> - You need more robust observability.
> - You need better failover automation.
> - You need global routing if you’re serving AI workloads.
|**APIM as the Orchestrator**| - Acts as unified API gateway<br/>- Centralizes auth, rate limiting, transformations, versioning<br/>- Abstracts backend infrastructure from clients<br/>- Provides consistent API governance | - APIM hides backend changes (App Service → AKS)<br/>- Prevents exposing AKS directly<br/>- Maintains stable API contracts<br/>- Ensures consistent policies across services | - APIM forwards traffic to AKS via Ingress Controller<br/>- Backend URLs simply point to AKS Ingress endpoints<br/>- No client‑side changes required | - Stable API surface<br/>- Centralized governance<br/>- Security boundary<br/>- Backend abstraction<br/>- Developer‑friendly onboarding | - Clients tightly coupled to backend<br/>- AKS exposed directly<br/>- No centralized throttling or auth<br/>- Harder API lifecycle management |
|**Azure Front Door**| - Global entry point for all external traffic<br/>- Provides WAF, DDoS protection, TLS termination<br/>- Offers global load balancing and latency‑based routing | - AKS is regional; Front Door provides global reach<br/>- Adds edge security before traffic hits APIM or AKS<br/>- Improves performance and resiliency | - Typical flow: Users → Front Door → APIM → AKS<br/>- Can also route directly to AKS Ingress if needed | - Global routing<br/>- Edge security<br/>- Faster TLS termination<br/>- Multi‑region failover<br/>- Performance acceleration | - No global resiliency<br/>- Higher latency for global users<br/>- No edge protection<br/>- Regional outages impact customers |
|**Routing Model: <br/> Hub‑and‑Spoke**| - Central APIM hub<br/>- Regional AKS clusters as spokes<br/>- Front Door routes to APIM hub, APIM routes to regions | - Simplifies governance<br/>- Aligns with enterprise landing zones<br/>- Reduces operational overhead | - APIM hub forwards requests to regional AKS Ingress endpoints<br/>- Centralized policy enforcement | - Easy management<br/>- Centralized governance<br/>- Predictable routing<br/>- Lower operational complexity | - APIM hub becomes a bottleneck<br/>- Slightly higher latency for distant regions |
|**Routing Model: <br/> Hub‑to‑Hub**| - Multiple APIM instances globally<br/>- Each APIM can route to others or local AKS clusters<br/>- Supports active‑active deployments | - AKS supports multi‑region active‑active patterns<br/>- Improves resiliency and performance for global users | - Each APIM instance routes to its nearest AKS cluster<br/>- Cross‑region APIM routing for failover | - High resiliency<br/>- Low latency for global users<br/>- True active‑active architecture | - Higher complexity<br/>- More governance overhead<br/>- Requires strong observability |
|**Observability**| - Centralized logs and metrics<br/>- Tracks cluster health, pods, deployments<br/>- Monitors ingress traffic and service mesh telemetry<br/>- Captures application and container logs | - AKS adds more moving parts than App Service<br/>- Requires deeper visibility into cluster internals<br/>- Ensures reliability and performance | - Azure Monitor for Containers<br/>- Application Insights for app telemetry<br/>- Optional: Prometheus, Grafana, OpenTelemetry | - Full visibility<br/>- Faster troubleshooting<br/>- Better performance tuning<br/>- Stronger SRE practices | - Blind spots in cluster health<br/>- Harder debugging<br/>- Increased downtime risk<br/>- No insight into traffic patterns |
|**Failover Automation**| - Automated detection and rerouting during outages<br/>- Uses health probes and multi‑region logic<br/>- Works across Front Door, APIM, and Kubernetes | - AKS gives more control over failover<br/>- Multi‑region clusters need automated routing<br/>- Ensures high availability | - Front Door handles global failover<br/>- APIM can route to alternate backends<br/>- Kubernetes uses readiness/liveness probes + HPA/KEDA | - High availability<br/>- Seamless failover<br/>- Reduced downtime<br/>- Better user experience | - Outages impact customers<br/>- Manual failover required<br/>- No resilience to regional failures |
|**Future‑Proofing (AI & GPU Workloads)**| - Pre‑provision capacity in regions where new models appear first<br/>- GPU node pools take time to scale<br/>- AI workloads require specific GPU VM SKUs (for example, A100‑, H100‑, or MI300X‑based sizes) | - AKS scaling is slower for GPU nodes<br/>- AI workloads need predictable capacity<br/>- Multi‑region deployments reduce risk | - Pre‑warm GPU node pools<br/>- Deploy clusters in multiple regions<br/>- Use Front Door to route to nearest available model | - Predictable performance<br/>- Faster model adoption<br/>- Reduced capacity shortages<br/>- Better global coverage | - GPU shortages<br/>- Slow scaling<br/>- Regional capacity constraints<br/>- Inability to deploy new AI models quickly |
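
The routing and failover rows above can be illustrated with a small model. This is a minimal sketch, assuming health-probe results and latency measurements are already available. `Origin`, `pick_origin`, and the field names are hypothetical and do not correspond to Front Door or APIM APIs; the real services apply their own, more elaborate selection logic:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Origin:
    """One regional backend, e.g. an APIM instance or an AKS ingress endpoint."""
    name: str
    priority: int       # lower value = preferred (hypothetical field)
    latency_ms: float   # last measured latency (hypothetical field)
    healthy: bool       # outcome of the most recent health probe


def pick_origin(origins: List[Origin]) -> Optional[Origin]:
    """Return the origin that should receive traffic, or None if all are down.

    Unhealthy origins are skipped, the lowest priority value wins, and
    latency breaks ties between origins that share a priority.
    """
    candidates = [o for o in origins if o.healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda o: (o.priority, o.latency_ms))


if __name__ == "__main__":
    origins = [
        Origin("westeurope", priority=1, latency_ms=40.0, healthy=False),
        Origin("northeurope", priority=1, latency_ms=55.0, healthy=True),
        Origin("eastus", priority=2, latency_ms=120.0, healthy=True),
    ]
    chosen = pick_origin(origins)
    print(chosen.name if chosen else "no healthy origin")  # northeurope
```

With the primary region failing its probe, traffic moves to the next healthy origin at the same priority before falling back to a lower-priority region, which is the behaviour the Failover Automation row expects from the edge layer.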
## Best practices
- Cluster Size & Node Pools
@@ -270,7 +327,6 @@ From [Disk type comparison](https://learn.microsoft.com/en-us/azure/virtual-mach
> The application code rarely needs modification unless it relies on App Service‑specific features.

2. How do I migrate my docker-compose setup to Kubernetes? `R/ Typical mapping:`