fix: Enhance CosmosDBClient and DatabaseFactory for error handling and concurrency #398
Merged
Roopan-Microsoft merged 22 commits into main on Apr 8, 2026
…ev to azd-template-validation
…schedule-codmod branch
…sl-weeklyschedule-codmod branch
…g and concurrency
- Added asyncio support and a lock mechanism in DatabaseFactory to ensure thread safety.
- Implemented retry logic with backoff for reading existing batches in CosmosDBClient.
- Updated batch service to handle existing batch entries more efficiently.
…tch for existing batch records
…existing batch records
…le record handling in BatchService
…g navigation after 2 minutes
…f forcing navigation after 2 minutes" This reverts commit 79d5964.
fix: Handle Cosmos DB replication lag during concurrent batch creation
fix: Add aiProject dependency for cognitive service deployments
Pull request overview
This PR improves reliability and concurrency behavior around Cosmos DB access in the backend, while also updating generated ARM/Bicep artifacts to ensure correct deployment ordering (notably around AI Foundry dependencies).
Changes:
- Hardened Cosmos DB batch creation against 409 conflicts by reading existing records with retry/backoff, and added partition-key scoping to several queries.
- Made database initialization concurrency-safe by guarding `DatabaseFactory.get_database()` with an async lock and caching the singleton instance.
- Updated infra templates (generator metadata + resource dependencies) to align with the newer Bicep generator output and ensure correct deployment order.
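The conflict handling described in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `ConflictError`, `NotFoundError`, `LaggyContainer`, and `create_or_read_batch` are hypothetical stand-ins, and the use of `user_id` as the partition key is an assumption. The fake container simulates replication lag by failing the first few reads after a 409.

```python
import asyncio


class ConflictError(Exception):
    """Stand-in for a 409 response (hypothetical, not the SDK class)."""


class NotFoundError(Exception):
    """Stand-in for a 404 response (hypothetical, not the SDK class)."""


class LaggyContainer:
    """In-memory fake: the item exists, but the first few reads miss it."""

    def __init__(self, item, misses_before_visible):
        self._item = item
        self._misses_left = misses_before_visible
        self.read_attempts = 0

    async def create_item(self, body):
        # Another request already created this record.
        raise ConflictError(body["id"])

    async def read_item(self, item, partition_key):
        self.read_attempts += 1
        if self._misses_left > 0:
            self._misses_left -= 1
            raise NotFoundError(item)
        return self._item


async def create_or_read_batch(container, batch, retries=4, base_delay=0.01):
    """Create the batch; on conflict, read the existing record with backoff."""
    try:
        return await container.create_item(body=batch)
    except ConflictError:
        pass  # fall through to reading the record that already exists
    for attempt in range(retries):
        try:
            existing = await container.read_item(
                item=batch["id"], partition_key=batch["user_id"]
            )
        except NotFoundError:
            if attempt == retries - 1:
                raise  # still not visible after the final attempt
            await asyncio.sleep(base_delay * 2 ** attempt)  # exponential backoff
            continue
        if existing["user_id"] != batch["user_id"]:
            # Cross-user conflict: the id is taken by someone else.
            raise PermissionError("batch belongs to a different user")
        return existing


def demo():
    batch = {"id": "b1", "user_id": "u1"}
    container = LaggyContainer(item=dict(batch), misses_before_visible=2)
    result = asyncio.run(create_or_read_batch(container, batch))
    return result, container.read_attempts
```

In this sketch the read succeeds on the third attempt, after two backoff sleeps. A real client would catch the SDK's own conflict and not-found exceptions instead of the stand-ins.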
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `src/backend/common/database/cosmosdb.py` | Adds conflict read/retry logic for batch creation; uses partition keys in queries; adds `existing_batch` fast-path for updates. |
| `src/backend/common/database/database_factory.py` | Adds async locking + singleton caching to prevent concurrent double-initialization. |
| `src/backend/common/services/batch_service.py` | Avoids redundant DB reads when uploading files; threads `existing_batch` to reduce extra lookups. |
| `src/tests/backend/common/database/cosmosdb_test.py` | Updates mocks to account for `read_item` usage and partition-key kwargs in query mocks. |
| `infra/modules/ai-foundry/dependencies.bicep` | Adds explicit dependency ordering for cognitive service deployments relative to `aiProject`. |
| `infra/main.json` | Regenerates template metadata and updates resource dependencies to reflect new generator output and ordering needs. |
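The async locking and singleton caching noted for `database_factory.py` could look roughly like the sketch below. It is an illustration under assumptions, not the merged implementation: `_create_client` and `init_calls` are hypothetical helpers used only to count initializations, and the lazily created lock is one possible design choice.

```python
import asyncio

init_calls = 0  # hypothetical counter, only here to observe initializations


async def _create_client():
    """Hypothetical slow initializer standing in for real client setup."""
    global init_calls
    init_calls += 1
    await asyncio.sleep(0.01)  # simulate network/setup latency
    return object()


class DatabaseFactory:
    _instance = None
    _lock = None  # created lazily, inside a running event loop

    @classmethod
    async def get_database(cls):
        # Fast path: no locking once the instance is cached.
        if cls._instance is not None:
            return cls._instance
        if cls._lock is None:
            cls._lock = asyncio.Lock()
        async with cls._lock:
            # Re-check under the lock: another coroutine may have
            # finished initialization while this one was waiting.
            if cls._instance is None:
                cls._instance = await _create_client()
        return cls._instance


async def main():
    # Five concurrent callers race to obtain the database.
    return await asyncio.gather(
        *(DatabaseFactory.get_database() for _ in range(5))
    )


def demo():
    clients = asyncio.run(main())
    return init_calls, len({id(c) for c in clients})
```

Without the second check inside the lock, every coroutine that queued while the first caller was initializing would create its own client after acquiring the lock.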
fix: fix copilot comments - enhance batch handling in CosmosDBClient and add tests for conflict scenarios
chore: Add AZD Template Validation Workflow (Scheduled & On-Demand) and Refactor Azure Deployment Pipeline
Roopan-Microsoft approved these changes on Apr 8, 2026
🎉 This PR is included in version 1.7.0 🎉
The release is available on GitHub release.
Your semantic-release bot 📦🚀
Purpose
This pull request introduces several improvements and fixes in both the infrastructure (`infra/main.json`, `infra/modules/ai-foundry/dependencies.bicep`) and backend code (`src/backend/common/database/cosmosdb.py`, `database_factory.py`, `batch_service.py`). The main themes are more reliable and correct Cosmos DB operations, concurrency-safe database initialization, and infrastructure updates for compatibility and resource dependency management.

Backend reliability and concurrency improvements:
- Hardened `create_batch` to robustly retry and fetch existing records after a conflict, with backoff and error handling for cross-user conflicts.
- Added partition-key scoping to `query_items` calls in several methods (`get_batch`, `get_file`, `get_batch_from_id`) to improve query performance and correctness.
- Made `DatabaseFactory.get_database()` concurrency-safe by introducing an async lock to ensure the singleton instance is only created once, even under concurrent access.
- Updated `update_batch_entry` to accept an optional `existing_batch` parameter, avoiding redundant database fetches if the batch is already available.

Infrastructure and deployment updates:
- Regenerated template metadata in `infra/main.json` for compatibility and reproducibility with Bicep 0.42.1.
- Updated resource dependencies in `infra/main.json` and `dependencies.bicep` to ensure correct deployment order, particularly for AI project and private DNS zone dependencies.

Batch and file service improvements:
- Updated `batch_service.py` to ensure consistent return types and to utilize the new `existing_batch` parameter for efficiency.

These changes collectively improve the robustness, performance, and maintainability of both the backend and infrastructure code.
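The `existing_batch` fast path can be illustrated with a small sketch. The `FakeBatchStore` class, its fields, and the exact method signatures are hypothetical; only the idea of an optional `existing_batch` argument that skips the lookup comes from the description above.

```python
import asyncio


class FakeBatchStore:
    """In-memory stand-in for the database client (hypothetical)."""

    def __init__(self):
        self.reads = 0  # counts lookups so the saved read is visible
        self._batches = {"b1": {"id": "b1", "status": "pending"}}

    async def get_batch(self, batch_id):
        self.reads += 1
        return self._batches[batch_id]

    async def update_batch_entry(self, batch_id, status, existing_batch=None):
        # Reuse the caller's copy when provided; otherwise fetch it.
        if existing_batch is not None:
            batch = existing_batch
        else:
            batch = await self.get_batch(batch_id)
        batch["status"] = status
        self._batches[batch_id] = batch
        return batch


async def main():
    store = FakeBatchStore()
    batch = await store.get_batch("b1")  # first and only required read
    await store.update_batch_entry("b1", "in_progress", existing_batch=batch)
    reads_with_fast_path = store.reads
    await store.update_batch_entry("b1", "completed")  # falls back to a fetch
    return reads_with_fast_path, store.reads, batch["status"]


def demo():
    return asyncio.run(main())
```

Keeping `existing_batch` optional means callers that do not already hold the record keep working unchanged, while callers such as an upload path that has just loaded the batch avoid a second lookup.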
Does this introduce a breaking change?
Golden Path Validation
Deployment Validation