
fix: Enhance CosmosDBClient and DatabaseFactory for error handling and concurrency #398

Merged
Roopan-Microsoft merged 22 commits into main from dev on Apr 8, 2026

Pavan-Microsoft (Contributor)

Purpose

This pull request introduces several improvements and fixes in both the infrastructure (infra/main.json, infra/modules/ai-foundry/dependencies.bicep) and backend code (src/backend/common/database/cosmosdb.py, database_factory.py, batch_service.py). The main themes are enhanced reliability and correctness in CosmosDB operations, improved concurrency safety in database initialization, and infrastructure updates for compatibility and resource dependency management.

Backend reliability and concurrency improvements:

  • Improved handling of Cosmos DB resource conflicts (HTTP 409) in create_batch: after a conflict, the client reads the existing record, retrying with backoff, and handles the case where the conflicting record belongs to a different user.
  • Added partition key usage to CosmosDB query_items calls in several methods (get_batch, get_file, get_batch_from_id) to improve query performance and correctness. [1] [2] [3]
  • Enhanced concurrency safety in DatabaseFactory.get_database() by introducing an async lock to ensure the singleton instance is only created once, even under concurrent access. [1] [2]
  • Allowed update_batch_entry to accept an optional existing_batch parameter, avoiding redundant database fetches if the batch is already available.
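A minimal sketch of the conflict-then-read pattern from the first bullet, assuming a container whose partition key is the batch id. `InMemoryContainer`, `ConflictError`, `NotFoundError`, and this `create_batch` signature are hypothetical stand-ins, not the project's code; with the real `azure-cosmos` SDK the 409 and 404 cases surface as `CosmosResourceExistsError` and `CosmosResourceNotFoundError`. The `lagging_reads` counter simulates a conflicting record that is briefly not readable.

```python
import asyncio
from typing import Any, Dict, Tuple


class ConflictError(Exception):
    """Stand-in for a 409 Conflict (CosmosResourceExistsError in azure-cosmos)."""


class NotFoundError(Exception):
    """Stand-in for a 404 Not Found (CosmosResourceNotFoundError in azure-cosmos)."""


class InMemoryContainer:
    """Hypothetical fake container keyed by (partition_key, id)."""

    def __init__(self) -> None:
        self.items: Dict[Tuple[str, str], Dict[str, Any]] = {}
        self.lagging_reads = 0  # next N reads pretend the item is not visible yet

    async def create_item(self, body: Dict[str, Any]) -> Dict[str, Any]:
        key = (body["batch_id"], body["id"])
        if key in self.items:
            raise ConflictError(body["id"])
        self.items[key] = dict(body)
        return dict(body)

    async def read_item(self, item: str, partition_key: str) -> Dict[str, Any]:
        if self.lagging_reads > 0:
            self.lagging_reads -= 1
            raise NotFoundError(item)
        try:
            return dict(self.items[(partition_key, item)])
        except KeyError:
            raise NotFoundError(item) from None


async def create_batch(container: InMemoryContainer, user_id: str, batch_id: str,
                       retries: int = 3, base_delay: float = 0.05) -> Dict[str, Any]:
    body = {"id": batch_id, "batch_id": batch_id, "user_id": user_id}
    try:
        return await container.create_item(body)
    except ConflictError:
        pass  # another request created it first; read the winner back below
    for attempt in range(retries):
        try:
            existing = await container.read_item(item=batch_id, partition_key=batch_id)
        except NotFoundError:
            # The conflicting write may not be readable yet: back off exponentially.
            await asyncio.sleep(base_delay * 2 ** attempt)
            continue
        if existing["user_id"] != user_id:
            # Same id, different owner: surface it instead of returning someone else's batch.
            raise PermissionError(f"batch {batch_id} belongs to another user")
        return existing
    raise RuntimeError(f"batch {batch_id} conflicted but could not be read back")
```

Retrying only the read (not the create) keeps the operation idempotent: the first writer wins and every later caller converges on that record.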

Infrastructure and deployment updates:

  • Updated Bicep generator version and template hashes throughout infra/main.json for compatibility and reproducibility with Bicep 0.42.1. [1] [2] [3] [4] [5] [6] [7] [8] [9] [10]
  • Fixed and reordered resource dependencies in infra/main.json and dependencies.bicep to ensure correct deployment order, particularly for AI project and private DNS zone dependencies. [1] [2] [3] [4] [5]

Batch and file service improvements:

  • Adjusted file upload logic in batch_service.py to ensure consistent return types and to utilize the new existing_batch parameter for efficiency. [1] [2]
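The `existing_batch` fast path can be sketched as follows. `FakeBatchStore` and `upload_file` are hypothetical, the read counter exists only to make the saved round trip visible, and the real `update_batch_entry` signature may differ.

```python
import asyncio
from typing import Any, Dict, Optional


class FakeBatchStore:
    """Hypothetical store that counts reads so the saved round trip is observable."""

    def __init__(self) -> None:
        self.batches: Dict[str, Dict[str, Any]] = {}
        self.reads = 0

    async def get_batch(self, batch_id: str) -> Optional[Dict[str, Any]]:
        self.reads += 1
        batch = self.batches.get(batch_id)
        return dict(batch) if batch else None

    async def update_batch_entry(self, batch_id: str, updates: Dict[str, Any],
                                 existing_batch: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        # Fast path: reuse the caller's copy instead of fetching the batch again.
        batch = existing_batch if existing_batch is not None else await self.get_batch(batch_id)
        if batch is None:
            raise KeyError(batch_id)
        batch.update(updates)
        self.batches[batch_id] = batch
        return batch


async def upload_file(store: FakeBatchStore, batch_id: str, file_name: str) -> Dict[str, Any]:
    batch = await store.get_batch(batch_id)  # the only read on this path
    if batch is None:
        raise KeyError(batch_id)
    files = batch.get("files", []) + [file_name]
    # Hand over the batch we already hold so update_batch_entry skips its own read;
    # the function always returns a dict, keeping the return type consistent.
    return await store.update_batch_entry(batch_id, {"files": files}, existing_batch=batch)
```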

These changes collectively improve the robustness, performance, and maintainability of both the backend and infrastructure code.

Does this introduce a breaking change?

  • Yes
  • No

Golden Path Validation

  • I have tested the primary workflows (the "golden path") to ensure they function correctly without errors.

Deployment Validation

  • I have validated the deployment process successfully and all services are running as expected with this change.

VishalS-Microsoft and others added 19 commits March 31, 2026 12:33
…g and concurrency

- Added asyncio support and an asyncio lock in DatabaseFactory so that concurrent coroutines initialize the database only once.
- Implemented retry logic with backoff for reading existing batches in CosmosDBClient.
- Updated batch service to handle existing batch entries more efficiently.
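A minimal sketch of the double-checked async lock these commits describe, assuming a client with an async `initialize()` method. `FakeDatabaseClient` and the class attributes are hypothetical. `asyncio.Lock` serializes coroutines on one event loop; it does not guard against access from multiple threads.

```python
import asyncio
from typing import Optional


class FakeDatabaseClient:
    """Hypothetical client with slow async setup, used to expose races."""

    instances = 0  # counts constructions so double-initialization is observable

    def __init__(self) -> None:
        FakeDatabaseClient.instances += 1

    async def initialize(self) -> None:
        await asyncio.sleep(0.01)  # simulated network setup; yields to other tasks


class DatabaseFactory:
    _instance: Optional[FakeDatabaseClient] = None
    _lock: Optional[asyncio.Lock] = None

    @classmethod
    async def get_database(cls) -> FakeDatabaseClient:
        if cls._instance is not None:  # fast path: no locking once initialized
            return cls._instance
        if cls._lock is None:
            # No await between the check and the assignment, so on a single
            # event loop only one task can create the lock.
            cls._lock = asyncio.Lock()
        async with cls._lock:
            if cls._instance is None:  # re-check: a task ahead of us may have finished
                client = FakeDatabaseClient()
                await client.initialize()
                cls._instance = client
        return cls._instance
```

The outer check keeps the common path lock-free once the singleton exists; the inner check stops tasks that queued on the lock from each building another client.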
…f forcing navigation after 2 minutes"

This reverts commit 79d5964.
fix: Handle Cosmos DB replication lag during concurrent batch creation
fix: Add aiProject dependency for cognitive service deployments

Copilot AI left a comment

Pull request overview

This PR improves reliability and concurrency behavior around Cosmos DB access in the backend, while also updating generated ARM/Bicep artifacts to ensure correct deployment ordering (notably around AI Foundry dependencies).

Changes:

  • Hardened Cosmos DB batch creation against 409 conflicts by reading existing records with retry/backoff, and added partition-key scoping to several queries.
  • Made database initialization concurrency-safe by guarding DatabaseFactory.get_database() with an async lock and caching the singleton instance.
  • Updated infra templates (generator metadata + resource dependencies) to align with the newer Bicep generator output and ensure correct deployment order.
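As a rough illustration of partition-key scoping, the sketch below passes `partition_key` alongside a parameterized query, which the `azure.cosmos.aio` `ContainerProxy.query_items` method accepts as a keyword argument. It assumes the batch container is partitioned on `/batch_id`, which is an assumption, and `RecordingContainer` is a hypothetical fake that records the arguments instead of calling Cosmos DB.

```python
import asyncio
from typing import Any, AsyncIterator, Dict, List, Optional


class RecordingContainer:
    """Hypothetical fake shaped like azure.cosmos.aio ContainerProxy.query_items:
    a plain method that returns an async iterable of items."""

    def __init__(self, items: List[Dict[str, Any]]) -> None:
        self.items = items
        self.calls: List[Dict[str, Any]] = []

    def query_items(self, query: str, *, parameters: Optional[List[Dict[str, Any]]] = None,
                    partition_key: Optional[str] = None) -> AsyncIterator[Dict[str, Any]]:
        self.calls.append({"query": query, "partition_key": partition_key})
        wanted = {p["name"]: p["value"] for p in (parameters or [])}.get("@batch_id")

        async def _matches() -> AsyncIterator[Dict[str, Any]]:
            # Crude stand-in for the server evaluating the WHERE clause.
            for item in self.items:
                if item["batch_id"] == wanted:
                    yield item

        return _matches()


async def get_batch(container: RecordingContainer, batch_id: str) -> Optional[Dict[str, Any]]:
    results = container.query_items(
        query="SELECT * FROM c WHERE c.batch_id = @batch_id",
        parameters=[{"name": "@batch_id", "value": batch_id}],
        # Scoping to one logical partition avoids a cross-partition fan-out.
        partition_key=batch_id,
    )
    async for item in results:
        return item  # first match is enough for a lookup by id
    return None
```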

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

Summary per file

  • src/backend/common/database/cosmosdb.py: Adds conflict read/retry logic for batch creation; uses partition keys in queries; adds an existing_batch fast path for updates.
  • src/backend/common/database/database_factory.py: Adds async locking and singleton caching to prevent concurrent double-initialization.
  • src/backend/common/services/batch_service.py: Avoids redundant DB reads when uploading files; threads existing_batch through to reduce extra lookups.
  • src/tests/backend/common/database/cosmosdb_test.py: Updates mocks to account for read_item usage and partition-key kwargs in query mocks.
  • infra/modules/ai-foundry/dependencies.bicep: Adds explicit dependency ordering for cognitive service deployments relative to aiProject.
  • infra/main.json: Regenerates template metadata and updates resource dependencies to reflect the new generator output and ordering needs.
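The test update described for `cosmosdb_test.py` can be approximated with `unittest.mock.AsyncMock`: make `create_item` raise a conflict, then assert that `read_item` is awaited with the expected `partition_key`. `create_or_get_batch` and `ConflictError` are hypothetical stand-ins, not the repository's actual tests.

```python
import asyncio
from typing import Any, Dict
from unittest.mock import AsyncMock, MagicMock


class ConflictError(Exception):
    """Stand-in for azure.cosmos.exceptions.CosmosResourceExistsError (HTTP 409)."""


async def create_or_get_batch(container: Any, user_id: str, batch_id: str) -> Dict[str, Any]:
    """Hypothetical method under test: create the batch, or read it back on conflict."""
    try:
        return await container.create_item(body={"id": batch_id, "user_id": user_id})
    except ConflictError:
        return await container.read_item(item=batch_id, partition_key=batch_id)


def make_conflicting_container(existing: Dict[str, Any]) -> MagicMock:
    container = MagicMock()
    # create_item is awaited, so it must be an AsyncMock; it fails with a conflict.
    container.create_item = AsyncMock(side_effect=ConflictError("409"))
    # read_item returns the record that won the race.
    container.read_item = AsyncMock(return_value=existing)
    return container


def test_conflict_reads_existing_batch() -> None:
    existing = {"id": "b1", "user_id": "alice"}
    container = make_conflicting_container(existing)
    result = asyncio.run(create_or_get_batch(container, "alice", "b1"))
    assert result == existing
    container.create_item.assert_awaited_once()
    # The read-back must target the batch's partition rather than a cross-partition query.
    container.read_item.assert_awaited_once_with(item="b1", partition_key="b1")
```

The test function can be collected by pytest or called directly.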


Comment thread src/backend/common/database/cosmosdb.py Outdated
Comment thread src/backend/common/database/cosmosdb.py Outdated
Comment thread src/backend/common/services/batch_service.py Outdated
Comment thread src/tests/backend/common/database/cosmosdb_test.py
Comment thread src/tests/backend/common/database/cosmosdb_test.py
Pavan-Microsoft and others added 2 commits April 7, 2026 20:38
fix: fix copilot comments - enhance batch handling in CosmosDBClient and add tests for conflict scenarios
chore: Add AZD Template Validation Workflow (Scheduled & On-Demand) and Refactor Azure Deployment Pipeline
Roopan-Microsoft merged commit 121791c into main on Apr 8, 2026
9 checks passed
github-actions (bot) commented on Apr 8, 2026

🎉 This PR is included in version 1.7.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀


5 participants