
perf: Implement retry logic to improve resiliency #147

Merged
Roopan-Microsoft merged 6 commits into dev from psl-perf-enhancement on Jul 16, 2025


Shreyas-Microsoft (Collaborator) commented Jun 25, 2025

Purpose

This pull request updates the CommsManager class and integrates it into the convert_script.py workflow, adding retry logic, stronger error handling, and more robust agent communication. It also removes unused imports. The main changes are:

Enhancements to Communication and Retry Logic

  • Introduction of Retry Mechanism in CommsManager: Added retry logic with configurable parameters (max_retries, initial_delay, backoff_factor) to handle rate limits and transient errors in agent communication. The new async_invoke method retries failed calls and handles errors, including parsing wait times from error messages. (src/backend/sql_agents/helpers/comms_manager.py, R156-R260)
  • Integration of CommsManager Retry Logic in convert_script.py: Replaced direct AgentGroupChat usage with CommsManager so that agent communication goes through the new retry mechanism. This includes updating calls to the add_chat_message and invoke methods. (src/backend/sql_agents/convert_script.py, [1] [2] [3])
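
The retry behavior described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: only the parameter names max_retries, initial_delay, and backoff_factor come from the description, while `invoke_with_retry`, `parse_wait_seconds`, `is_retryable`, the `sleep` parameter, and the error-message wording are hypothetical.

```python
import asyncio
import re

# Hypothetical pattern; real rate-limit messages may be worded differently.
_WAIT_RE = re.compile(r"retry after (\d+(?:\.\d+)?)\s*seconds?", re.IGNORECASE)


def parse_wait_seconds(message):
    """Return the wait time mentioned in an error message, or None."""
    match = _WAIT_RE.search(message)
    return float(match.group(1)) if match else None


def is_retryable(exc):
    """Assumed heuristic: treat rate-limit style errors as transient."""
    text = str(exc).lower()
    return "rate limit" in text or "429" in text or "retry after" in text


async def invoke_with_retry(call, max_retries=3, initial_delay=1.0,
                            backoff_factor=2.0, sleep=asyncio.sleep):
    """Await call() and retry transient failures with exponential backoff."""
    delay = initial_delay
    for attempt in range(max_retries + 1):
        try:
            return await call()
        except Exception as exc:
            # Give up on the last attempt or on errors that look permanent.
            if attempt == max_retries or not is_retryable(exc):
                raise
            # Prefer the wait time from the message when it carries one.
            hinted = parse_wait_seconds(str(exc))
            await sleep(hinted if hinted is not None else delay)
            delay *= backoff_factor
```

Passing `sleep` as a parameter is a design choice made here for testability: a test can substitute a coroutine that records the requested waits instead of actually pausing.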

Improved Error Handling

  • Safe Parsing of Agent Responses: Added fallback handling for malformed or incomplete JSON responses to prevent crashes during status updates, so that invalid agent output is handled gracefully. (src/backend/sql_agents/convert_script.py, R207-R242)
  • Critical Error Logging and Recovery: Enhanced error handling for agent communication failures by logging critical errors to the batch service and sending error status updates to clients. (src/backend/sql_agents/convert_script.py, R251-R276)
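
As a rough illustration of the safe-parsing idea above, the sketch below falls back to a placeholder dictionary when an agent reply is not valid JSON. The function name `safe_parse_response`, the `status` and `raw` keys, and the `"unknown"` default are assumptions made for this example and are not taken from convert_script.py.

```python
import json


def safe_parse_response(raw, default_status="unknown"):
    """Parse an agent reply as JSON; return a placeholder dict on failure."""
    # Hypothetical fallback shape: keep the raw text so it can still be logged.
    fallback = {"status": default_status, "raw": raw}
    try:
        parsed = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        # Malformed or truncated JSON, or a non-string reply such as None.
        return fallback
    # Valid JSON that is not an object (e.g. a list) is also treated as invalid.
    return parsed if isinstance(parsed, dict) else fallback
```

A caller could then read `safe_parse_response(reply)["status"]` without wrapping every status update in its own try/except.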

Code Cleanup

  • Removed Unused Imports: Cleaned up unused asyncio imports in convert_script.py and process_batch.py. (src/backend/sql_agents/convert_script.py, [1]; src/backend/sql_agents/process_batch.py, [2])

Does this introduce a breaking change?

  • Yes
  • No

Golden Path Validation

  • I have tested the primary workflows (the "golden path") to ensure they function correctly without errors.

Deployment Validation

  • I have validated the deployment process successfully and all services are running as expected with this change.

What to Check

Verify that the following are valid

  • ...

Other Information

Shreyas-Microsoft changed the title from "retry logic" to "perf: Implement retry logic to improve resiliency" on Jun 25, 2025
marktayl1 (Contributor) left a comment

This is great! If we have not already done so, we need to put this through a very thorough test pass. There are a lot of core changes here, so it is important to test fully rather than just smoke test. If that has been done, I am good if you pull the comments and merge.

Comment thread src/backend/sql_agents/convert_script.py Outdated
Comment thread src/backend/sql_agents/convert_script.py Outdated
Shreyas-Microsoft changed the base branch from main to dev on July 14, 2025, 11:55
Roopan-Microsoft merged commit 9bd1828 into dev on Jul 16, 2025
9 checks passed
Roopan-Microsoft deleted the psl-perf-enhancement branch on July 16, 2025, 10:28
@github-actions

🎉 This PR is included in version 1.5.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

Ritesh-Microsoft pushed a commit that referenced this pull request Oct 10, 2025
perf: Implement retry logic to improve resiliency