Skip to content

Commit 5007782

Browse files
committed
initial publish
1 parent e61f5bf commit 5007782

63 files changed

Lines changed: 7241 additions & 48 deletions

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

CODE_OF_CONDUCT.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -6,4 +6,4 @@ Resources:
66

77
- [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/)
88
- [Microsoft Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/)
9-
- Contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with questions or concerns
9+
- Contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with questions or concerns

LICENSE

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -18,4 +18,4 @@
1818
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
1919
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
2020
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21-
SOFTWARE
21+
SOFTWARE

README.md

Lines changed: 120 additions & 24 deletions
Original file line numberDiff line numberDiff line change
@@ -1,33 +1,129 @@
1-
# Project
1+
# Content Processing Solution Accelerator
22

3-
> This repo has been populated by an initial template to help get you started. Please
4-
> make sure to update the content to build a great experience for community-building.
3+
MENU: [**USER STORY**](#user-story) \| [**QUICK DEPLOY**](#quick-deploy) \| [**SUPPORTING DOCUMENTATION**](#supporting-documentation)
54

6-
As the maintainer of this project, please make a few updates:
5+
<h2><img src="./docs/Images/ReadMe/userStory.png" width="64">
6+
<br/>
7+
User story
8+
</h2>
79

8-
- Improving this README.MD file to provide a great experience
9-
- Updating SUPPORT.MD with content about this project's support experience
10-
- Understanding the security reporting process in SECURITY.MD
11-
- Remove this section from the README
10+
### Overview
1211

13-
## Contributing
12+
This solution accelerator enables customers to programmatically extract data and apply schemas to unstructured documents across text-based and multi-modal content. During processing, extraction and data schema transformation - these steps are scored for accuracy to automate processing and identify as-needed human validation. This allows for improved accuracy and greater speed for data integration into downstream systems.
1413

15-
This project welcomes contributions and suggestions. Most contributions require you to agree to a
16-
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
17-
the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
14+
It leverages Azure AI Foundry, Azure AI Content Understanding, Azure OpenAI Service, Azure blob storage, and Cosmos DB to transform large volumes of unstructured content through event-driven processing pipelines for integration into downstream applications and post-processing activities.
1815

19-
When you submit a pull request, a CLA bot will automatically determine whether you need to provide
20-
a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions
21-
provided by the bot. You will only need to do this once across all repos using our CLA.
2216

23-
This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
24-
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
25-
contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.
17+
### Technical key features
2618

27-
## Trademarks
19+
- **Multi-modal content processing:** Utilizes machine learning-based OCR for efficient text extraction and integrates GPT Vision for processing various content formats.​
2820

29-
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft
30-
trademarks or logos is subject to and must follow
31-
[Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general).
32-
Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.
33-
Any use of third-party trademarks or logos are subject to those third-party's policies.
21+
- **Schema-based data transformation:** Maps extracted content to custom or industry-defined schemas and outputs as JSON for interoperability.​
22+
23+
- **Confidence scoring:** Calculation of entity extraction and schema mapping processes for accuracy, providing scores to drive manual human-in-the-loop review, if desired.
24+
25+
- **Review, validate, update:** Transparency in reviewing processing steps and final output - allowing for review, comparison to source asset, ability to modify output results, and annotation for historical reference.
26+
27+
- **API driven processing pipelines:** API end-points are available for external source systems to integrate event-driven processing workflows.
28+
29+
<br/>
30+
31+
Below is an image of the solution accelerator:
32+
33+
![image](./docs/Images/ReadMe/ui.png)
34+
35+
### Use case / scenario
36+
37+
A data analyst at a property insurance company manages and ensures claims for data accuracy and compliance.
38+
39+
A recent natural disaster has led to an influx of insurance claims coming into the pipeline. The analyst is tasked with accurately validating ingested data from claims and invoices being processed through the system. Claims data includes various multi-modal content types, with details extracted and mapped to defined schemas such as policy plans, invoices, and insurance adjuster reports.
40+
41+
AI is used to extract, transform, and flag potential discrepancies, such as missing policyholder details and outlier repair estimates. The data analyst then cross-checks the findings against historical claims data and regulatory guidelines. Collaborating with the compliance team, she verifies the flagged issues and refines the dataset.
42+
43+
Thanks to AI pipeline processing, data moves much faster, more accurately, and is more seamlessly integrated into the data analyst's workflow.
44+
45+
The sample data used in this repository is synthetic and generated using Azure OpenAI service. The data is intended for use as sample data only.
46+
47+
### Solution architecture
48+
![image](./docs/Images/ReadMe/solution-architecture.png)
49+
50+
51+
<h2><img src="./docs/Images/ReadMe/quickDeploy.png" width="64">
52+
<br/>
53+
QUICK DEPLOY
54+
</h2>
55+
56+
Follow the [quick deploy steps on the deployment guide](./docs/DeploymentGuide.md) to deploy this solution to your own Azure subscription.
57+
58+
| [![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://codespaces.new/microsoft/content-processing-solution-accelerator) | [![Open in Dev Containers](https://img.shields.io/static/v1?style=for-the-badge&label=Dev%20Containers&message=Open&color=blue&logo=visualstudiocode)](https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/microsoft/content-processing-solution-accelerator) | [![Deploy to Azure](https://aka.ms/deploytoazurebutton)](https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2Fmicrosoft%2Fcontent-processing-solution-accelerator%2Fmain%2Finfra%2Fmain.json) |
59+
|---|---|---|
60+
61+
62+
<br/>
63+
64+
<h2>
65+
Supporting Documentation
66+
</h2>
67+
68+
### Costs
69+
70+
Pricing varies per region and usage, so it isn't possible to predict exact costs for your usage.
71+
The majority of the Azure resources used in this infrastructure are on usage-based pricing tiers.
72+
However, Azure Container Registry has a fixed cost per registry per day.
73+
74+
You can try the [Azure pricing calculator](https://azure.microsoft.com/en-us/pricing/calculator) for the resources:
75+
76+
* Azure AI Foundry: Free tier. [Pricing](https://azure.microsoft.com/pricing/details/ai-studio/)
77+
* Azure Storage Account for AI Foundry: Standard tier, LRS. Pricing is based on storage and operations. [Pricing](https://azure.microsoft.com/pricing/details/storage/blobs/)
78+
* Azure Key Vault: Standard tier. Pricing is based on the number of operations. [Pricing](https://azure.microsoft.com/pricing/details/key-vault/)
79+
* Azure Storage Account for Content Processing Application: Standard tier, LRS. Pricing is based on storage and operations. [Pricing](https://azure.microsoft.com/pricing/details/storage/blobs/)
80+
* Azure AI Services: S0 tier, defaults to gpt-4o-mini. Pricing is based on token count. [Pricing](https://azure.microsoft.com/pricing/details/cognitive-services/)
81+
* Azure Container App: Consumption tier with 4 CPU, 8GiB memory/storage. Pricing is based on resource allocation, and each month allows for a certain amount of free usage. [Pricing](https://azure.microsoft.com/pricing/details/container-apps/)
82+
* Azure Container Registry: Basic tier. [Pricing](https://azure.microsoft.com/pricing/details/container-registry/)
83+
* Log analytics: Pay-as-you-go tier. Costs based on data ingested. [Pricing](https://azure.microsoft.com/pricing/details/monitor/)
84+
* Azure Cosmos DB: [Pricing](https://azure.microsoft.com/en-us/pricing/details/cosmos-db/autoscale-provisioned/)
85+
86+
87+
> ⚠️ To avoid unnecessary costs, remember to take down your app if it's no longer in use,
88+
either by deleting the resource group in the Portal or running `azd down`.
89+
90+
### Security guidelines
91+
92+
This template uses Azure Key Vault to store all connections to communicate between resources.
93+
94+
This template also uses [Managed Identity](https://learn.microsoft.com/entra/identity/managed-identities-azure-resources/overview) for local development and deployment.
95+
96+
To ensure continued best practices in your own repository, we recommend that anyone creating solutions based on our templates ensure that the [Github secret scanning](https://docs.github.com/code-security/secret-scanning/about-secret-scanning) setting is enabled.
97+
98+
You may want to consider additional security measures, such as:
99+
100+
* Enabling Microsoft Defender for Cloud to [secure your Azure resources](https://learn.microsoft.com/azure/security-center/defender-for-cloud).
101+
* Protecting the Azure Container Apps instance with a [firewall](https://learn.microsoft.com/azure/container-apps/waf-app-gateway) and/or [Virtual Network](https://learn.microsoft.com/azure/container-apps/networking?tabs=workload-profiles-env%2Cazure-cli).
102+
103+
### How to customize
104+
105+
If you'd like to customize the solution accelerator, here are some common areas to start:
106+
- [Adding your own Schemas and Data](./docs/CustomizeSchemaData.md)
107+
- [Modifying System Processing Prompts](./docs/CustomizeSystemPrompts.md)
108+
- [Ingesting API for Event-Driven Processing](./docs/API.md)
109+
110+
### Additional resources
111+
112+
- [Technical Architecture](./docs/TechnicalArchitecture.md)
113+
- [Technical Approach & Processing Pipeline](./docs/ProcessingPipelineApproach.md)
114+
115+
## Responsible AI Transparency FAQ
116+
Please refer to [Transparency FAQ](./TRANSPARENCY_FAQ.md) for responsible AI transparency details of this solution accelerator.
117+
118+
119+
## Disclaimers
120+
121+
To the extent that the Software includes components or code used in or derived from Microsoft products or services, including without limitation Microsoft Azure Services (collectively, “Microsoft Products and Services”), you must also comply with the Product Terms applicable to such Microsoft Products and Services. You acknowledge and agree that the license governing the Software does not grant you a license or other right to use Microsoft Products and Services. Nothing in the license or this ReadMe file will serve to supersede, amend, terminate or modify any terms in the Product Terms for any Microsoft Products and Services.
122+
123+
You must also comply with all domestic and international export laws and regulations that apply to the Software, which include restrictions on destinations, end users, and end use. For further information on export restrictions, visit https://aka.ms/exporting.
124+
125+
You acknowledge that the Software and Microsoft Products and Services (1) are not designed, intended or made available as a medical device(s), and (2) are not designed or intended to be a substitute for professional medical advice, diagnosis, treatment, or judgment and should not be used to replace or as a substitute for professional medical advice, diagnosis, treatment, or judgment. Customer is solely responsible for displaying and/or obtaining appropriate consents, warnings, disclaimers, and acknowledgements to end users of Customer’s implementation of the Online Services.
126+
127+
You acknowledge the Software is not subject to SOC 1 and SOC 2 compliance audits. No Microsoft technology, nor any of its component technologies, including the Software, is intended or made available as a substitute for the professional advice, opinion, or judgement of a certified financial services professional. Do not use the Software to replace, substitute, or provide professional financial advice or judgment.
128+
129+
BY ACCESSING OR USING THE SOFTWARE, YOU ACKNOWLEDGE THAT THE SOFTWARE IS NOT DESIGNED OR INTENDED TO SUPPORT ANY USE IN WHICH A SERVICE INTERRUPTION, DEFECT, ERROR, OR OTHER FAILURE OF THE SOFTWARE COULD RESULT IN THE DEATH OR SERIOUS BODILY INJURY OF ANY PERSON OR IN PHYSICAL OR ENVIRONMENTAL DAMAGE (COLLECTIVELY, “HIGH-RISK USE”), AND THAT YOU WILL ENSURE THAT, IN THE EVENT OF ANY INTERRUPTION, DEFECT, ERROR, OR OTHER FAILURE OF THE SOFTWARE, THE SAFETY OF PEOPLE, PROPERTY, AND THE ENVIRONMENT ARE NOT REDUCED BELOW A LEVEL THAT IS REASONABLY, APPROPRIATE, AND LEGAL, WHETHER IN GENERAL OR IN A SPECIFIC INDUSTRY. BY ACCESSING THE SOFTWARE, YOU FURTHER ACKNOWLEDGE THAT YOUR HIGH-RISK USE OF THE SOFTWARE IS AT YOUR OWN RISK.

SECURITY.md

Lines changed: 8 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -1,18 +1,18 @@
1-
<!-- BEGIN MICROSOFT SECURITY.MD V0.0.9 BLOCK -->
1+
<!-- BEGIN MICROSOFT SECURITY.MD V0.0.5 BLOCK -->
22

33
## Security
44

5-
Microsoft takes the security of our software products and services seriously, which includes all source code repositories managed through our GitHub organizations, which include [Microsoft](https://github.com/Microsoft), [Azure](https://github.com/Azure), [DotNet](https://github.com/dotnet), [AspNet](https://github.com/aspnet) and [Xamarin](https://github.com/xamarin).
5+
Microsoft takes the security of our software products and services seriously, which includes all source code repositories managed through our GitHub organizations, which include [Microsoft](https://github.com/Microsoft), [Azure](https://github.com/Azure), [DotNet](https://github.com/dotnet), [AspNet](https://github.com/aspnet), [Xamarin](https://github.com/xamarin), and [our GitHub organizations](https://opensource.microsoft.com/).
66

7-
If you believe you have found a security vulnerability in any Microsoft-owned repository that meets [Microsoft's definition of a security vulnerability](https://aka.ms/security.md/definition), please report it to us as described below.
7+
If you believe you have found a security vulnerability in any Microsoft-owned repository that meets [Microsoft's definition of a security vulnerability](https://docs.microsoft.com/en-us/previous-versions/tn-archive/cc751383(v=technet.10)), please report it to us as described below.
88

99
## Reporting Security Issues
1010

1111
**Please do not report security vulnerabilities through public GitHub issues.**
1212

13-
Instead, please report them to the Microsoft Security Response Center (MSRC) at [https://msrc.microsoft.com/create-report](https://aka.ms/security.md/msrc/create-report).
13+
Instead, please report them to the Microsoft Security Response Center (MSRC) at [https://msrc.microsoft.com/create-report](https://msrc.microsoft.com/create-report).
1414

15-
If you prefer to submit without logging in, send email to [secure@microsoft.com](mailto:secure@microsoft.com). If possible, encrypt your message with our PGP key; please download it from the [Microsoft Security Response Center PGP Key page](https://aka.ms/security.md/msrc/pgp).
15+
If you prefer to submit without logging in, send email to [secure@microsoft.com](mailto:secure@microsoft.com). If possible, encrypt your message with our PGP key; please download it from the [Microsoft Security Response Center PGP Key page](https://www.microsoft.com/en-us/msrc/pgp-key-msrc).
1616

1717
You should receive a response within 24 hours. If for some reason you do not, please follow up via email to ensure we received your original message. Additional information can be found at [microsoft.com/msrc](https://www.microsoft.com/msrc).
1818

@@ -28,14 +28,14 @@ Please include the requested information listed below (as much as you can provid
2828

2929
This information will help us triage your report more quickly.
3030

31-
If you are reporting for a bug bounty, more complete reports can contribute to a higher bounty award. Please visit our [Microsoft Bug Bounty Program](https://aka.ms/security.md/msrc/bounty) page for more details about our active programs.
31+
If you are reporting for a bug bounty, more complete reports can contribute to a higher bounty award. Please visit our [Microsoft Bug Bounty Program](https://microsoft.com/msrc/bounty) page for more details about our active programs.
3232

3333
## Preferred Languages
3434

3535
We prefer all communications to be in English.
3636

3737
## Policy
3838

39-
Microsoft follows the principle of [Coordinated Vulnerability Disclosure](https://aka.ms/security.md/cvd).
39+
Microsoft follows the principle of [Coordinated Vulnerability Disclosure](https://www.microsoft.com/en-us/msrc/cvd).
4040

41-
<!-- END MICROSOFT SECURITY.MD BLOCK -->
41+
<!-- END MICROSOFT SECURITY.MD BLOCK -->

SUPPORT.md

Lines changed: 2 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -1,13 +1,3 @@
1-
# TODO: The maintainer of this repo has not yet edited this file
2-
3-
**REPO OWNER**: Do you want Customer Service & Support (CSS) support for this product/project?
4-
5-
- **No CSS support:** Fill out this template with information about how to file issues and get help.
6-
- **Yes CSS support:** Fill out an intake form at [aka.ms/onboardsupport](https://aka.ms/onboardsupport). CSS will work with/help you to determine next steps.
7-
- **Not sure?** Fill out an intake as though the answer were "Yes". CSS will help you decide.
8-
9-
*Then remove this first heading from this SUPPORT.MD file before publishing your repo.*
10-
111
# Support
122

133
## How to file issues and get help
@@ -16,10 +6,8 @@ This project uses GitHub Issues to track bugs and feature requests. Please searc
166
issues before filing new issues to avoid duplicates. For new issues, file your bug or
177
feature request as a new Issue.
188

19-
For help and questions about using this project, please **REPO MAINTAINER: INSERT INSTRUCTIONS HERE
20-
FOR HOW TO ENGAGE REPO OWNERS OR COMMUNITY FOR HELP. COULD BE A STACK OVERFLOW TAG OR OTHER
21-
CHANNEL. WHERE WILL YOU HELP PEOPLE?**.
9+
For help and questions about using this project, please submit an issue on this repository.
2210

2311
## Microsoft Support Policy
2412

25-
Support for this **PROJECT or PRODUCT** is limited to the resources listed above.
13+
Support for this repository is limited to the resources listed above.

0 commit comments

Comments
 (0)