Commit 39d5cb6

committed
updating cross ref
1 parent dffe08a commit 39d5cb6

1 file changed

Lines changed: 139 additions & 59 deletions

README.md

@@ -1,98 +1,167 @@
1-
# Content Processing Solution Accelerator
1+
# Content processing solution accelerator
2+
This solution accelerator enables customers to programmatically extract data and apply schemas to unstructured documents across text-based and multi-modal content. During processing, the extraction and schema transformation steps are scored for accuracy, which automates processing and flags content that needs human validation. This improves accuracy and speeds up data integration into downstream systems.
23

3-
MENU: [**USER STORY**](#user-story) \| [**QUICK DEPLOY**](#quick-deploy) \| [**SUPPORTING DOCUMENTATION**](#supporting-documentation)
4-
5-
<h2><img src="./docs/Images/ReadMe/userStory.png" width="64">
64
<br/>
7-
User story
8-
</h2>
95

10-
### Overview
6+
<div align="center">
7+
8+
[**SOLUTION OVERVIEW**](#solution-overview) \| [**QUICK DEPLOY**](#quick-deploy) \| [**BUSINESS SCENARIO**](#business-scenario) \| [**SUPPORTING DOCUMENTATION**](#supporting-documentation)
119

12-
This solution accelerator enables customers to programmatically extract data and apply schemas to unstructured documents across text-based and multi-modal content. During processing, extraction and data schema transformation - these steps are scored for accuracy to automate processing and identify as-needed human validation. This allows for improved accuracy and greater speed for data integration into downstream systems.
10+
</div>
11+
<br/>
1312

14-
It leverages Azure AI Foundry, Azure AI Content Understanding, Azure OpenAI Service, Azure blob storage, and Azure Cosmos DB to transform large volumes of unstructured content through event-driven processing pipelines for integration into downstream applications and post-processing activities.
13+
<h2><img src="./docs/images/readme/solution-overview.png" width="48" />
14+
Solution overview
15+
</h2>
1516

17+
The solution leverages Azure AI Foundry, Azure AI Content Understanding, Azure OpenAI Service, Azure blob storage, and Azure Cosmos DB to transform large volumes of unstructured content through event-driven processing pipelines for integration into downstream applications and post-processing activities.
1618

17-
### Technical key features
19+
### Solution architecture
20+
|![image](./docs/images/readme/solution-architecture.png)|
21+
|---|
1822

19-
- **Multi-modal content processing:** Utilizes machine learning-based OCR for efficient text extraction and integrates GPT Vision for processing various content formats.​
2023

21-
- **Schema-based data transformation:** Maps extracted content to custom or industry-defined schemas and outputs as JSON for interoperability.​
24+
### How to customize
25+
If you'd like to customize the solution accelerator, here are some common areas to start:
2226

23-
- **Confidence scoring:** Calculation of entity extraction and schema mapping processes for accuracy, providing scores to drive manual human-in-the-loop review, if desired.
27+
[Adding your own Schemas and Data](./docs/CustomizeSchemaData.md)
2428

25-
- **Review, validate, update:** Transparency in reviewing processing steps and final output - allowing for review, comparison to source asset, ability to modify output results, and annotation for historical reference.
29+
[Modifying System Processing Prompts](./docs/CustomizeSystemPrompts.md)
2630

27-
- **API driven processing pipelines:** API end-points are available for external source systems to integrate event-driven processing workflows.
31+
[Ingesting API for Event-Driven Processing](./docs/API.md)
2832

2933
<br/>
3034

31-
Below is an image of the solution accelerator:
35+
### Additional resources
3236

33-
![image](./docs/Images/ReadMe/ui.png)
37+
[Technical Architecture](./docs/TechnicalArchitecture.md)
3438

35-
### Use case / scenario
39+
[Technical Approach & Processing Pipeline](./docs/ProcessingPipelineApproach.md)
3640

37-
A data analyst at a property insurance company manages and ensures claims for data accuracy and compliance.
41+
<br/>
3842

39-
A recent natural disaster has led to an influx of insurance claims coming into the pipeline. The analyst is tasked with accurately validating ingested data from claims and invoices being processed through the system. Claims data includes various multi-modal content types, with details extracted and mapped to defined schemas such as policy plans, invoices, and insurance adjuster reports.
43+
### Key features
44+
<details open>
45+
  <summary>Click to learn more about the key features this solution enables</summary>
4046

41-
AI is used to extract, transform, and flag potential discrepancies, such as missing policyholder details and outlier repair estimates. The data analyst then cross-checks the findings against historical claims data and regulatory guidelines. Collaborating with the compliance team, she verifies the flagged issues and refines the dataset.
47+
- **Multi-modal content processing** <br/>
48+
Utilizes machine learning-based OCR for efficient text extraction and integrates GPT Vision for processing various content formats.
4249

43-
Thanks to AI pipeline processing, data moves much faster, more accurately, and is more seamlessly integrated into the data analyst's workflow.
50+
- **Schema-based data transformation** <br/>
51+
Maps extracted content to custom or industry-defined schemas and outputs as JSON for interoperability
4452

45-
The sample data used in this repository is synthetic and generated using Azure OpenAI service. The data is intended for use as sample data only.
53+
- **Confidence scoring** <br/>
54+
Scores the accuracy of the entity extraction and schema mapping processes, providing confidence scores to drive manual human-in-the-loop review, if desired
4655

47-
### Solution architecture
48-
![image](./docs/Images/ReadMe/solution-architecture.png)
56+
- **Review, validate, update** <br/>
57+
Transparency in reviewing processing steps and final output - allowing for review, comparison to source asset, ability to modify output results, and annotation for historical reference
4958

59+
- **API-driven processing pipelines** <br/>
60+
API endpoints are available for external source systems to integrate event-driven processing workflows
61+
62+
</details>
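As a rough illustration of the schema-based transformation described in the key features above, the sketch below maps loosely structured extraction output onto a fixed schema and serializes it as JSON. The `Invoice` schema, its field names, and the `map_to_schema` helper are hypothetical and are not taken from this repository; the accelerator's actual schema format is described in the customization docs.

```python
import json
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class Invoice:
    """Hypothetical target schema; field names are illustrative only."""
    invoice_id: Optional[str] = None
    vendor_name: Optional[str] = None
    total_amount: Optional[float] = None


def map_to_schema(extracted: dict) -> Invoice:
    """Keep only the keys the schema defines; missing fields stay None."""
    known = {f.name for f in fields(Invoice)}
    return Invoice(**{k: v for k, v in extracted.items() if k in known})


# Example extraction output (made up for this sketch).
raw = {"invoice_id": "INV-001", "total_amount": 125.5, "page_count": 2}
invoice = map_to_schema(raw)

# Serialize the mapped record as JSON for downstream systems.
print(json.dumps(asdict(invoice), indent=2))
```

Keys that the schema does not define (here `page_count`) are dropped, and schema fields that were not extracted remain `None`, so downstream consumers always see the same JSON shape.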
5063

51-
<h2><img src="./docs/Images/ReadMe/quickDeploy.png" width="64">
52-
<br/>
53-
QUICK DEPLOY
64+
<br /><br />
65+
<h2><img src="./docs/images/readme/quick-deploy.png" width="48" />
66+
Quick deploy
5467
</h2>
5568

56-
Follow the quick deploy steps on the deployment guide to deploy this solution to your own Azure subscription.
69+
### How to install or deploy
70+
Follow the quick deploy steps on the deployment guide to deploy this solution to your own Azure subscription.
5771

5872
[Click here to launch the deployment guide](./docs/DeploymentGuide.md)
59-
73+
<br/><br/>
6074

6175
| [![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://codespaces.new/microsoft/content-processing-solution-accelerator) | [![Open in Dev Containers](https://img.shields.io/static/v1?style=for-the-badge&label=Dev%20Containers&message=Open&color=blue&logo=visualstudiocode)](https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/microsoft/content-processing-solution-accelerator) |
6276
|---|---|
6377

64-
<br>
78+
<br/>
79+
80+
> ⚠️ **Important: Check Azure OpenAI Quota Availability**
81+
<br/>To ensure sufficient quota is available in your subscription, please follow [quota check instructions guide](./docs/QuotaCheck.md) before you deploy the solution.
6582
66-
> ⚠️ **Important: Check Azure OpenAI Quota Availability** <br/>To ensure sufficient quota is available in your subscription, please follow [quota check instructions guide](./docs/quota_check.md) before you deploy the solution.
83+
<br/>
6784

68-
<br/>
85+
### Prerequisites and Costs
86+
To deploy this solution accelerator, ensure you have access to an [Azure subscription](https://azure.microsoft.com/free/) with the necessary permissions to create **resource groups, resources, app registrations, and assign roles at the resource group level**. This should include the Contributor role at the subscription level and a Role Based Access Control role at the subscription and/or resource group level. Follow the steps in [Azure Account Set Up](./docs/AzureAccountSetUp.md).
6987

70-
<h2>
71-
Supporting Documentation
72-
</h2>
88+
Here are some example regions where the services are available: East US, East US2, Australia East, UK South, France Central, Africa.
7389

74-
### Costs
90+
Check the [Azure Products by Region](https://azure.microsoft.com/en-us/explore/global-infrastructure/products-by-region/?products=all&regions=all) page and select a **region** where the following services are available.
7591

76-
Pricing varies per region and usage, so it isn't possible to predict exact costs for your usage.
77-
The majority of the Azure resources used in this infrastructure are on usage-based pricing tiers.
78-
However, Azure Container Registry has a fixed cost per registry per day.
92+
Pricing varies per region and usage, so it isn't possible to predict exact costs for your usage. The majority of the Azure resources used in this infrastructure are on usage-based pricing tiers. However, Azure Container Registry has a fixed cost per registry per day.
7993

80-
You can try the [Azure pricing calculator](https://azure.microsoft.com/en-us/pricing/calculator) for the resources:
94+
Use the [Azure pricing calculator](https://azure.microsoft.com/en-us/pricing/calculator) to calculate the cost of this solution in your subscription. [Review a sample pricing sheet for the architecture](https://azure.com/e/68b51f4cb79a4466b631a11aa57e9c16).
8195

82-
* Azure AI Foundry: Free tier. [Pricing](https://azure.microsoft.com/pricing/details/ai-studio/)
83-
* Azure Storage Account for AI Foundry: Standard tier, LRS. Pricing is based on storage and operations. [Pricing](https://azure.microsoft.com/pricing/details/storage/blobs/)
84-
* Azure Key Vault: Standard tier. Pricing is based on the number of operations. [Pricing](https://azure.microsoft.com/pricing/details/key-vault/)
85-
* Azure Storage Account for Content Processing Application: Standard tier, LRS. Pricing is based on storage and operations. [Pricing](https://azure.microsoft.com/pricing/details/storage/blobs/)
86-
* Azure AI Services: S0 tier, defaults to gpt-4o-mini. Pricing is based on token count. [Pricing](https://azure.microsoft.com/pricing/details/cognitive-services/)
87-
* Azure Container App: Consumption tier with 4 CPU, 8GiB memory/storage. Pricing is based on resource allocation, and each month allows for a certain amount of free usage. [Pricing](https://azure.microsoft.com/pricing/details/container-apps/)
88-
* Azure Container Registry: Basic tier. [Pricing](https://azure.microsoft.com/pricing/details/container-registry/)
89-
* Log analytics: Pay-as-you-go tier. Costs based on data ingested. [Pricing](https://azure.microsoft.com/pricing/details/monitor/)
90-
* Azure Cosmos DB: [Pricing](https://azure.microsoft.com/en-us/pricing/details/cosmos-db/autoscale-provisioned/)
96+
97+
<br/>
9198

9299

93-
> ⚠️ To avoid unnecessary costs, remember to take down your app if it's no longer in use,
100+
| Product | Description | Cost |
101+
|---|---|---|
102+
| [Azure AI Foundry](https://learn.microsoft.com/en-us/azure/ai-foundry/) | Build generative AI applications on an enterprise-grade platform | [Pricing](https://azure.microsoft.com/pricing/details/ai-studio/) |
103+
| [Azure OpenAI Service](https://learn.microsoft.com/en-us/azure/ai-services/openai/) | Provides REST API access to OpenAI's powerful language models including o3-mini, o1, o1-mini, GPT-4o, GPT-4o mini | [Pricing](https://azure.microsoft.com/en-us/pricing/details/cognitive-services/openai-service/) |
104+
| [Azure AI Content Understanding Service](https://learn.microsoft.com/en-us/azure/ai-services/content-understanding/) | Analyzes various media content—such as audio, video, text, and images—transforming it into structured, searchable data | [Pricing](https://azure.microsoft.com/en-us/pricing/details/content-understanding/) |
105+
| [Azure Blob Storage](https://learn.microsoft.com/en-us/azure/storage/blobs/) | Microsoft's object storage solution for the cloud. Blob storage is optimized for storing massive amounts of unstructured data | [Pricing](https://azure.microsoft.com/pricing/details/storage/blobs/) |
106+
| [Azure Container Apps](https://learn.microsoft.com/en-us/azure/container-apps/) | Allows you to run containerized applications without worrying about orchestration or infrastructure. | [Pricing](https://azure.microsoft.com/pricing/details/container-apps/) |
107+
| [Azure Container Registry](https://learn.microsoft.com/en-us/azure/container-registry/) | Build, store, and manage container images and artifacts in a private registry for all types of container deployments | [Pricing](https://azure.microsoft.com/pricing/details/container-registry/) |
108+
| [Azure Cosmos DB](https://learn.microsoft.com/en-us/azure/cosmos-db/) | Fully managed, distributed NoSQL, relational, and vector database for modern app development | [Pricing](https://azure.microsoft.com/en-us/pricing/details/cosmos-db/autoscale-provisioned/) |
109+
| [Azure Queue Storage](https://learn.microsoft.com/en-us/azure/storage/queues/) | Store large numbers of messages and access messages from anywhere in the world via HTTP or HTTPS. | [Pricing]() |
110+
| [GPT Model Capacity](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models) | The latest most capable Azure OpenAI models with multimodal versions, accepting both text and images as input | [Pricing](https://azure.microsoft.com/en-us/pricing/details/cognitive-services/openai-service/) |
111+
112+
<br/>
113+
114+
>⚠️ **Important:** To avoid unnecessary costs, remember to take down your app if it's no longer in use,
94115
either by deleting the resource group in the Portal or running `azd down`.
95116

117+
<br /><br />
118+
<h2><img src="./docs/images/readme/business-scenario.png" width="48" />
119+
Business scenario
120+
</h2>
121+
122+
|![image](./docs/images/readme/ui.png)|
123+
|---|
124+
125+
<br/>
126+
127+
A data analyst at a property insurance company manages claims and ensures their data accuracy and compliance.
128+
129+
A recent natural disaster has led to an influx of insurance claims coming into the pipeline. The analyst is tasked with accurately validating ingested data from claims and invoices being processed through the system. Claims data includes various multi-modal content types, with details extracted and mapped to defined schemas such as policy plans, invoices, and insurance adjuster reports.
130+
131+
AI is used to extract, transform, and flag potential discrepancies, such as missing policyholder details and outlier repair estimates. The data analyst then cross-checks the findings against historical claims data and regulatory guidelines. Collaborating with the compliance team, she verifies the flagged issues and refines the dataset.
132+
133+
Thanks to AI pipeline processing, data moves much faster, more accurately, and is more seamlessly integrated into the data analyst's workflow.
134+
135+
⚠️ The sample data used in this repository is synthetic and generated using Azure OpenAI service. The data is intended for use as sample data only.
136+
137+
</details>
138+
139+
<br/>
140+
141+
### Business value
142+
<details>
143+
  <summary>Click to learn more about what value this solution provides</summary>
144+
145+
- **Automated data management** <br/>
146+
Streamline data management to enable event-driven automation while standardizing the data structure for a reusable experience, improving productivity at scale.
147+
148+
- **Enhanced data processing** <br/>
149+
Efficiently extract key details, keywords, and entities and automatically map them to the specified schemas, optimizing workflows, reducing manual effort, and saving time.
150+
151+
- **Data confidence** <br/>
152+
Systematic extraction and mapping elevate confidence in AI workflows by applying tolerance thresholds and ensuring quality results through scoring, all while enhancing accuracy.
153+
154+
- **Verifiable approvals** <br/>
155+
Human verification of processed content ensures reliability and precision of the final output when thresholds are not met, while fostering trust and guaranteeing consistency.
156+
157+
</details>
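The "Data confidence" and "Verifiable approvals" points above amount to a threshold check: fields scored below a tolerance are routed to a person. The sketch below shows one minimal way to express that idea. The `FieldResult` type, the `needs_human_review` helper, the sample values, and the 0.8 default threshold are all assumptions made for illustration, not the accelerator's actual scoring logic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FieldResult:
    """Hypothetical per-field extraction result."""
    name: str
    value: str
    confidence: float  # assumed to lie between 0.0 and 1.0


def needs_human_review(results: List[FieldResult], threshold: float = 0.8) -> List[str]:
    """Return the names of fields whose confidence is below the threshold."""
    return [r.name for r in results if r.confidence < threshold]


# Made-up scores for two fields from a claim document.
results = [
    FieldResult("policy_number", "PN-1234", 0.97),
    FieldResult("repair_estimate", "18,400", 0.62),
]

flagged = needs_human_review(results)
print(flagged)
```

An empty list would mean every field met the threshold and the record could proceed without manual review; any flagged names would be surfaced to a reviewer.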
158+
159+
<br /><br />
160+
161+
<h2><img src="./docs/images/readme/supporting-documentation.png" width="48" />
162+
Supporting documentation
163+
</h2>
164+
96165
### Security guidelines
97166

98167
This template uses Azure Key Vault to store all connections to communicate between resources.
@@ -106,21 +175,32 @@ You may want to consider additional security measures, such as:
106175
* Enabling Microsoft Defender for Cloud to [secure your Azure resources](https://learn.microsoft.com/azure/security-center/defender-for-cloud).
107176
* Protecting the Azure Container Apps instance with a [firewall](https://learn.microsoft.com/azure/container-apps/waf-app-gateway) and/or [Virtual Network](https://learn.microsoft.com/azure/container-apps/networking?tabs=workload-profiles-env%2Cazure-cli).
108177

109-
### How to customize
178+
<br/>
179+
110180

111-
If you'd like to customize the solution accelerator, here are some common areas to start:
112-
- [Adding your own Schemas and Data](./docs/CustomizeSchemaData.md)
113-
- [Modifying System Processing Prompts](./docs/CustomizeSystemPrompts.md)
114-
- [Ingesting API for Event-Driven Processing](./docs/API.md)
181+
### Cross references
182+
Check out similar solution accelerators:
183+
115184

116-
### Additional resources
185+
| Solution Accelerator | Description |
186+
|---|---|
187+
| [Document&nbsp;knowledge&nbsp;mining](https://github.com/microsoft/Document-Knowledge-Mining-Solution-Accelerator) | Process and extract summaries, entities, and metadata from unstructured, multi-modal documents and enable searching and chatting over this data. |
188+
| [Conversation&nbsp;knowledge&nbsp;mining](https://github.com/microsoft/Conversation-Knowledge-Mining-Solution-Accelerator) | Derive insights from volumes of conversational data using generative AI. It offers key phrase extraction, topic modeling, and interactive chat experiences through an intuitive web interface. |
189+
| [Document&nbsp;generation](https://github.com/microsoft/document-generation-solution-accelerator) | Identify relevant documents, summarize unstructured information, and generate document templates. |
117190

118-
- [Technical Architecture](./docs/TechnicalArchitecture.md)
119-
- [Technical Approach & Processing Pipeline](./docs/ProcessingPipelineApproach.md)
191+
192+
<br/>
193+
194+
195+
## Provide feedback
196+
Have questions, found a bug, or want to request a feature? [Submit a new issue](https://github.com/microsoft/content-processing-solution-accelerator/issues) on this repo and we'll connect.
197+
198+
<br/>
120199

121200
## Responsible AI Transparency FAQ
122201
Please refer to [Transparency FAQ](./TRANSPARENCY_FAQ.md) for responsible AI transparency details of this solution accelerator.
123202

203+
<br/>
124204

125205
## Disclaimers
126206
