This solution accelerator enables customers to programmatically extract data and apply schemas to unstructured documents across text-based and multi-modal content. During processing, the extraction and schema transformation steps are scored for accuracy, which automates processing and identifies where human validation is needed. This improves accuracy and speeds data integration into downstream systems.
10
+
</div>
11
+
<br/>
13
12
14
-
It leverages Azure AI Foundry, Azure AI Content Understanding, Azure OpenAI Service, Azure blob storage, and Azure Cosmos DB to transform large volumes of unstructured content through event-driven processing pipelines for integration into downstream applications and post-processing activities.
The solution leverages Azure AI Foundry, Azure AI Content Understanding, Azure OpenAI Service, Azure Blob Storage, and Azure Cosmos DB to transform large volumes of unstructured content through event-driven processing pipelines for integration into downstream applications and post-processing activities.
-**Multi-modal content processing:** Utilizes machine learning-based OCR for efficient text extraction and integrates GPT Vision for processing various content formats.
20
23
21
-
-**Schema-based data transformation:** Maps extracted content to custom or industry-defined schemas and outputs as JSON for interoperability.
24
+
### How to customize
25
+
If you'd like to customize the solution accelerator, here are some common areas to start:
22
26
23
-
-**Confidence scoring:** Calculation of entity extraction and schema mapping processes for accuracy, providing scores to drive manual human-in-the-loop review, if desired.
27
+
[Adding your own Schemas and Data](./docs/CustomizeSchemaData.md)
24
28
25
-
-**Review, validate, update:** Transparency in reviewing processing steps and final output - allowing for review, comparison to source asset, ability to modify output results, and annotation for historical reference.
29
+
[Modifying System Processing Prompts](./docs/CustomizeSystemPrompts.md)
26
30
27
-
-**API driven processing pipelines:** API end-points are available for external source systems to integrate event-driven processing workflows.
31
+
[Ingesting API for Event-Driven Processing](./docs/API.md)
A data analyst at a property insurance company manages claims data and ensures its accuracy and compliance.
41
+
<br/>
38
42
39
-
A recent natural disaster has led to an influx of insurance claims coming into the pipeline. The analyst is tasked with accurately validating ingested data from claims and invoices being processed through the system. Claims data includes various multi-modal content types, with details extracted and mapped to defined schemas such as policy plans, invoices, and insurance adjuster reports.
43
+
### Key features
44
+
<details open>
45
+
<summary>Click to learn more about the key features this solution enables</summary>
40
46
41
-
AI is used to extract, transform, and flag potential discrepancies, such as missing policyholder details and outlier repair estimates. The data analyst then cross-checks the findings against historical claims data and regulatory guidelines. Collaborating with the compliance team, she verifies the flagged issues and refines the dataset.
47
+
-**Multi-modal content processing** <br/>
48
+
Utilizes machine learning-based OCR for efficient text extraction and integrates GPT Vision for processing various content formats.
42
49
43
-
Thanks to AI pipeline processing, data moves much faster, more accurately, and is more seamlessly integrated into the data analyst's workflow.
50
+
-**Schema-based data transformation** <br/>
51
+
Maps extracted content to custom or industry-defined schemas and outputs as JSON for interoperability
44
52
45
-
The sample data used in this repository is synthetic and generated using Azure OpenAI service. The data is intended for use as sample data only.
53
+
-**Confidence scoring** <br/>
54
+
Scores the accuracy of entity extraction and schema mapping, so that low scores can drive manual human-in-the-loop review, if desired
Transparency in reviewing processing steps and final output - allowing for review, comparison to source asset, ability to modify output results, and annotation for historical reference
49
58
59
+
-**API-driven processing pipelines** <br/>
60
+
API endpoints are available for external source systems to integrate event-driven processing workflows
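
As a rough illustration of schema-based data transformation with confidence scores, the sketch below maps extracted fields onto a target schema and emits JSON. It is not the accelerator's actual implementation: the `Invoice` schema, the `map_to_schema` helper, and the shape of the `extracted` input (a value plus a confidence per field) are all assumptions made for this example.

```python
import json
from dataclasses import dataclass, asdict
from typing import Dict, Tuple


# Hypothetical target schema, used only for this illustration.
@dataclass
class Invoice:
    invoice_id: str
    vendor: str
    total: float


def map_to_schema(extracted: Dict[str, dict]) -> Tuple[Invoice, Dict[str, float]]:
    """Map extracted {value, confidence} fields onto the Invoice schema.

    Returns the typed record and a per-field confidence dictionary.
    """
    fields = ("invoice_id", "vendor", "total")
    scores = {name: float(extracted[name]["confidence"]) for name in fields}
    invoice = Invoice(
        invoice_id=str(extracted["invoice_id"]["value"]),
        vendor=str(extracted["vendor"]["value"]),
        total=float(extracted["total"]["value"]),  # coerce text such as "1250.50"
    )
    return invoice, scores


# Assumed extraction output; the values are synthetic.
extracted = {
    "invoice_id": {"value": "INV-001", "confidence": 0.98},
    "vendor": {"value": "Contoso Roofing", "confidence": 0.91},
    "total": {"value": "1250.50", "confidence": 0.62},
}

invoice, scores = map_to_schema(extracted)
payload = json.dumps({"result": asdict(invoice), "confidence": scores}, indent=2)
print(payload)
```

Keeping the confidence values next to the mapped result, rather than discarding them after extraction, is what lets a later step decide which records need human review.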
Follow the quick deploy steps on the deployment guide to deploy this solution to your own Azure subscription.
69
+
### How to install or deploy
70
+
Follow the quick deploy steps on the deployment guide to deploy this solution to your own Azure subscription.
57
71
58
72
[Click here to launch the deployment guide](./docs/DeploymentGuide.md)
59
-
73
+
<br/><br/>
60
74
61
75
|[Open in GitHub Codespaces](https://codespaces.new/microsoft/content-processing-solution-accelerator)|[Open in Dev Containers](https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/microsoft/content-processing-solution-accelerator)|
<br/>To ensure sufficient quota is available in your subscription, please follow [quota check instructions guide](./docs/QuotaCheck.md) before you deploy the solution.
65
82
66
-
> ⚠️ **Important: Check Azure OpenAI Quota Availability**<br/>To ensure sufficient quota is available in your subscription, please follow [quota check instructions guide](./docs/quota_check.md) before you deploy the solution.
83
+
<br/>
67
84
68
-
<br/>
85
+
### Prerequisites and Costs
86
+
To deploy this solution accelerator, ensure you have access to an [Azure subscription](https://azure.microsoft.com/free/) with the necessary permissions to create **resource groups, resources, app registrations, and assign roles at the resource group level**. This should include the Contributor role at the subscription level and a Role Based Access Control role at the subscription and/or resource group level. Follow the steps in [Azure Account Set Up](./docs/AzureAccountSetUp.md).
69
87
70
-
<h2>
71
-
Supporting Documentation
72
-
</h2>
88
+
Here are some example regions where the services are available: East US, East US2, Australia East, UK South, France Central, Africa.
73
89
74
-
### Costs
90
+
Check the [Azure Products by Region](https://azure.microsoft.com/en-us/explore/global-infrastructure/products-by-region/?products=all&regions=all) page and select a **region** where the following services are available.
75
91
76
-
Pricing varies per region and usage, so it isn't possible to predict exact costs for your usage.
77
-
The majority of the Azure resources used in this infrastructure are on usage-based pricing tiers.
78
-
However, Azure Container Registry has a fixed cost per registry per day.
92
+
Pricing varies per region and usage, so it isn't possible to predict exact costs for your usage. The majority of the Azure resources used in this infrastructure are on usage-based pricing tiers. However, Azure Container Registry has a fixed cost per registry per day.
79
93
80
-
You can try the [Azure pricing calculator](https://azure.microsoft.com/en-us/pricing/calculator) for the resources:
94
+
Use the [Azure pricing calculator](https://azure.microsoft.com/en-us/pricing/calculator) to calculate the cost of this solution in your subscription. [Review a sample pricing sheet for the architecture](https://azure.com/e/68b51f4cb79a4466b631a11aa57e9c16).
81
95
82
-
* Azure AI Foundry: Free tier. [Pricing](https://azure.microsoft.com/pricing/details/ai-studio/)
83
-
* Azure Storage Account for AI Foundry: Standard tier, LRS. Pricing is based on storage and operations. [Pricing](https://azure.microsoft.com/pricing/details/storage/blobs/)
84
-
* Azure Key Vault: Standard tier. Pricing is based on the number of operations. [Pricing](https://azure.microsoft.com/pricing/details/key-vault/)
85
-
* Azure Storage Account for Content Processing Application: Standard tier, LRS. Pricing is based on storage and operations. [Pricing](https://azure.microsoft.com/pricing/details/storage/blobs/)
86
-
* Azure AI Services: S0 tier, defaults to gpt-4o-mini. Pricing is based on token count. [Pricing](https://azure.microsoft.com/pricing/details/cognitive-services/)
87
-
* Azure Container App: Consumption tier with 4 CPU, 8GiB memory/storage. Pricing is based on resource allocation, and each month allows for a certain amount of free usage. [Pricing](https://azure.microsoft.com/pricing/details/container-apps/)
> ⚠️ To avoid unnecessary costs, remember to take down your app if it's no longer in use,
100
+
| Product | Description | Cost |
101
+
|---|---|---|
102
+
|[Azure AI Foundry](https://learn.microsoft.com/en-us/azure/ai-foundry/)| Build generative AI applications on an enterprise-grade platform |[Pricing](https://azure.microsoft.com/pricing/details/ai-studio/)|
103
+
|[Azure OpenAI Service](https://learn.microsoft.com/en-us/azure/ai-services/openai/)| Provides REST API access to OpenAI's powerful language models including o3-mini, o1, o1-mini, GPT-4o, GPT-4o mini |[Pricing](https://azure.microsoft.com/en-us/pricing/details/cognitive-services/openai-service/)|
104
+
|[Azure AI Content Understanding Service](https://learn.microsoft.com/en-us/azure/ai-services/content-understanding/)| Analyzes various media content—such as audio, video, text, and images—transforming it into structured, searchable data |[Pricing](https://azure.microsoft.com/en-us/pricing/details/content-understanding/)|
105
+
|[Azure Blob Storage](https://learn.microsoft.com/en-us/azure/storage/blobs/)| Microsoft's object storage solution for the cloud. Blob storage is optimized for storing massive amounts of unstructured data |[Pricing](https://azure.microsoft.com/pricing/details/storage/blobs/)|
106
+
|[Azure Container Apps](https://learn.microsoft.com/en-us/azure/container-apps/)| Allows you to run containerized applications without worrying about orchestration or infrastructure. |[Pricing](https://azure.microsoft.com/pricing/details/container-apps/)|
107
+
|[Azure Container Registry](https://learn.microsoft.com/en-us/azure/container-registry/)| Build, store, and manage container images and artifacts in a private registry for all types of container deployments |[Pricing](https://azure.microsoft.com/pricing/details/container-registry/)|
108
+
|[Azure Cosmos DB](https://learn.microsoft.com/en-us/azure/cosmos-db/)| Fully managed, distributed NoSQL, relational, and vector database for modern app development |[Pricing](https://azure.microsoft.com/en-us/pricing/details/cosmos-db/autoscale-provisioned/)|
109
+
|[Azure Queue Storage](https://learn.microsoft.com/en-us/azure/storage/queues/)| Store large numbers of messages and access messages from anywhere in the world via HTTP or HTTPS. |[Pricing]()|
110
+
|[GPT Model Capacity](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models)| The latest most capable Azure OpenAI models with multimodal versions, accepting both text and images as input |[Pricing](https://azure.microsoft.com/en-us/pricing/details/cognitive-services/openai-service/)|
111
+
112
+
<br/>
113
+
114
+
>⚠️ **Important:** To avoid unnecessary costs, remember to take down your app if it's no longer in use,
94
115
either by deleting the resource group in the Portal or running `azd down`.
A data analyst at a property insurance company manages claims data and ensures its accuracy and compliance.
128
+
129
+
A recent natural disaster has led to an influx of insurance claims coming into the pipeline. The analyst is tasked with accurately validating ingested data from claims and invoices being processed through the system. Claims data includes various multi-modal content types, with details extracted and mapped to defined schemas such as policy plans, invoices, and insurance adjuster reports.
130
+
131
+
AI is used to extract, transform, and flag potential discrepancies, such as missing policyholder details and outlier repair estimates. The data analyst then cross-checks the findings against historical claims data and regulatory guidelines. Collaborating with the compliance team, she verifies the flagged issues and refines the dataset.
132
+
133
+
Thanks to AI pipeline processing, data moves much faster, more accurately, and is more seamlessly integrated into the data analyst's workflow.
134
+
135
+
⚠️ The sample data used in this repository is synthetic and generated using Azure OpenAI service. The data is intended for use as sample data only.
136
+
137
+
</details>
138
+
139
+
<br/>
140
+
141
+
### Business value
142
+
<details>
143
+
<summary>Click to learn more about what value this solution provides</summary>
144
+
145
+
-**Automated data management** <br/>
146
+
Streamline data management to enable event-driven automation while standardizing the data structure for a reusable experience, improving productivity at scale.
147
+
148
+
-**Enhanced data processing** <br/>
149
+
Efficiently extract key details, keywords, and entities and automatically map them to the specified schemas, optimizing workflows, reducing manual effort, and saving time.
150
+
151
+
-**Data confidence** <br/>
152
+
Systematic extraction and mapping elevate confidence in AI workflows by applying tolerance thresholds and ensuring quality results through scoring, all while enhancing accuracy.
153
+
154
+
-**Verifiable Approvals** <br/>
155
+
Human verification of processed content ensures reliability and precision of the final output when thresholds are not met, while fostering trust and guaranteeing consistency.
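
The threshold idea described above can be sketched as follows. This is a minimal example under stated assumptions, not the accelerator's own logic: the `REVIEW_THRESHOLD` value, the `fields_needing_review` and `route` helpers, and the status labels are hypothetical.

```python
from typing import Dict, List

# Assumed tolerance threshold; a real deployment would tune this per schema.
REVIEW_THRESHOLD = 0.80


def fields_needing_review(
    scores: Dict[str, float], threshold: float = REVIEW_THRESHOLD
) -> List[str]:
    """Return the names of fields whose confidence falls below the threshold."""
    return sorted(name for name, score in scores.items() if score < threshold)


def route(scores: Dict[str, float]) -> dict:
    """Approve automatically when every field meets the threshold,
    otherwise flag the record for human verification."""
    flagged = fields_needing_review(scores)
    status = "needs_review" if flagged else "auto_approved"
    return {"status": status, "flagged_fields": flagged}


decision = route({"invoice_id": 0.98, "vendor": 0.91, "total": 0.62})
print(decision)
```

Routing on the lowest-scoring field, rather than on an average, keeps a single weak extraction from being hidden by otherwise confident results.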
This template uses Azure Key Vault to store all connections to communicate between resources.
@@ -106,21 +175,32 @@ You may want to consider additional security measures, such as:
106
175
* Enabling Microsoft Defender for Cloud to [secure your Azure resources](https://learn.microsoft.com/azure/security-center/defender-for-cloud).
107
176
* Protecting the Azure Container Apps instance with a [firewall](https://learn.microsoft.com/azure/container-apps/waf-app-gateway) and/or [Virtual Network](https://learn.microsoft.com/azure/container-apps/networking?tabs=workload-profiles-env%2Cazure-cli).
108
177
109
-
### How to customize
178
+
<br/>
179
+
110
180
111
-
If you'd like to customize the solution accelerator, here are some common areas to start:
112
-
-[Adding your own Schemas and Data](./docs/CustomizeSchemaData.md)
113
-
-[Modifying System Processing Prompts](./docs/CustomizeSystemPrompts.md)
114
-
-[Ingesting API for Event-Driven Processing](./docs/API.md)
181
+
### Cross references
182
+
Check out similar solution accelerators
183
+
115
184
116
-
### Additional resources
185
+
| Solution Accelerator | Description |
186
+
|---|---|
187
+
|[Document knowledge mining](https://github.com/microsoft/Document-Knowledge-Mining-Solution-Accelerator)| Process and extract summaries, entities, and metadata from unstructured, multi-modal documents and enable searching and chatting over this data. |
188
+
|[Conversation knowledge mining](https://github.com/microsoft/Conversation-Knowledge-Mining-Solution-Accelerator)| Derive insights from volumes of conversational data using generative AI. It offers key phrase extraction, topic modeling, and interactive chat experiences through an intuitive web interface. |
Have questions, find a bug, or want to request a feature? [Submit a new issue](https://github.com/microsoft/content-processing-solution-accelerator/issues) on this repo and we'll connect.
197
+
198
+
<br/>
120
199
121
200
## Responsible AI Transparency FAQ
122
201
Please refer to [Transparency FAQ](./TRANSPARENCY_FAQ.md) for responsible AI transparency details of this solution accelerator.