### What is the Content Processing Solution Accelerator?

This solution accelerator is an open-source GitHub repository that extracts data from unstructured documents and transforms it into defined schemas, with validation, to speed up downstream data ingestion and improve data quality. It automates the extraction, validation, and structuring of information for event-driven system-to-system workflows. The solution is built using Azure OpenAI Service, Azure AI Services, Azure AI Content Understanding Service, Azure Cosmos DB, and Azure Container Apps.

### What can the Content Processing Solution Accelerator do?

The sample solution is tailored for a data analyst at a property insurance company who analyzes large amounts of claim-related data, including forms, reports, invoices, and property loss documentation. The sample data is synthetically generated using Azure OpenAI Service and saved into related templates and files: unstructured documents used to demonstrate the processing pipeline. Any names and other personally identifiable information in the sample data are fictitious.

The sample solution processes uploaded documents by exposing an API endpoint that uses Azure OpenAI Service and Azure AI Content Understanding Service for extraction. The extracted data is then transformed into a schema output specific to the content type (for example, an invoice), and the extraction and schema mapping are validated through accuracy scoring. Scoring thresholds determine whether an output requires a human-in-the-loop review, in which a user can review, update, and add comments to the result.

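The threshold-based routing described above can be sketched as follows. This is a minimal illustration, not the accelerator's actual code: the class, function names, field names, and the 0.8 threshold value are all assumptions made for the example.

```python
from dataclasses import dataclass

# Illustrative accuracy threshold below which a human review is required.
# The accelerator lets you configure such thresholds; 0.8 is an assumed value.
REVIEW_THRESHOLD = 0.8

@dataclass
class ExtractionResult:
    content_type: str   # e.g. "invoice" (hypothetical content type label)
    fields: dict        # extracted field name -> value
    confidence: float   # overall accuracy score in [0, 1]

def needs_human_review(result: ExtractionResult,
                       threshold: float = REVIEW_THRESHOLD) -> bool:
    """Route low-confidence extractions to a human-in-the-loop review queue."""
    return result.confidence < threshold

# A high-confidence extraction passes straight through to downstream ingestion,
# while a low-confidence one is flagged for human review.
auto = ExtractionResult("invoice", {"total": "1200.00"}, confidence=0.95)
manual = ExtractionResult("invoice", {"total": "12O0.00"}, confidence=0.55)

print(needs_human_review(auto))    # False
print(needs_human_review(manual))  # True
```

The key design point the accelerator relies on is that the threshold is tunable per deployment, so teams can trade review workload against the risk of ingesting inaccurate data.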
### What is/are the Content Processing Solution Accelerator’s intended use(s)?

### What are the limitations of the Content Processing Solution Accelerator? How can users minimize the Content Processing Solution Accelerator’s limitations when using the system?

This solution accelerator can only be used as a sample to accelerate the creation of content processing solutions. The repository showcases a sample scenario of a data analyst at a property insurance company analyzing large amounts of claim-related data, but a human must still validate:

- the accuracy and correctness of the data extracted from their documents;
- the schema definitions for the business-specific documents to be extracted;
- the quality and validation scoring logic and the thresholds for human-in-the-loop review;
- the ingestion of transformed data into subsequent systems; and
- the relevance of the outputs for use with customers.

Users of the accelerator should review the system prompts provided and update them per their organizational guidance.

AI-generated content in the solution may be inaccurate; the outputs, and any integrated solutions built on the output data, are not guaranteed to be trustworthy and should be manually reviewed by the user. More information on mitigating overreliance on AI-generated content is available at https://aka.ms/overreliance-framework.

Currently, the sample repository is available in English only and has only been tested with PDF, PNG, and JPEG files up to 20 MB in size.

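One way to minimize this limitation is to reject untested inputs before they enter the pipeline. The file types and the 20 MB limit below come from the statement above; the helper function itself is an illustrative sketch and not part of the repository (the `.jpg` extension is assumed to be an acceptable spelling of JPEG).

```python
import os

# File types and size limit the sample has been tested against (per the docs).
ALLOWED_EXTENSIONS = {".pdf", ".png", ".jpeg", ".jpg"}
MAX_SIZE_BYTES = 20 * 1024 * 1024  # 20 MB

def is_supported_upload(filename: str, size_bytes: int) -> bool:
    """Reject files the sample pipeline has not been tested with."""
    ext = os.path.splitext(filename)[1].lower()
    return ext in ALLOWED_EXTENSIONS and size_bytes <= MAX_SIZE_BYTES

print(is_supported_upload("claim_form.pdf", 5 * 1024 * 1024))  # True
print(is_supported_upload("claim_photo.tiff", 1024))           # False
```

Performing this check at the upload boundary gives users a clear error instead of an unpredictable extraction result on unsupported inputs.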
### What operational factors and settings allow for effective and responsible use of the Content Processing Solution Accelerator?
