Commit 5bd3629
.Net: Add support for audio and binary tags to chat prompt parser (#11919)
### Motivation and Context
#### Why is this change required?
This template parsers like the YAML parser to embed content types other
than just text and images for LLMs that support additional content
types, like PDFs for OpenAI and DOCXs for Claude. Without this
capability, functions with prompts that have attachments would have to
manually build it's chat history in code.
#### What problem does it solve?
See above
#### What scenario does it contribute to?
Usage additional content types beyond visuals and audio for user
messages
#### Open Issues Addressed
- Fixes #11044
### Description
#### Chat Prompt Parser
To preserve backward compatibility, rather than consolidating binary
content types, I chose to go with adding additional content types so
that LLM chat service providers could opt-in to new content types. It
also reduces the chances of breaking existing code.
3 new content types are created:
* `PdfContent` for PDF files. Uses the tag "<pdf>". Allows for
Base64 data URIs or standard URIs, similar to `ImageContent`.
* `DocContent` for MS Word .doc files. Uses the tag "<doc>".
Allows for Base64 data URIs or standard URIs, similar to `ImageContent`.
* `DocxContent` for MS Word .docx files. Uses the tag "<docx>".
Allows for Base64 data URIs or standard URIs, similar to `ImageContent`.
(**NOTE**: `DocContent` and `DocxContent` are mainly separate because
they have different MIME types and different content formats, though
they could easily be consolidated into a single tag and just let the LLM
provider handle distinguishing between "doc" and "docx" files.
Alternately, I could also see the case for dropping ".doc" support and
requiring the caller to only use ".docx".)
In addition, the following 2 contents are now parsed from the XML:
* `AudioContent` - Parses the tag "<audio>" with either Base64
data URIs or standard URIs, similar to `ImageContent`.
* `BinaryContent` - Parses the tag "<file>" with either Base64
data URIs or standard URIs, similar to `ImageContent`.
Here is a sample:
```xml
<message role='user'>
This part will be discarded upon parsing
<text>Make sense of this random assortment of stuff.</text>
<image>https://fake-link-to-image/</image>
<audio>data:audio/wav;base64,UklGRiQAAABXQVZFZm10IBAAAAABAAEAIlYAAACABAAZGF0YVgAAAAA</audio>
<pdf>data:application/pdf;base64,JVBERi0xLjQKJeLjz9MKMyAwIG9iago8PC9UeXBlL1hSZWYvUGFnZXMgNiAwIFIKL1R5cGUvUGFnZS9NZWRpYUJveCBbMCAwIDQ4MCA1MF0KL0NvbnRlbnRzIDw8L0V4dEdTdGF0ZSA8PC9JRCBbPDwvTGVuZ3RoIDQ4XQovRm9udCA8PC9GMSA8PC9GMiA8PC9GMyA8PC9GNCBbPDwvTGVuZ3RoIDQ4XQovRm9udCA8PC9GNSA8PC9GNiA8PC9GNyBbPDwvTGVuZ3RoIDQ4XQovRm9udCA8PC9GOCAvPj4KZW5kb2JqCjEwIDAgb2JqCjw8L1R5cGUvUGFnZS9NYWRlYUJveCBbMCAwIDQ4MCA1MF0KL0NvbnRlbnRzIDw8L0V4dEdTdGF0ZSA8PC9JRCBbPDwvTGVuZ3RoIDQ4XQovRm9udCA8PC9GMSA8PC9GMiA8PC9GMyA8PC9GNCBbPDwvTGVuZ3RoIDQ4XQovRm9udCA8PC9GNSA8PC9GNiA8PC9GNyBbPDwvTGVuZ3RoIDQ4XQovRm9udCA8PC9GOCAvPj4KZW5kb2JqCjEwIDAgb2JqCjw8L1R5cGUvUGFnZS9NYWRlYUJveCBbMCAwIDQ4MCA1MF0KL0NvbnRlbnRzIDw8L0V4dEdTdGF0ZSA8PC9JRCBbPDwvTGVuZ3RoIDQ4XQovRm9udCA8PC9GMSA8PC9G</pdf>
<pdf>https://fake-link-to-pdf/</pdf>
<doc>data:application/msword;base64,UEsDBBQAAAAIAI+Q1k5a2gAAABQAAAAIAAAAbmFtZS5kb2N4VVQJAAD9AAAACwAAAB4AAAAAA==</doc>
<doc>https://fake-link-to-doc/</doc>
<docx>data:application/vnd.openxmlformats-officedocument.wordprocessingml.document;base64,UEsDBBQAAAAIAI+Q1k5a2gAAABQAAAAIAAAAbmFtZS5kb2N4VVQJAAD9AAAACwAAAB4AAAAAA==</docx>
<docx>https://fake-link-to-docx/</docx>
<file>data:application/octet-stream;base64,UEsDBBQAAAAIAI+Q1k5a2gAAABQAAAAIAAAAbmFtZS5kb2N4VVQJAAD9AAAACwAAAB4AAAAAA==</file>
<file>https://fake-link-to-binary/</file>
This part will also be discarded upon parsing
</message>
```
#### Amazon Bedrock
Modified the `Converse` API request generator to handle the subset of
binary content supported by Amazon Bedrock (PDF, DOC, DOCX, and Image),
as documented
[here](https://docs.aws.amazon.com/sdkfornet/v3/apidocs/items/BedrockRuntime/TContentBlock.html).
#### OpenAI
Modified the client to handle PDF content, audio content, and file
references when generating a request to an OpenAI (or OpenAI compatible)
client.
<!-- Describe your changes, the overall approach, the underlying design.
These notes will help understanding how your code works. Thanks! -->
### Contribution Checklist
<!-- Before submitting this PR, please make sure: -->
- [X] The code builds clean without any errors or warnings
- [X] The PR follows the [SK Contribution
Guidelines](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md)
and the [pre-submission formatting
script](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md#development-scripts)
raises no violations
- [X] All unit tests pass, and I have added new tests where possible
- [X] I didn't break anyone 😄
---------
Co-authored-by: Roger Barreto <19890735+RogerBarreto@users.noreply.github.com>1 parent 4e40f39 commit 5bd3629
File tree
5 files changed
+345
-27
lines changed- dotnet
- samples/Concepts
- PromptTemplates
- src
- SemanticKernel.Abstractions/AI/ChatCompletion
- SemanticKernel.UnitTests/Prompt
5 files changed
+345
-27
lines changedLines changed: 86 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
| 75 | + | |
| 76 | + | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
| 80 | + | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
Lines changed: 86 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
| 75 | + | |
| 76 | + | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
| 80 | + | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
192 | 192 | | |
193 | 193 | | |
194 | 194 | | |
| 195 | + | |
| 196 | + | |
| 197 | + | |
195 | 198 | | |
196 | 199 | | |
| 200 | + | |
197 | 201 | | |
198 | 202 | | |
199 | 203 | | |
200 | | - | |
201 | 204 | | |
202 | | - | |
203 | 205 | | |
204 | | - | |
| 206 | + | |
205 | 207 | | |
206 | 208 | | |
207 | 209 | | |
| |||
Lines changed: 42 additions & 23 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
12 | 12 | | |
13 | 13 | | |
14 | 14 | | |
15 | | - | |
16 | | - | |
17 | | - | |
18 | | - | |
19 | | - | |
20 | 15 | | |
21 | 16 | | |
22 | 17 | | |
| |||
73 | 68 | | |
74 | 69 | | |
75 | 70 | | |
76 | | - | |
77 | | - | |
78 | | - | |
79 | | - | |
80 | | - | |
81 | | - | |
82 | | - | |
83 | | - | |
84 | | - | |
85 | | - | |
86 | | - | |
87 | | - | |
| 71 | + | |
88 | 72 | | |
89 | | - | |
| 73 | + | |
| 74 | + | |
90 | 75 | | |
91 | 76 | | |
92 | 77 | | |
| |||
103 | 88 | | |
104 | 89 | | |
105 | 90 | | |
| 91 | + | |
| 92 | + | |
| 93 | + | |
| 94 | + | |
| 95 | + | |
| 96 | + | |
| 97 | + | |
| 98 | + | |
| 99 | + | |
| 100 | + | |
| 101 | + | |
| 102 | + | |
| 103 | + | |
| 104 | + | |
| 105 | + | |
| 106 | + | |
| 107 | + | |
| 108 | + | |
| 109 | + | |
| 110 | + | |
| 111 | + | |
| 112 | + | |
| 113 | + | |
106 | 114 | | |
107 | 115 | | |
108 | 116 | | |
109 | 117 | | |
110 | 118 | | |
111 | | - | |
112 | | - | |
113 | | - | |
114 | | - | |
115 | | - | |
| 119 | + | |
| 120 | + | |
| 121 | + | |
| 122 | + | |
| 123 | + | |
| 124 | + | |
| 125 | + | |
| 126 | + | |
| 127 | + | |
116 | 128 | | |
117 | 129 | | |
118 | 130 | | |
119 | 131 | | |
120 | 132 | | |
121 | 133 | | |
122 | 134 | | |
| 135 | + | |
| 136 | + | |
| 137 | + | |
| 138 | + | |
| 139 | + | |
| 140 | + | |
| 141 | + | |
123 | 142 | | |
0 commit comments