fix(ppstructurev3): escape html-sensitive OCR text in table markdown output #17924
Open
jimmyzhuu wants to merge 2 commits into PaddlePaddle:main from
Thanks for your contribution!
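Author
I've added a formatting-only commit to handle the black / pre-commit rewrites flagged in this CI run. The current test-pr failure is still in the dependency installation stage (paddlepaddle version resolution fails), which does not appear to be caused by the table markdown escaping change itself.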
Addresses #16096
Summary
This PR fixes an HTML escaping issue in `PPStructureV3` table markdown export.

When OCR text inside table cells contains HTML-sensitive content such as `<recv .../>` or `<pause .../>`, the current table HTML assembly path may inject raw OCR text directly into `<td>` nodes. This makes the markdown output render incorrectly and can blur the boundary between original OCR content and generated HTML structure.

This PR narrows the fix to the `PPStructureV3` table export path only.

Changes
- Escape OCR cell text with `html.escape(..., quote=True)`
- Keep the `<b>...</b>` wrapper so bold formatting is not lost
- `<recv .../>`

Scope
This PR does not change `PaddleOCRVL`.

It only fixes the `PPStructureV3` table markdown export path.

Tests
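As an illustration of the escaping behaviour described in the Summary, here is a minimal sketch. It is not the code from this PR: `render_cell` is a hypothetical helper, and the assumption that bold cells arrive with the text already wrapped in a single outer `<b>...</b>` pair is made only for this example. Only `html.escape` and `re` from the Python standard library are used.

```python
import html
import re

# Hypothetical pattern: a cell whose whole content is one outer <b>...</b> wrapper.
_BOLD_WRAPPER = re.compile(r"^<b>(.*)</b>$", re.DOTALL)


def render_cell(ocr_text: str) -> str:
    """Build a <td> node with the OCR text escaped (illustrative helper only).

    If the text is wrapped in an outer <b>...</b> pair, the wrapper is kept
    and only the inner text is escaped, so bold formatting is not lost.
    """
    match = _BOLD_WRAPPER.match(ocr_text)
    if match:
        inner = html.escape(match.group(1), quote=True)
        return f"<td><b>{inner}</b></td>"
    return f"<td>{html.escape(ocr_text, quote=True)}</td>"


if __name__ == "__main__":
    # Tag-like OCR text ends up as literal text inside the cell.
    print(render_cell('<recv timeout="5"/>'))
    # The bold wrapper survives while its content is escaped.
    print(render_cell("<b><pause/></b>"))
```

With `quote=True`, `html.escape` also converts `"` and `'`, which matters if the same text is ever placed inside an attribute value; for text inside `<td>` the conversion of `<`, `>`, and `&` is what keeps OCR content from being parsed as markup.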