This repository contains the source code for the MangaVQA and MangaLMM project website.
We present MangaVQA, a benchmark of 526 manually constructed question–answer pairs designed to evaluate an LMM's ability to accurately answer targeted, factual questions grounded in both visual and textual context. We also develop MangaLMM, a manga-specialized version of Qwen2.5-VL, fine-tuned to jointly address both VQA and OCR tasks.
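Since MangaLMM is a Qwen2.5-VL-style model, it can in principle be queried through the standard Hugging Face Transformers interface for that architecture. The sketch below shows a minimal VQA-style call under that assumption; the model ID "MangaLMM/MangaLMM", the image path, and the question are placeholders, not the project's official inference script, and it additionally assumes the qwen-vl-utils helper package is installed.

import torch
from qwen_vl_utils import process_vision_info
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

# NOTE: placeholder repository ID; substitute the released MangaLMM checkpoint.
model_id = "MangaLMM/MangaLMM"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# One manga page plus a targeted, factual question (VQA-style input).
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "page_001.png"},  # placeholder path
            {"type": "text", "text": "Who is speaking in the last panel?"},
        ],
    }
]

# Build the chat prompt and preprocess the image, then generate an answer.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=64)
answer = processor.batch_decode(
    output_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
)[0]
print(answer)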
If you find MangaVQA and MangaLMM useful for your work, please cite:
@inproceedings{baek2025mangavqa,
  author    = {Baek, Jeonghun and Egashira, Kazuki and Onohara, Shota and Miyai, Atsuyuki and Imajuku, Yuki and Ikuta, Hikaru and Aizawa, Kiyoharu},
  title     = {MangaVQA and MangaLMM: A Benchmark and Specialized Model for Multimodal Manga Understanding},
  booktitle = {Findings of the Association for Computational Linguistics: EACL 2026},
  year      = {2026},
}
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
This website is inspired by and references Nerfies.
