This repository contains the source code for the MangaVQA and MangaLMM project website.
We present MangaVQA, a benchmark of 526 manually constructed question–answer pairs designed to evaluate an LMM's ability to accurately answer targeted, factual questions grounded in both visual and textual context. We also develop MangaLMM, a manga-specialized version of Qwen2.5-VL, fine-tuned to jointly address both VQA and OCR tasks.
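Since MangaLMM is a Qwen2.5-VL-style model, it can in principle be queried through the standard Hugging Face Transformers interface for that architecture. The sketch below shows a minimal VQA-style call under that assumption; the model ID "MangaLMM/MangaLMM", the image path, and the question are placeholders, not the project's official inference script, and it additionally assumes the qwen-vl-utils helper package is installed.

import torch
from qwen_vl_utils import process_vision_info
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

# NOTE: placeholder repository ID; substitute the released MangaLMM checkpoint.
model_id = "MangaLMM/MangaLMM"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# One manga page plus a targeted, factual question (VQA-style input).
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "page_001.png"},  # placeholder path
            {"type": "text", "text": "Who is speaking in the last panel?"},
        ],
    }
]

# Build the chat prompt and preprocess the image, then generate an answer.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=64)
answer = processor.batch_decode(
    output_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
)[0]
print(answer)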
If you find MangaVQA and MangaLMM useful for your work, please cite:
@inproceedings{baek2025mangavqa,
  author    = {Baek, Jeonghun and Egashira, Kazuki and Onohara, Shota and Miyai, Atsuyuki and Imajuku, Yuki and Ikuta, Hikaru and Aizawa, Kiyoharu},
  title     = {MangaVQA and MangaLMM: A Benchmark and Specialized Model for Multimodal Manga Understanding},
  booktitle = {Findings of the Association for Computational Linguistics: EACL 2026},
  year      = {2026},
}
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
This website is inspired by and references Nerfies.
