PyFi generates a 600K pyramid QA dataset for financial images using adversarial MCTS agents, allowing fine-tuned VLMs to decompose complex questions and achieve 19.52% and 8.06% accuracy gains on Qwen2.5-VL models.
In Companion Proc. ACM on Web Conference 2025, pages 785–788
1 Pith paper cites this work. Polarity classification is still indexing.
fields: q-fin.CP (1)
years: 2025 (1)
verdicts: UNVERDICTED (1)
representative citing papers
PyFi: Toward Pyramid-like Financial Image Understanding for VLMs via Adversarial Agents