arXiv preprint arXiv:2105.07624
16 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: dataset (1)
citation-polarity summary: use dataset (1)
citing papers explorer
- FIND: Toward Multimodal Financial Reasoning and Question Answering for Indic Languages
  FinVQA is a new multilingual benchmark for Indic financial VQA with three difficulty levels and four formats, paired with the FIND framework for faithful numerical reasoning via fine-tuning and constrained decoding.
- Fin-Bias: Comprehensive Evaluation for LLM Decision-Making under human bias in Finance Domain
  LLMs copy biased analyst ratings in investment decisions but a new detection method encourages independent reasoning and can improve stock return predictions beyond human levels.
- INDOTABVQA: A Benchmark for Cross-Lingual Table Understanding in Bahasa Indonesia Documents
  INDOTABVQA is a new benchmark dataset for cross-lingual table visual question answering on Bahasa Indonesia documents that exposes VLM weaknesses on complex tables and low-resource languages while showing gains from fine-tuning and table region coordinates.
- FinAuditing: A Financial Taxonomy-Structured Multi-Document Benchmark for Evaluating LLMs
  FinAuditing is a taxonomy-structured multi-document benchmark with 1,102 instances averaging over 33k tokens from XBRL filings, defining three tasks to evaluate LLMs on financial auditing capabilities.
- Rethinking Information Synthesis in Multimodal Question Answering A Multi-Agent Perspective
  MAMMQA is a multi-agent framework that decomposes multimodal queries, retrieves modality-specific answers, performs cross-modal synthesis with VLMs, and integrates results via an LLM to outperform single-model baselines on QA benchmarks.
- FinTagging: Benchmarking LLMs for Extracting and Structuring Financial Information
  FinTagging decomposes XBRL tagging into FinNI extraction and FinCL full-taxonomy linking, showing LLMs handle extraction but struggle with fine-grained concept alignment in zero-shot settings.
- FLARE: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding
  FLARE is a vision-language model family using text-guided vision encoding, context-aware alignment decoding, dual-semantic mapping loss, and text-driven VQA synthesis to achieve deep cross-modal integration, outperforming larger models with only 630 vision tokens at 3B scale.
- Design and Report Benchmarks for Knowledge Work
  Proposes a three-step benchmark design method (define work activity, specify tested setting, score work product) derived from work studies and O*NET, demonstrated via three case analyses.
- FINESSE-Bench: A Hierarchical Benchmark Suite for Financial Domain Knowledge and Technical Analysis in Large Language Models
  FINESSE-Bench is a new hierarchical benchmark suite combining certification-style exams, trading tasks, and a Russian olympiad set to evaluate LLMs on financial competencies at multiple difficulty levels.
- The Power of Order: Fooling LLMs with Adversarial Table Permutations
  Semantically invariant row and column permutations in tables can cause LLMs to output incorrect answers, and a gradient-based attack called ATP efficiently finds such permutations that degrade performance across many models.
- Generalizing Numerical Reasoning in Table Data through Operation Sketches and Self-Supervised Learning
  TaNOS decouples table semantics from numerical structure via anonymization, sketches, and program-first self-supervision, yielding 80.13% FinQA accuracy with 10% data and near-zero cross-domain gap versus over 10pp for standard fine-tuning.
- Empirical Evaluation of PDF Parsing and Chunking for Financial Question Answering with RAG
  Systematic tests show that specific PDF parsers combined with overlapping chunking strategies better preserve structure and improve RAG answer correctness on financial QA benchmarks including the new TableQuest dataset.
- Attention Grounded Enhancement for Visual Document Retrieval
  AGREE boosts visual document retrieval by adding local relevance signals from MLLM attention maps to global document labels during retriever training.
- RELOOP: Recursive Retrieval with Multi-Hop Reasoner and Planners for Heterogeneous QA
  RELOOP unifies retrieval across text, tables, and KGs via hierarchical sequences and dual-agent guided iteration, reporting EM/F1 gains over baselines on HotpotQA, HybridQA/TAT-QA, and MetaQA.
- Bridging Language Models and Financial Analysis
  A survey synthesizing recent LLM research and assessing its applicability to financial data analysis.
- VT-Bench: A Unified Benchmark for Visual-Tabular Multi-Modal Learning