FinAuditing is a taxonomy-structured multi-document benchmark with 1,102 instances averaging over 33k tokens from XBRL filings, defining three tasks to evaluate LLMs on financial auditing capabilities.
ConvFinQA: Exploring the chain of numerical reasoning in 20 conversational finance question answering
7 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 7roles
background 1polarities
background 1representative citing papers
FinTagging decomposes XBRL tagging into FinNI extraction and FinCL full-taxonomy linking, showing LLMs handle extraction but struggle with fine-grained concept alignment in zero-shot settings.
PoT prompting improves numerical reasoning by having language models write programs executed by a computer instead of performing calculations in natural language chains of thought, with an average 12% gain over CoT.
FINESSE-Bench is a new hierarchical benchmark suite combining certification-style exams, trading tasks, and a Russian olympiad set to evaluate LLMs on financial competencies at multiple difficulty levels.
TaNOS decouples table semantics from numerical structure via anonymization, sketches, and program-first self-supervision, yielding 80.13% FinQA accuracy with 10% data and near-zero cross-domain gap versus over 10pp for standard fine-tuning.
Entity-based chunk filtering reduces RAG vector index size by 25-36% with retrieval quality near baseline levels.
A survey synthesizing recent LLM research and assessing its applicability to financial data analysis.
citing papers explorer
-
FinAuditing: A Financial Taxonomy-Structured Multi-Document Benchmark for Evaluating LLMs
FinAuditing is a taxonomy-structured multi-document benchmark with 1,102 instances averaging over 33k tokens from XBRL filings, defining three tasks to evaluate LLMs on financial auditing capabilities.
-
FinTagging: Benchmarking LLMs for Extracting and Structuring Financial Information
FinTagging decomposes XBRL tagging into FinNI extraction and FinCL full-taxonomy linking, showing LLMs handle extraction but struggle with fine-grained concept alignment in zero-shot settings.
-
Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks
PoT prompting improves numerical reasoning by having language models write programs executed by a computer instead of performing calculations in natural language chains of thought, with an average 12% gain over CoT.
-
FINESSE-Bench: A Hierarchical Benchmark Suite for Financial Domain Knowledge and Technical Analysis in Large Language Models
FINESSE-Bench is a new hierarchical benchmark suite combining certification-style exams, trading tasks, and a Russian olympiad set to evaluate LLMs on financial competencies at multiple difficulty levels.
-
Generalizing Numerical Reasoning in Table Data through Operation Sketches and Self-Supervised Learning
TaNOS decouples table semantics from numerical structure via anonymization, sketches, and program-first self-supervision, yielding 80.13% FinQA accuracy with 10% data and near-zero cross-domain gap versus over 10pp for standard fine-tuning.
-
Reducing Redundancy in Retrieval-Augmented Generation through Chunk Filtering
Entity-based chunk filtering reduces RAG vector index size by 25-36% with retrieval quality near baseline levels.
-
Bridging Language Models and Financial Analysis
A survey synthesizing recent LLM research and assessing its applicability to financial data analysis.