ConvFinQA: Exploring the chain of numerical reasoning in 20 conversational finance question answering

Zhiyu Chen, Shiyang Li, et al · 2022 · arXiv 2210.03849

7 Pith papers cite this work. Polarity classification is still indexing.

7 Pith papers citing it

read on arXiv browse 7 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

FinAuditing: A Financial Taxonomy-Structured Multi-Document Benchmark for Evaluating LLMs

cs.CL · 2025-10-10 · unverdicted · novelty 7.0

FinAuditing is a taxonomy-structured multi-document benchmark with 1,102 instances averaging over 33k tokens from XBRL filings, defining three tasks to evaluate LLMs on financial auditing capabilities.

FinTagging: Benchmarking LLMs for Extracting and Structuring Financial Information

cs.CL · 2025-05-27 · unverdicted · novelty 7.0

FinTagging decomposes XBRL tagging into FinNI extraction and FinCL full-taxonomy linking, showing LLMs handle extraction but struggle with fine-grained concept alignment in zero-shot settings.

Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks

cs.CL · 2022-11-22 · unverdicted · novelty 7.0

PoT prompting improves numerical reasoning by having language models write programs executed by a computer instead of performing calculations in natural language chains of thought, with an average 12% gain over CoT.

FINESSE-Bench: A Hierarchical Benchmark Suite for Financial Domain Knowledge and Technical Analysis in Large Language Models

cs.CL · 2026-05-14 · unverdicted · novelty 6.0 · 2 refs

FINESSE-Bench is a new hierarchical benchmark suite combining certification-style exams, trading tasks, and a Russian olympiad set to evaluate LLMs on financial competencies at multiple difficulty levels.

Generalizing Numerical Reasoning in Table Data through Operation Sketches and Self-Supervised Learning

cs.LG · 2026-04-23 · unverdicted · novelty 5.0

TaNOS decouples table semantics from numerical structure via anonymization, sketches, and program-first self-supervision, yielding 80.13% FinQA accuracy with 10% data and near-zero cross-domain gap versus over 10pp for standard fine-tuning.

Reducing Redundancy in Retrieval-Augmented Generation through Chunk Filtering

cs.CL · 2026-04-27 · unverdicted · novelty 4.0

Entity-based chunk filtering reduces RAG vector index size by 25-36% with retrieval quality near baseline levels.

Bridging Language Models and Financial Analysis

q-fin.ST · 2025-03-14 · unverdicted · novelty 2.0

A survey synthesizing recent LLM research and assessing its applicability to financial data analysis.

citing papers explorer

Showing 7 of 7 citing papers.

FinAuditing: A Financial Taxonomy-Structured Multi-Document Benchmark for Evaluating LLMs cs.CL · 2025-10-10 · unverdicted · none · ref 3
FinAuditing is a taxonomy-structured multi-document benchmark with 1,102 instances averaging over 33k tokens from XBRL filings, defining three tasks to evaluate LLMs on financial auditing capabilities.
FinTagging: Benchmarking LLMs for Extracting and Structuring Financial Information cs.CL · 2025-05-27 · unverdicted · none · ref 6
FinTagging decomposes XBRL tagging into FinNI extraction and FinCL full-taxonomy linking, showing LLMs handle extraction but struggle with fine-grained concept alignment in zero-shot settings.
Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks cs.CL · 2022-11-22 · unverdicted · none · ref 5
PoT prompting improves numerical reasoning by having language models write programs executed by a computer instead of performing calculations in natural language chains of thought, with an average 12% gain over CoT.
FINESSE-Bench: A Hierarchical Benchmark Suite for Financial Domain Knowledge and Technical Analysis in Large Language Models cs.CL · 2026-05-14 · unverdicted · none · ref 2 · 2 links
FINESSE-Bench is a new hierarchical benchmark suite combining certification-style exams, trading tasks, and a Russian olympiad set to evaluate LLMs on financial competencies at multiple difficulty levels.
Generalizing Numerical Reasoning in Table Data through Operation Sketches and Self-Supervised Learning cs.LG · 2026-04-23 · unverdicted · none · ref 5
TaNOS decouples table semantics from numerical structure via anonymization, sketches, and program-first self-supervision, yielding 80.13% FinQA accuracy with 10% data and near-zero cross-domain gap versus over 10pp for standard fine-tuning.
Reducing Redundancy in Retrieval-Augmented Generation through Chunk Filtering cs.CL · 2026-04-27 · unverdicted · none · ref 8
Entity-based chunk filtering reduces RAG vector index size by 25-36% with retrieval quality near baseline levels.
Bridging Language Models and Financial Analysis q-fin.ST · 2025-03-14 · unverdicted · none · ref 15
A survey synthesizing recent LLM research and assessing its applicability to financial data analysis.

ConvFinQA: Exploring the chain of numerical reasoning in 20 conversational finance question answering

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer