ConvFinQA: Exploring the chain of numerical reasoning in 20 conversational finance question answering

Convfinqa: Exploring the chain of numerical reasoning in conversational finance question answering , author= · 2022 · arXiv 2210.03849

8 Pith papers cite this work. Polarity classification is still indexing.

8 Pith papers citing it

read on arXiv browse 8 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

FinAuditing: A Financial Taxonomy-Structured Multi-Document Benchmark for Evaluating LLMs

cs.CL · 2025-10-10 · unverdicted · novelty 7.0

FinAuditing is a taxonomy-structured multi-document benchmark with 1,102 instances averaging over 33k tokens from XBRL filings, defining three tasks to evaluate LLMs on financial auditing capabilities.

FinTagging: Benchmarking LLMs for Extracting and Structuring Financial Information

cs.CL · 2025-05-27 · unverdicted · novelty 7.0

FinTagging decomposes XBRL tagging into FinNI extraction and FinCL full-taxonomy linking, showing LLMs handle extraction but struggle with fine-grained concept alignment in zero-shot settings.

Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks

cs.CL · 2022-11-22 · unverdicted · novelty 7.0

PoT prompting improves numerical reasoning by having language models write programs executed by a computer instead of performing calculations in natural language chains of thought, with an average 12% gain over CoT.

CLExEval: A Human-in-the-Loop Framework for Qualitative Evaluation of LLM Clinical Reasoning

cs.CL · 2026-06-30 · unverdicted · novelty 6.0

CLExEval introduces a human-annotated evaluation framework on 40 rare cases that identifies verbosity bias, hidden knowledge paradox, and 68.6% reasoning-to-output mismatch in LLMs while showing LLM-as-a-Judge overestimates reliability.

FINESSE-Bench: A Hierarchical Benchmark Suite for Financial Domain Knowledge and Technical Analysis in Large Language Models

cs.CL · 2026-05-14 · unverdicted · novelty 6.0 · 2 refs

FINESSE-Bench is a new hierarchical benchmark suite combining certification-style exams, trading tasks, and a Russian olympiad set to evaluate LLMs on financial competencies at multiple difficulty levels.

Reducing Redundancy in Retrieval-Augmented Generation through Chunk Filtering

cs.CL · 2026-04-27 · unverdicted · novelty 4.0

Entity-based chunk filtering reduces RAG vector index size by 25-36% with retrieval quality near baseline levels.

Bridging Language Models and Financial Analysis

q-fin.ST · 2025-03-14 · unverdicted · novelty 2.0

A survey synthesizing recent LLM research and assessing its applicability to financial data analysis.

Generalizing Numerical Reasoning in Table Data through Operation Sketches and Self-Supervised Learning

cs.LG · 2026-04-23

citing papers explorer

Showing 4 of 4 citing papers after filters.

CLExEval: A Human-in-the-Loop Framework for Qualitative Evaluation of LLM Clinical Reasoning cs.CL · 2026-06-30 · unverdicted · none · ref 129
CLExEval introduces a human-annotated evaluation framework on 40 rare cases that identifies verbosity bias, hidden knowledge paradox, and 68.6% reasoning-to-output mismatch in LLMs while showing LLM-as-a-Judge overestimates reliability.
FINESSE-Bench: A Hierarchical Benchmark Suite for Financial Domain Knowledge and Technical Analysis in Large Language Models cs.CL · 2026-05-14 · unverdicted · none · ref 2 · 2 links
FINESSE-Bench is a new hierarchical benchmark suite combining certification-style exams, trading tasks, and a Russian olympiad set to evaluate LLMs on financial competencies at multiple difficulty levels.
Reducing Redundancy in Retrieval-Augmented Generation through Chunk Filtering cs.CL · 2026-04-27 · unverdicted · none · ref 8
Entity-based chunk filtering reduces RAG vector index size by 25-36% with retrieval quality near baseline levels.
Generalizing Numerical Reasoning in Table Data through Operation Sketches and Self-Supervised Learning cs.LG · 2026-04-23 · unreviewed · ref 5

ConvFinQA: Exploring the chain of numerical reasoning in 20 conversational finance question answering

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer