TABALIGN pairs a diffusion language model planner that emits binary cell masks with a trained attention verifier, raising average accuracy by 15.76 points over strong baselines on eight table benchmarks while speeding up execution by 44.64%.
Wei Zhou, Bolei Ma, Annemarie Friedrich, and Mohsen Mesgar
4 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (4)
Verdicts: UNVERDICTED (4)
Representative citing papers
FrontierFinance benchmark shows human financial experts outperform state-of-the-art LLMs by achieving higher scores and more client-ready outputs on realistic long-horizon tasks.
TABQAWORLD improves multi-turn table QA by dynamically selecting multimodal representations and optimizing reasoning trajectories with metadata, delivering 4.87% accuracy gains over baselines and 33.35% latency reduction.
FinReasoning is a hierarchical benchmark that decomposes LLM financial research capabilities into semantic consistency, data alignment, and deep insight, revealing model-type differences in auditing versus insight generation.
citing papers explorer
- From Table to Cell: Attention for Better Reasoning with TABALIGN
  TABALIGN pairs a diffusion language model planner that emits binary cell masks with a trained attention verifier, raising average accuracy by 15.76 points over strong baselines on eight table benchmarks while speeding up execution by 44.64%.
- FrontierFinance: A Long-Horizon Computer-Use Benchmark of Real-World Financial Tasks
  FrontierFinance benchmark shows human financial experts outperform state-of-the-art LLMs by achieving higher scores and more client-ready outputs on realistic long-horizon tasks.
- TABQAWORLD: Optimizing Multimodal Reasoning for Multi-Turn Table Question Answering
  TABQAWORLD improves multi-turn table QA by dynamically selecting multimodal representations and optimizing reasoning trajectories with metadata, delivering 4.87% accuracy gains over baselines and 33.35% latency reduction.
- FinReasoning: A Hierarchical Benchmark for Reliable Financial Research Reporting
  FinReasoning is a hierarchical benchmark that decomposes LLM financial research capabilities into semantic consistency, data alignment, and deep insight, revealing model-type differences in auditing versus insight generation.