Konstantin Fedorov, Boris Zarubin, and Vladimir Ivanov

Alex Egg, Martin Iglesias Goyanes, Friso Kingma, Andreu Mora, Leandro von Werra, Thomas Wolf · 2025 · arXiv 2506.23719

6 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

background: 1 · method: 1

citation-polarity summary

background: 1 · use method: 1

representative citing papers

PrepBench: How Far Are We from Natural-Language-Driven Data Preparation?

cs.DB · 2026-05-09 · unverdicted · novelty 7.0

PrepBench is a benchmark showing that state-of-the-art LLMs still struggle with natural-language-driven data preparation involving disambiguation, code generation, and workflow translation.

Structure-Grounded Knowledge Retrieval via Code Dependencies for Multi-Step Data Reasoning

cs.CL · 2026-04-12 · unverdicted · novelty 7.0

SGKR uses function-call dependency graphs to retrieve structured code knowledge, improving LLM correctness on multi-step data reasoning benchmarks over similarity baselines.

KAIROS: Stateful, Context-Aware Power-Efficient Agentic Inference Serving

cs.DC · 2026-04-17 · unverdicted · novelty 6.0

KAIROS reduces power by 27% on average (up to 39.8%) for agentic AI inference by using long-lived context to jointly manage GPU frequency, concurrency, and request routing across instances.

Finch: Benchmarking Finance & Accounting across Spreadsheet-Centric Enterprise Workflows

cs.AI · 2025-12-15 · unverdicted · novelty 6.0

Finch is a new benchmark with 172 composite workflows and 384 tasks from real enterprise data that shows top AI models like GPT-5.1 Pro pass only 38.4% of workflows under human evaluation.

Text Analytics Evaluation Framework: A Case Study on LLMs and Social Media

cs.CL · 2026-05-20 · unverdicted · novelty 5.0

Presents a new question-based evaluation framework for LLMs on aggregated social media text and reports that performance declines with input scale, task complexity, and numerical operations beyond 500 instances.

DataClawBench: An Agent Benchmark for Exploratory Real-World Financial Data Analysis

cs.AI · 2026-05-04 · 2 refs

citing papers explorer

Showing 6 of 6 citing papers.

PrepBench: How Far Are We from Natural-Language-Driven Data Preparation? cs.DB · 2026-05-09 · unverdicted · none · ref 11
Structure-Grounded Knowledge Retrieval via Code Dependencies for Multi-Step Data Reasoning cs.CL · 2026-04-12 · unverdicted · none · ref 1
KAIROS: Stateful, Context-Aware Power-Efficient Agentic Inference Serving cs.DC · 2026-04-17 · unverdicted · none · ref 16
Finch: Benchmarking Finance & Accounting across Spreadsheet-Centric Enterprise Workflows cs.AI · 2025-12-15 · unverdicted · none · ref 18
Text Analytics Evaluation Framework: A Case Study on LLMs and Social Media cs.CL · 2026-05-20 · unverdicted · none · ref 77
DataClawBench: An Agent Benchmark for Exploratory Real-World Financial Data Analysis cs.AI · 2026-05-04 · unreviewed · ref 2 · 2 links
