Learning to compress prompts with gist tokens, 2024
1 Pith paper cites this work. Polarity classification is still being indexed.
citation-role summary: other (1)
fields: cs.AI (1)
years: 2026 (1)
verdicts: CONDITIONAL (1)
roles: other (1)
polarities: unclear (1)
representative citing papers
TimeSeriesExamAgent: Creating Time Series Reasoning Benchmarks at Scale
TimeSeriesExamAgent combines templates and LLM agents to generate scalable time series reasoning benchmarks, demonstrating that current LLMs have limited performance on both abstract and domain-specific tasks.