CAN-QA creates 33,128 QA pairs in 10 categories from CAN traffic logs to test LLMs; the models capture patterns but struggle with temporal reasoning and multi-condition inference.
Time-MQA: Time Series Multi-Task Question Answering with Context Enhancement
6 Pith papers cite this work.
Representative citing papers
TimeSeriesExamAgent combines templates and LLM agents to generate scalable time series reasoning benchmarks, demonstrating that current LLMs have limited performance on both abstract and domain-specific tasks.
TS-Agent is an agentic framework that uses LLMs only for evidence-based reasoning while delegating extraction to raw time series tools, matching or exceeding baselines on four benchmarks with largest gains on reasoning tasks.
Time-RA reformulates time series anomaly detection as a reasoning-intensive generative task and provides the RATs40K multimodal benchmark to evaluate and improve LLM-based diagnosis.
A survey proposing a taxonomy of Injective, Bridging, and Internal Alignment paradigms to evolve TSA into user-driven Time Series Question Answering with LLMs.