Evaluating large language models on time series feature understanding: A comprehensive taxonomy and benchmark. arXiv preprint arXiv:2404.16563, 2024

6 Pith papers cite this work. Polarity classification is still indexing.

Citing papers by year: 2026 (3), 2025 (3)

representative citing papers

TSVer: A Benchmark for Fact Verification Against Time-Series Evidence

cs.CL · 2025-11-02 · unverdicted · novelty 7.0

TSVer is a new benchmark dataset for fact verification against time-series evidence, with 304 annotated real-world claims, 400 time series, verdicts, and justifications, plus baseline results showing current models struggle.

BEDTime: A Unified Benchmark for Automatically Describing Time Series

cs.CL · 2025-09-05 · conditional · novelty 6.0

BEDTime benchmark tests 17 models on describing time series structure and finds vision-language models outperform dedicated time-series-language models and language-only approaches, with all models fragile to robustness tests.
