Peak-Detector uses instruction-tuned LLMs and a condensed peak-representation of time-series data to achieve robust cross-modal peak detection with self-generated explanations across ECG, PPG, BCG, and BSG signals.
Evaluating large language models on time series feature understanding: A comprehensive taxonomy and benchmark.arXiv preprint arXiv:2404.16563, 2024
6 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
TSVer is a new benchmark dataset for fact verification against time-series evidence, with 304 annotated real-world claims, 400 time series, verdicts, and justifications, plus baseline results showing current models struggle.
TS-Agent is an agentic framework that uses LLMs only for evidence-based reasoning while delegating extraction to raw time series tools, matching or exceeding baselines on four benchmarks with largest gains on reasoning tasks.
GeoMind applies an agentic workflow with tool-augmented modules and process supervision to outperform static models on lithology classification from well logs while producing traceable decisions.
SupChain-Bench reveals substantial gaps in LLM reliability for long-horizon supply chain orchestration, while the proposed SupChain-ReAct framework improves tool-calling by autonomously synthesizing procedures.
BEDTime benchmark tests 17 models on describing time series structure and finds vision-language models outperform dedicated time-series-language models and language-only approaches, with all models fragile to robustness tests.
citing papers explorer
-
Peak-Detector: Explainable Peak Detection via Instruction-Tuned Large Language Models in Physiological Sign
Peak-Detector uses instruction-tuned LLMs and a condensed peak-representation of time-series data to achieve robust cross-modal peak detection with self-generated explanations across ECG, PPG, BCG, and BSG signals.
-
TSVer: A Benchmark for Fact Verification Against Time-Series Evidence
TSVer is a new benchmark dataset for fact verification against time-series evidence, with 304 annotated real-world claims, 400 time series, verdicts, and justifications, plus baseline results showing current models struggle.
-
TS-Agent: Understanding and Reasoning Over Raw Time Series via Iterative Insight Gathering
TS-Agent is an agentic framework that uses LLMs only for evidence-based reasoning while delegating extraction to raw time series tools, matching or exceeding baselines on four benchmarks with largest gains on reasoning tasks.
-
GeoMind: An Agentic Workflow for Lithology Classification with Reasoned Tool Invocation
GeoMind applies an agentic workflow with tool-augmented modules and process supervision to outperform static models on lithology classification from well logs while producing traceable decisions.
-
SupChain-Bench: Benchmarking Large Language Models for Real-World Supply Chain Management
SupChain-Bench reveals substantial gaps in LLM reliability for long-horizon supply chain orchestration, while the proposed SupChain-ReAct framework improves tool-calling by autonomously synthesizing procedures.
-
BEDTime: A Unified Benchmark for Automatically Describing Time Series
BEDTime benchmark tests 17 models on describing time series structure and finds vision-language models outperform dedicated time-series-language models and language-only approaches, with all models fragile to robustness tests.