From Time Series Analysis to Question Answering: A Survey in the LLM Era
Pith reviewed 2026-05-19 09:29 UTC · model grok-4.3
The pith
Time series analysis is evolving into flexible question answering by shifting from external to internal alignment with large language models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Time Series Analysis (TSA) is evolving toward Time Series Question Answering (TSQA), shifting from expert-driven, task-specific analysis to user-driven, task-unified question answering. The survey organizes this evolution into three paradigms (Injective Alignment, Bridging Alignment, and Internal Alignment), ordered by a shift from external to internal alignment.
What carries the argument
The three alignment paradigms (Injective Alignment, Bridging Alignment, and Internal Alignment) that organize literature by the degree of external versus internal integration between large language models and time series data.
If this is right
- Practitioners gain concrete criteria for picking alignment methods that suit their data scale and compute budget.
- Dataset creators should prioritize formats that support open-ended questions rather than single-task labels.
- Model developers can focus design effort on internal alignment techniques that reduce the need for separate preprocessing steps.
- Cross-domain applications become easier once the same alignment choice works for both short sensor streams and long financial series.
Where Pith is reading between the lines
- The same external-to-internal lens could be applied to other data types such as graphs or spatial data to create similar unified frameworks.
- Internal alignment may eventually allow single models to handle mixed temporal and textual queries without task-specific fine-tuning.
- Testing the taxonomy on private industry datasets would reveal whether the guidance remains generalizable beyond public benchmarks.
Load-bearing premise
The proposed division into external-to-internal alignment stages correctly mirrors how the field has actually progressed and gives reliable advice for choosing methods in new settings.
What would settle it
A new survey or set of case studies that finds most current work still relies on external tools and does not show a measurable trend toward internal alignment methods.
Original abstract
Recently, Large Language Models (LLMs) have introduced a novel paradigm in Time Series Analysis (TSA), leveraging strong language capabilities to support tasks such as forecasting and anomaly detection. However, these analysis tasks cannot adequately cover temporal language tasks, such as interpretation and captioning. A fundamental gap remains between TSA and LLMs: LLMs are pre-trained to optimize natural language relevance for question answering rather than objectives specialized for TSA. To bridge this gap, TSA is evolving toward Time Series Question Answering (TSQA), shifting from expert-driven and task-specific analysis to user-driven and task-unified question answering. TSQA depends on flexible exploration rather than predefined TSA pipelines. In this survey, we first propose a taxonomy that reflects the evolution from TSA to TSQA, driven by a shift from external to internal alignment. We then organize existing literature into three alignment paradigms: Injective Alignment, Bridging Alignment, and Internal Alignment, and provide practical guidance for flexible, economical, and generalizable selection of alignment paradigms. We finally analyze datasets across domains and characteristics, identify challenges, and highlight future research directions.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript is a survey on the integration of Large Language Models with Time Series Analysis (TSA). It claims that TSA is evolving toward Time Series Question Answering (TSQA), shifting from expert-driven, task-specific methods to user-driven, task-unified question answering. The central contribution is a taxonomy organizing the literature into three alignment paradigms—Injective Alignment, Bridging Alignment, and Internal Alignment—driven by a progression from external to internal alignment mechanisms. The paper reviews existing works under this taxonomy, supplies practical guidance for paradigm selection, analyzes datasets across domains, and identifies challenges plus future directions.
Significance. If the taxonomy is shown to be reproducible and the literature coverage is comprehensive, the survey would provide a useful organizing lens for a fast-moving interdisciplinary area. It synthesizes disparate TSA+LLM efforts, highlights the move toward flexible question-answering interfaces, and supplies dataset overviews that could aid new researchers. Explicit credit is due for attempting to move beyond task-specific pipelines toward unified, user-facing temporal reasoning.
major comments (1)
- [§3] §3 (Taxonomy of Alignment Paradigms): Explicit classification criteria, decision rules, or boundary examples are not supplied for assigning works to Injective, Bridging, or Internal Alignment. Without these, or coverage statistics showing how the surveyed papers partition, it remains unclear whether the external-to-internal shift accurately reflects the literature or functions mainly as a post-hoc organizing lens, which directly affects the defensibility of the practical guidance for paradigm selection.
minor comments (2)
- [Abstract] Abstract and §1: The scope of the literature search (keywords, time window, venues) is not stated, making it hard to assess completeness.
- [§5] §5 (Datasets): A summary table comparing domain, size, task types, and alignment paradigm coverage would improve readability and allow readers to quickly locate relevant resources.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our survey. We have reviewed the major comment carefully and provide a point-by-point response below, including planned revisions to strengthen the manuscript.
Point-by-point responses
-
Referee: [§3] §3 (Taxonomy of Alignment Paradigms): Explicit classification criteria, decision rules, or boundary examples are not supplied for assigning works to Injective, Bridging, or Internal Alignment. Without these, or coverage statistics showing how the surveyed papers partition, it remains unclear whether the external-to-internal shift accurately reflects the literature or functions mainly as a post-hoc organizing lens, which directly affects the defensibility of the practical guidance for paradigm selection.
Authors: We thank the referee for this important observation. Section 3 defines the paradigms according to the primary alignment mechanism: Injective Alignment directly projects time-series representations into the LLM input space (e.g., via linear or convolutional adapters without intermediate modules); Bridging Alignment introduces auxiliary components such as separate time-series encoders, retrieval modules, or adapters that mediate between modalities; and Internal Alignment modifies the LLM itself through continued pre-training, architectural changes, or parameter-efficient fine-tuning to internalize temporal reasoning. These distinctions are illustrated with representative works, but we agree that explicit decision rules and boundary cases are needed for reproducibility. In the revised manuscript we will add a new subsection (3.4) containing (i) a decision flowchart with concrete criteria (e.g., “if the method uses an external encoder whose output is concatenated to the LLM prompt, classify as Bridging; if the encoder is removed at inference and the LLM weights are updated on time-series objectives, classify as Internal”), (ii) three boundary examples per category with justification, and (iii) a coverage table reporting the number and percentage of surveyed papers assigned to each paradigm. These additions will make the taxonomy falsifiable and will directly support the practical guidance for paradigm selection.
revision: yes
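The decision rules and coverage table described in this response can be made concrete with a small sketch. The following is only an illustration under stated assumptions, not the authors' flowchart: the `Method` fields, the `classify` and `coverage` helpers, and the order in which the rules are checked are all hypothetical, chosen to mirror the two example rules quoted in the rebuttal. Methods that both update the LLM and keep a mediating module at inference are treated here as Bridging, which is exactly the kind of boundary case the promised subsection would need to settle.

```python
from collections import Counter
from dataclasses import dataclass

PARADIGMS = ("Injective", "Bridging", "Internal")


@dataclass(frozen=True)
class Method:
    """Hypothetical description of a surveyed method (field names are assumptions)."""
    name: str
    updates_llm_weights: bool       # LLM weights trained on time-series objectives
    uses_auxiliary_module: bool     # separate encoder, retriever, or mediating adapter
    module_kept_at_inference: bool  # auxiliary module still present at inference


def classify(m: Method) -> str:
    """Assign a paradigm using an assumed ordering of the quoted rules."""
    mediated = m.uses_auxiliary_module and m.module_kept_at_inference
    # Quoted rule: encoder removed at inference and LLM weights updated -> Internal.
    if m.updates_llm_weights and not mediated:
        return "Internal"
    # Quoted rule: an external module sits between the series and the LLM -> Bridging.
    if mediated:
        return "Bridging"
    # Otherwise: direct projection into the LLM input space -> Injective.
    return "Injective"


def coverage(methods):
    """Return {paradigm: (count, percentage)} for a collection of methods."""
    methods = list(methods)
    counts = Counter(classify(m) for m in methods)
    total = len(methods)
    return {
        p: (counts[p], 100.0 * counts[p] / total if total else 0.0)
        for p in PARADIGMS
    }


# Toy inputs only; these are not real papers from the survey.
toy_methods = [
    Method("toy-a", False, False, False),
    Method("toy-b", False, True, True),
    Method("toy-c", True, True, False),
]
table = coverage(toy_methods)
```

Writing the rules as an ordered sequence of checks makes the ambiguity visible: swapping the first two conditions would move some methods between Internal and Bridging, so any published flowchart would need to state its precedence explicitly.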
Circularity Check
Survey taxonomy organizes external literature without self-referential reduction
Full rationale
This is a survey paper that reviews and organizes existing TSA+LLM literature into three alignment paradigms (Injective, Bridging, Internal) based on a proposed external-to-internal shift. The central taxonomy is presented as an organizing framework derived from cited external works rather than any fitted parameters, equations, or self-definitional loops. No load-bearing self-citations, ansatzes smuggled via prior author work, or predictions that reduce to inputs by construction appear in the provided abstract or structure. The derivation chain is self-contained against external benchmarks and does not exhibit any of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: LLMs are pre-trained to optimize natural language relevance for question answering rather than objectives specialized for TSA
invented entities (4)
- Time Series Question Answering (TSQA): no independent evidence
- Injective Alignment: no independent evidence
- Bridging Alignment: no independent evidence
- Internal Alignment: no independent evidence
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/Cost/FunctionalEquation.lean; IndisputableMonolith/Foundation/AlexanderDuality.lean (reality_from_one_distinction; alexander_duality_circle_linking)
unclear: Relation between the paper passage and the cited Recognition theorem.
We propose three alignment paradigms: Injective Alignment, Bridging Alignment, and Internal Alignment, which are emphasized by prioritizing different aspects of time-series primitives: domain, characteristic, and representation
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.