arxiv: 2605.09395 · v1 · submitted 2026-05-10 · 💻 cs.AI · cs.LG· cs.MA· cs.MM

Recognition: no theorem link

Empowering VLMs for Few-Shot Multimodal Time Series Classification via Tailored Agentic Reasoning

Lin Li , Jiawei Huang , Qihao Quan , Dan Li , Boxin Li , Xiao Zhang , Erli Meng , Wenjie Feng

show 2 more authors

Jian Lou See-kiong Ng

Authors on Pith no claims yet

Pith reviewed 2026-05-12 04:36 UTC · model grok-4.3

classification 💻 cs.AI cs.LGcs.MAcs.MM

keywords few-shot learningtime series classificationvision-language modelsagentic reasoningknowledge bankreflective agentsmultimodal classificationtest-time adaptation

0 comments

The pith

A three-role agentic framework with a self-evolving knowledge bank raises VLM accuracy on few-shot multimodal time series classification while generating human-readable feature explanations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents MarsTSC as the first agentic reasoning system that enables vision-language models to handle few-shot multimodal time series classification. Three collaborating agents maintain and refine a dynamic knowledge bank: the Generator performs classification through reasoning, the Reflector diagnoses missed temporal patterns, and the Modifier safely integrates verified corrections to prevent collapse. A separate test-time update mechanism allows cautious ongoing refinement to counter limited examples and data shifts. A sympathetic reader would care because the approach turns scarce data into reliable, explainable decisions on standard benchmarks.

Core claim

MarsTSC introduces a VLM agentic reasoning framework for few-shot multimodal time series classification that maintains a self-evolving knowledge bank iteratively refined via reflective agents. The Generator conducts classification with reasoning, the Reflector identifies root causes of errors and overlooked temporal features, and the Modifier applies verified updates to avoid context collapse, supported by a test-time update strategy that mitigates few-shot bias and distribution shift.

What carries the argument

The MarsTSC three-role agentic system with a self-evolving knowledge bank, where the Generator, Reflector, and Modifier collaborate to iteratively refine context for classification and produce interpretable rationales.

If this is right

Delivers substantial and consistent performance gains across 6 VLM backbones on 12 mainstream time series benchmarks under few-shot conditions.
Outperforms both classical and foundation model-based time series baselines.
Produces interpretable rationales that ground each classification decision in human-readable feature evidence.
Uses test-time updates to mitigate few-shot bias and distribution shift through cautious knowledge bank refinement.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same Generator-Reflector-Modifier structure could be tested on other few-shot multimodal tasks by shifting focus from temporal to spatial or sequential patterns.
Stable knowledge bank updates might reduce reliance on large labeled sets in streaming or online classification settings.
The approach invites experiments that measure how well the rationales match expert-identified features on new datasets.

Load-bearing premise

The Reflector can reliably detect temporal features missed by the Generator, and the Modifier can incorporate updates without introducing new biases or causing the knowledge bank to collapse or overfit.

What would settle it

On the 12 time series benchmarks, removing the Reflector and Modifier roles and measuring whether accuracy gains over base VLMs disappear and whether the generated rationales no longer align with the actual discriminative temporal features.

Figures

Figures reproduced from arXiv: 2605.09395 by Boxin Li, Dan Li, Erli Meng, Jian Lou, Jiawei Huang, Lin Li, Qihao Quan, See-kiong Ng, Wenjie Feng, Xiao Zhang.

**Figure 1.** Figure 1: Overview of our MarsTSC framework. It comprises three key stages: Warm-up Stage, Training Stage and Testing Stage. sophisticated designs, including TimesNet[46], MultiRocket[41], Autoformer[47], PatchTST[33]. Recent progress in LLMs has fueled increasing interest in adopting large models for time series tasks. Early attempts includes adapting pretrained LLMs by fine-tuning on abundant datasets using text… view at source ↗

**Figure 2.** Figure 2: Illustration of knowledge bank design. Intra-class Feature Initialization. We construct intra-class feature descriptors to capture prototypical characteristics of samples for each class, covering both the numeric and imaged time series modalities and organizing their complementary features jointly. Specifically, we leverage the cross-modal alignment capabilities of VLMs to analyze the visual representatio… view at source ↗

**Figure 3.** Figure 3: Comparison with Increasing K-shot Baselines [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

**Figure 4.** Figure 4: A case about train refinement on success reasoning. [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

**Figure 5.** Figure 5: A case about train refinement on erroneous rea [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗

**Figure 6.** Figure 6: left: Comparison of mean accuracy and standard [PITH_FULL_IMAGE:figures/full_fig_p015_6.png] view at source ↗

**Figure 7.** Figure 7: Token count of the knowledge bank over the course [PITH_FULL_IMAGE:figures/full_fig_p015_7.png] view at source ↗

**Figure 8.** Figure 8: Classification accuracy with different base VLM models [PITH_FULL_IMAGE:figures/full_fig_p016_8.png] view at source ↗

**Figure 9.** Figure 9: Two inference paths that were successful in predict [PITH_FULL_IMAGE:figures/full_fig_p017_9.png] view at source ↗

**Figure 12.** Figure 12: A case about self-recheck in test time. Pass-2 rea [PITH_FULL_IMAGE:figures/full_fig_p018_12.png] view at source ↗

read the original abstract

In this paper, we propose the first VL$\underline{\textbf{M}}$ $\underline{\textbf{a}}$gentic $\underline{\textbf{r}}$easoning framework for few-$\underline{\textbf{s}}$hot multimodal $\underline{\textbf{T}}$ime $\underline{\textbf{S}}$eries $\underline{\textbf{C}}$lassification ($\textbf{MarsTSC}$), which introduces a self-evolving knowledge bank as a dynamic context iteratively refined via reflective agentic reasoning. The framework comprises three collaborative roles: i) Generator conducts reliable classification via reasoning; ii) Reflector diagnoses the root causes of reasoning errors to yield discriminative insights targeting the temporal features overlooked by Generator; iii) Modifier applies verified updates to the knowledge bank to prevent context collapse. We further introduce a test-time update strategy to enable cautious, continuous knowledge bank refinement to mitigate few-shot bias and distribution shift. Extensive experiments across 12 mainstream time series benchmarks demonstrate that $\textbf{MarsTSC}$ delivers substantial and consistent performance gains across 6 VLM backbones, outperforming both classical and foundation model-based time series baselines under few-shot conditions, while producing interpretable rationales that ground each classification decision in human-readable feature evidence.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

MarsTSC adds a three-agent structure with a self-evolving knowledge bank for VLM few-shot time series classification, but the safety of the Reflector-Modifier updates in low-data regimes stays unproven.

read the letter

The paper's core idea is a tailored agentic setup for few-shot multimodal time series classification. It uses a Generator for initial reasoning, a Reflector to spot missed temporal features, and a Modifier to apply updates to a dynamic knowledge bank, plus a test-time refinement step to handle distribution shift. This combination is positioned as new for VLMs in this domain, and the work does a reasonable job highlighting the value of producing human-readable rationales tied to actual features rather than black-box outputs.

Referee Report

3 major / 1 minor

Summary. The paper proposes MarsTSC, the first VLM agentic reasoning framework for few-shot multimodal time series classification. It introduces a self-evolving knowledge bank iteratively refined through three collaborative agents: a Generator that performs classification via reasoning, a Reflector that diagnoses reasoning errors to identify overlooked temporal features, and a Modifier that applies verified updates to prevent context collapse. A test-time update strategy is added to enable cautious refinement and mitigate few-shot bias and distribution shift. The central empirical claim is that this yields substantial and consistent gains across 12 time series benchmarks and 6 VLM backbones, outperforming classical and foundation-model baselines while producing human-readable interpretable rationales.

Significance. If the agentic refinement mechanism and reported gains prove robust, the work would offer a meaningful advance in applying VLMs to few-shot time series tasks by addressing context collapse and bias through dynamic, interpretable knowledge updates. The emphasis on human-readable feature evidence could support broader adoption in domains requiring explainability. However, the significance hinges on verification that the Reflector-Modifier loop reliably surfaces temporal features without introducing new biases or overfitting in low-data regimes, an aspect not yet demonstrated with sufficient rigor.

major comments (3)

[Abstract / Framework] Abstract and framework description: The claim that the Reflector 'diagnoses the root causes of reasoning errors to yield discriminative insights targeting the temporal features overlooked by Generator' and that the Modifier applies 'verified updates' to prevent context collapse is load-bearing for the central contribution, yet no concrete verification criteria, consistency checks, pseudocode, or safeguards against hallucination amplification or knowledge-bank drift are provided. This leaves the weakest assumption (safe iterative refinement without bias or collapse in few-shot settings) unaddressed.
[Experiments] Experimental claims: The assertion of 'substantial and consistent performance gains across 12 mainstream time series benchmarks' and '6 VLM backbones' is presented without any reported details on baseline definitions, number of runs, statistical tests, error bars, ablation studies on individual agents, or error analysis. This prevents verification that the improvements are supported by the data rather than artifacts of the few-shot setup or VLM prompting.
[Test-time update strategy] Test-time update strategy: The description of 'cautious, continuous knowledge bank refinement to mitigate few-shot bias and distribution shift' is central to handling the few-shot regime, but no implementation specifics, update rules, or empirical validation of its effect on preventing overfitting or drift across the 12 benchmarks are supplied.

minor comments (1)

[Abstract] The acronym expansion in the abstract uses underlines (VL M a gentic r easoning ... T ime S eries C lassification) that appear to be a formatting artifact rather than standard notation.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. The comments identify important areas where additional rigor and transparency will strengthen the manuscript. We address each major comment below and commit to revisions that provide the requested details without altering the core claims.

read point-by-point responses

Referee: [Abstract / Framework] Abstract and framework description: The claim that the Reflector 'diagnoses the root causes of reasoning errors to yield discriminative insights targeting the temporal features overlooked by Generator' and that the Modifier applies 'verified updates' to prevent context collapse is load-bearing for the central contribution, yet no concrete verification criteria, consistency checks, pseudocode, or safeguards against hallucination amplification or knowledge-bank drift are provided. This leaves the weakest assumption (safe iterative refinement without bias or collapse in few-shot settings) unaddressed.

Authors: We agree that the current description of the Reflector-Modifier loop is high-level and that explicit verification criteria and safeguards are needed to substantiate the claim of safe iterative refinement. In the revised manuscript we will add a dedicated algorithm box with full pseudocode for the three-agent cycle, define the Modifier's verification criteria (cross-check against a small held-out validation set plus temporal-feature consistency rules), and introduce explicit safeguards including confidence thresholding and drift detection to mitigate hallucination amplification and knowledge-bank drift. These additions will directly address the concern about unaddressed assumptions in few-shot regimes. revision: yes
Referee: [Experiments] Experimental claims: The assertion of 'substantial and consistent performance gains across 12 mainstream time series benchmarks' and '6 VLM backbones' is presented without any reported details on baseline definitions, number of runs, statistical tests, error bars, ablation studies on individual agents, or error analysis. This prevents verification that the improvements are supported by the data rather than artifacts of the few-shot setup or VLM prompting.

Authors: We acknowledge that the experimental section requires more granular reporting to allow independent verification. The revised manuscript will explicitly list all baseline implementations and hyperparameters, report results over multiple random seeds with error bars and standard deviations, include statistical significance tests (paired t-tests with p-values), provide ablation studies that isolate the contribution of each agent, and add a dedicated error-analysis subsection examining failure modes under few-shot conditions. These changes will demonstrate that the reported gains are robust rather than artifacts of the evaluation setup. revision: yes
Referee: [Test-time update strategy] Test-time update strategy: The description of 'cautious, continuous knowledge bank refinement to mitigate few-shot bias and distribution shift' is central to handling the few-shot regime, but no implementation specifics, update rules, or empirical validation of its effect on preventing overfitting or drift across the 12 benchmarks are supplied.

Authors: We recognize that the test-time update strategy needs concrete implementation details and validation. In the revision we will specify the exact update rules (including the confidence threshold, conditions for applying an update, and mechanisms to detect distribution shift), describe memory-management steps that prevent context collapse, and add empirical results showing the strategy's effect (ablations with/without test-time updates plus stability metrics for the knowledge bank across all 12 benchmarks). This will provide the requested validation that the approach mitigates bias and overfitting. revision: yes

Circularity Check

0 steps flagged

No significant circularity: empirical framework rests on experiments, not self-referential derivations

full rationale

The paper describes an agentic framework (MarsTSC) with three roles—Generator, Reflector, Modifier—and a self-evolving knowledge bank refined via test-time updates. No equations, parameters, or derivations are introduced that reduce by construction to fitted inputs, self-definitions, or self-citations. Performance claims are grounded in experiments across 12 benchmarks and 6 VLM backbones rather than any tautological reduction. The procedural description of reflective refinement and verified updates does not exhibit the patterns of self-definitional loops or fitted-input predictions; the work is self-contained as an empirical proposal.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 4 invented entities

The central claim depends on the effectiveness of newly introduced conceptual components whose performance is asserted via experiments whose details are not provided in the abstract.

axioms (1)

domain assumption Vision-language models can process time series data when suitably represented as images or text.
Required for applying VLMs to the time series classification task.

invented entities (4)

self-evolving knowledge bank no independent evidence
purpose: Dynamic shared context iteratively refined to prevent collapse and mitigate few-shot bias.
Core new component introduced by the framework.
Generator agent no independent evidence
purpose: Performs initial classification via reasoning.
One of the three collaborative roles defined in the framework.
Reflector agent no independent evidence
purpose: Diagnoses reasoning errors and extracts overlooked temporal features.
One of the three collaborative roles defined in the framework.
Modifier agent no independent evidence
purpose: Applies verified updates to the knowledge bank.
One of the three collaborative roles defined in the framework.

pith-pipeline@v0.9.0 · 5544 in / 1571 out tokens · 53660 ms · 2026-05-12T04:36:48.006667+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

71 extracted references · 71 canonical work pages · 5 internal anchors

[1]

Harika Abburi, Tanya Chaudhary, Haider Ilyas, Lakshmi Manne, Deepak Mit- tal, Don Williams, Derek Snaidauf, Edward Bowen, and Balaji Veeramani

work page
[2]

arXiv preprint arXiv:2309.17001 (2023)

A closer look at bearing fault classification approaches. arXiv preprint arXiv:2309.17001 (2023)

work page arXiv 2023
[3]

Yihao Ang, Yifan Bao, Lei Jiang, Jiajie Tao, Anthony KH Tung, Lukasz Szpruch, and Hao Ni. 2025. Structured Agentic Workflows for Financial Time-Series Modeling with LLMs and Reflective Feedback. arXiv preprint arXiv:2508.13915 (2025)

work page arXiv 2025
[4]

Abdul Fatir Ansari, Lorenzo Stella, Caner Turkmen, Xiyuan Zhang, Pedro Mercado, Huibin Shen, Oleksandr Shchur, Syama Sundar Rangapuram, Sebas- tian Pineda Arango, Shubham Kapoor, et al. 2024. Chronos: Learning the language of time series. arXiv preprint arXiv:2403.07815 (2024)

work page internal anchor Pith review arXiv 2024
[5]

Anthony Bagnall, Hoang Anh Dau, Jason Lines, Michael Flynn, James Large, Aaron Bostrom, Paul Southam, and Eamonn Keogh. 2018. The UEA multivariate time series classification archive, 2018. arXiv preprint arXiv:1811.00075 (2018)

work page arXiv 2018
[6]

Shelly Bensal, Umar Jamil, Christopher Bryant, Melisa Russak, Kiran Kam- ble, Dmytro Mozolevskyi, Muayad Ali, and Waseem AlShikh. 2025. Reflect, retry, reward: Self-improving llms via reinforcement learning. arXiv preprint arXiv:2505.24726 (2025)

work page arXiv 2025
[7]

Leo Breiman. 2001. Random forests. Machine learning 45, 1 (2001), 5–32

work page 2001
[8]

Sukru Selim Calik, Andac Akyuz, Zeynep Hilal Kilimci, and Kerem Colak. 2025. Explainable-AI powered stock price prediction using time series transformers: A Case Study on BIST100. arXiv preprint arXiv:2506.06345 (2025)

work page arXiv 2025
[9]

Ngai Hang Chan. 2004. Time series: applications to finance. John Wiley & Sons

work page 2004
[10]

Ching Chang, Wei-Yao Wang, Wen-Chih Peng, and Tien-Fu Chen. 2025. Llm4ts: Aligning pre-trained llms as data-efficient time-series forecasters. ACM Transactions on Intelligent Systems and Technology 16, 3 (2025), 1–20

work page 2025
[11]

Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, Sen Xing, Muyan Zhong, Qinglong Zhang, Xizhou Zhu, Lewei Lu, et al . 2024. Internvl: Scaling up vision foundation models and aligning for generic visual-linguistic tasks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 24185–24198

work page 2024
[12]

Corinna Cortes and Vladimir Vapnik. 1995. Support-vector networks. Machine learning 20, 3 (1995), 273–297

work page 1995
[13]

Thomas Cover and Peter Hart. 1967. Nearest neighbor pattern classification. IEEE transactions on information theory 13, 1 (1967), 21–27

work page 1967
[14]

Mayank Daswani, Mathias MJ Bellaiche, Marc Wilson, Desislav Ivanov, Mikhail Papkov, Eva Schnider, Jing Tang, Kay Lamerigts, Gabriela Botea, Michael A Sanchez, et al . 2024. Plots unlock time-series understanding in multimodal models. arXiv preprint arXiv:2410.02637 (2024)

work page arXiv 2024
[15]

Hoang Anh Dau, Anthony Bagnall, Kaveh Kamgar, Chin-Chia Michael Yeh, Yan Zhu, Shaghayegh Gharghabi, Chotirat Ann Ratanamahatana, and Eamonn Keogh

work page
[16]

IEEE/CAA Journal of Automatica Sinica 6, 6 (2019), 1293–1305

The UCR time series archive. IEEE/CAA Journal of Automatica Sinica 6, 6 (2019), 1293–1305

work page 2019
[17]

Shengdong Du, Tianrui Li, Yan Yang, and Shi-Jinn Horng. 2019. Deep air qual- ity forecasting using hybrid deep learning framework. IEEE Transactions on Knowledge and Data Engineering 33, 6 (2019), 2412–2424

work page 2019
[18]

Mojtaba A Farahani, MR McCormick, Ramy Harik, and Thorsten Wuest. 2025. Time-series classification in smart manufacturing systems: An experimen- tal evaluation of state-of-the-art machine learning algorithms. Robotics and Computer-Integrated Manufacturing 91 (2025), 102839

work page 2025
[19]

Google DeepMind. 2026. Gemini 3.1 Pro Model Card. https://deepmind.google/ models/model-cards/gemini-3-1-pro/

work page 2026
[20]

Mononito Goswami, Konrad Szafer, Arjun Choudhry, Yifu Cai, Shuo Li, and Artur Dubrawski. 2024. MOMENT: A Family of Open Time-series Foundation Models. In International Conference on Machine Learning. PMLR, 16115–16152

work page 2024
[21]

Xinyu Huang, Jun Tang, and Yongming Shen. 2024. Long time series of ocean wave prediction based on PatchTST model. Ocean Engineering 301 (2024), 117572

work page 2024
[22]

Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Ye Jin Bang, Andrea Madotto, and Pascale Fung. 2023. Survey of hallucination in natural language generation. ACM computing surveys 55, 12 (2023), 1–38

work page 2023
[23]

Yushan Jiang, Kanghui Ning, Zijie Pan, Xuyang Shen, Jingchao Ni, Wenchao Yu, Anderson Schneider, Haifeng Chen, Yuriy Nevmyvaka, and Dongjin Song. 2025. Multi-modal time series analysis: A tutorial and survey. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 2. 6043–6053

work page 2025
[24]

Yushan Jiang, Wenchao Yu, Geon Lee, Dongjin Song, Kijung Shin, Wei Cheng, Yanchi Liu, and Haifeng Chen. 2025. Timexl: Explainable multi-modal time series prediction with llm-in-the-loop. arXiv preprint arXiv:2503.01013 (2025)

work page arXiv 2025
[25]

Mohammad Ali Labbaf Khaniki, Alireza Golkarieh, Houman Nouri, and Moham- mad Manthouri. 2024. Enhanced fault detection and cause identification using integrated attention mechanism. arXiv preprint arXiv:2408.00033 (2024)

work page arXiv 2024
[26]

Peiwen Li, Xin Wang, Zeyang Zhang, Yuan Meng, Fang Shen, Yue Li, Jialong Wang, Yang Li, and Wenwu Zhu. 2024. Realtcd: Temporal causal discovery from interventional data with large language model. In Proceedings of the 33rd ACM international conference on information and knowledge management. 4669– 4677

work page 2024
[27]

Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. 2023. Visual in- struction tuning. Advances in neural information processing systems 36 (2023), 34892–34916

work page 2023
[28]

Xu Liu, Juncheng Liu, Gerald Woo, Taha Aksu, Yuxuan Liang, Roger Zimmer- mann, Chenghao Liu, Silvio Savarese, Caiming Xiong, and Doyen Sahoo. 2024. Moirai-moe: Empowering time series foundation models with sparse mixture of experts. arXiv preprint arXiv:2410.10469 (2024)

work page arXiv 2024
[29]

Matthew Middlehurst, James Large, Michael Flynn, Jason Lines, Aaron Bostrom, and Anthony Bagnall. 2021. HIVE-COTE 2.0: a new meta ensemble for time series classification. Machine Learning 110, 11 (2021), 3211–3243

work page 2021
[30]

Mukaffi Bin Moin, Fatema Tuj Johora Faria, Swarnajit Saha, Busra Kamal Rafa, and Mohammad Shafiul Alam. 2024. Exploring explainable ai techniques for improved interpretability in lung and colon cancer classification. InInternational Conference on Computing and Communication Networks. Springer, 1–11

work page 2024
[31]

Moonshot AI. 2026. Kimi K2.5 Quickstart. https://platform.kimi.ai/docs/guide/ kimi-k2-5-quickstart

work page 2026
[32]

Mohammad Amin Morid, Olivia R Liu Sheng, and Joseph Dunbar. 2023. Time series prediction using deep learning methods in healthcare. ACM Transactions on Management Information Systems 14, 1 (2023), 1–29

work page 2023
[33]

Ozan Baris Mulayim, Pengrui Quan, Liying Han, Xiaomin Ouyang, Dezhi Hong, Mario Bergés, and Mani Srivastava. 2025. Can Time-Series Foundation Models Perform Building Energy Management Tasks? arXiv preprint arXiv:2506.11250 (2025)

work page arXiv 2025
[34]

Jingchao Ni, Ziming Zhao, ChengAo Shen, Hanghang Tong, Dongjin Song, Wei Cheng, Dongsheng Luo, and Haifeng Chen. 2025. Harnessing vision models for time series analysis: A survey. arXiv preprint arXiv:2502.08869 (2025)

work page arXiv 2025
[35]

Yuqi Nie, Nam H Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam. 2022. A time series is worth 64 words: Long-term forecasting with transformers. arXiv preprint arXiv:2211.14730 (2022)

work page internal anchor Pith review arXiv 2022
[36]

OpenAI. 2026. GPT-5 Model. https://developers.openai.com/api/docs/models/ gpt-5

work page 2026
[37]

OpenAI. 2026. GPT-5.4 mini Model. https://developers.openai.com/api/docs/ models/gpt-5.4-mini

work page 2026
[38]

Jing-Cheng Pang, Pengyuan Wang, Kaiyuan Li, Xiong-Hui Chen, Jiacheng Xu, Zongzhang Zhang, and Yang Yu. 2023. Language model self-improvement by reinforcement learning contemplation. arXiv preprint arXiv:2305.14483 (2023)

work page arXiv 2023
[39]

Ross Quinlan

J. Ross Quinlan. 1986. Induction of decision trees. Machine learning 1, 1 (1986), 81–106

work page 1986
[40]

2026.Qwen3.5-397B-A17B

Qwen Team. 2026.Qwen3.5-397B-A17B. https://huggingface.co/Qwen/Qwen3.5- 397B-A17B

work page 2026
[41]

Lisa Schmors, Dominic Gonschorek, Jan Niklas Böhm, Yongrong Qiu, Na Zhou, Dmitry Kobak, Andreas Tolias, Fabian Sinz, Jacob Reimer, Katrin Franke, et al

work page
[42]

arXiv preprint arXiv:2506.04906 (2025)

TRACE: Contrastive learning for multi-trial time-series data in neuroscience. arXiv preprint arXiv:2506.04906 (2025)

work page arXiv 2025
[43]

ChengAo Shen, Wenchao Yu, Ziming Zhao, Dongjin Song, Wei Cheng, Haifeng Chen, and Jingchao Ni. 2025. Multi-modal view enhanced large vision models for long-term time series forecasting. arXiv preprint arXiv:2505.24003 (2025)

work page arXiv 2025
[44]

Chang Wei Tan, Angus Dempster, Christoph Bergmeir, and Geoffrey I Webb

work page
[45]

Data Mining and Knowledge Discovery 36, 5 (2022), 1623–1646

MultiRocket: multiple pooling operators and transformations for fast and effective time series classification: CW Tan. Data Mining and Knowledge Discovery 36, 5 (2022), 1623–1646

work page 2022
[46]

Mingtian Tan, Mike Merrill, Vinayak Gupta, Tim Althoff, and Tom Hartvigsen

work page
[47]

Are language models actually useful for time series forecasting? Advances in Neural Information Processing Systems 37 (2024), 60162–60191

work page 2024
[48]

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in neural information processing systems 30 (2017)

work page 2017
[49]

Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Daan Wierstra, et al . 2016. Matching networks for one shot learning. Advances in neural information processing systems 29 (2016)

work page 2016
[50]

Jiahao Wang, Mingyue Cheng, Qingyang Mao, Yitong Zhou, Daoyu Wang, Qi Liu, Feiyang Xu, and Xin Li. 2025. Tabletime: Reformulating time series classification as training-free table understanding with large language models. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management. 3009–3019

work page 2025
[51]

Haixu Wu, Tengge Hu, Yong Liu, Hang Zhou, Jianmin Wang, and Mingsheng Long. [n. d.]. TimesNet: Temporal 2D-Variation Modeling for General Time Series Anal- ysis. In The Eleventh International Conference on Learning Representations

work page
[52]

Haixu Wu, Jiehui Xu, Jianmin Wang, and Mingsheng Long. 2021. Autoformer: De- composition transformers with auto-correlation for long-term series forecasting. Advances in neural information processing systems 34 (2021), 22419–22430

work page 2021
[53]

Wujiang Xu, Zujie Liang, Kai Mei, Hang Gao, Juntao Tan, and Yongfeng Zhang

work page
[54]

A-MEM: Agentic Memory for LLM Agents

A-mem: Agentic memory for llm agents. arXiv preprint arXiv:2502.12110 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[55]

An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. 2025. Qwen3 technical 10 Empowering VLMs for Few-Shot Multimodal Time Series Classification via Tailored Agentic Reasoning Preprint, 2026, report. arXiv preprint arXiv:2505.09388 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[56]

Jinning Yang and Wen Shi. 2025. DiagECG: An LLM-Driven Framework for Diagnostic Reasoning via Discretized ECG Tokenization. arXiv preprint arXiv:2508.15338 (2025)

work page arXiv 2025
[57]

Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R Narasimhan, and Yuan Cao. 2022. React: Synergizing reasoning and acting in language models. In The eleventh international conference on learning representations

work page 2022
[58]

Wojciech Zaremba, Ilya Sutskever, and Oriol Vinyals. 2014. Recurrent neural network regularization. arXiv preprint arXiv:1409.2329 (2014)

work page arXiv 2014
[59]

Qizheng Zhang, Changran Hu, Shubhangi Upasani, Boyuan Ma, Fenglu Hong, Vamsidhar Kamanuru, Jay Rainton, Chen Wu, Mengmeng Ji, Hanchen Li, et al

work page
[60]

Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models

Agentic context engineering: Evolving contexts for self-improving language models. arXiv preprint arXiv:2510.04618 (2025)

work page internal anchor Pith review arXiv 2025
[61]

Haokun Zhao, Xiang Zhang, Jiaqi Wei, Yiwei Xu, Yuting He, Siqi Sun, and Chenyu You. 2025. Timeseriesscientist: A general-purpose ai agent for time series analysis. arXiv preprint arXiv:2510.01538 (2025)

work page arXiv 2025
[62]

Ziming Zhao, ChengAo Shen, Hanghang Tong, Dongjin Song, Zhigang Deng, Qingsong Wen, and Jingchao Ni. 2025. From Images to Signals: Are Large Vision Models Useful for Time Series Analysis? arXiv preprint arXiv:2505.24030 (2025)

work page arXiv 2025
[63]

Siru Zhong, Weilin Ruan, Ming Jin, Huan Li, Qingsong Wen, and Yuxuan Liang

work page
[64]

arXiv preprint arXiv:2502.04395 , year=

Time-vlm: Exploring multimodal vision-language models for augmented time series forecasting. arXiv preprint arXiv:2502.04395 (2025)

work page arXiv 2025
[65]

Shu Zhou, Yunyang Xuan, Yuxuan Ao, Xin Wang, Tao Fan, and Hao Wang. 2025. MERIT: Multi-Agent Collaboration for Unsupervised Time Series Representation Learning. In Findings of the Association for Computational Linguistics: ACL

work page 2025
[66]

Tian Zhou, Peisong Niu, Liang Sun, Rong Jin, et al . 2023. One fits all: Power general time series analysis by pretrained lm. Advances in neural information processing systems 36 (2023), 43322–43355

work page 2023
[67]

Jiaxin Zhuang, Leon Yan, Zhenwei Zhang, Ruiqi Wang, Jiawei Zhang, and Yuantao Gu. 2024. See it, think it, sorted: Large multimodal models are few-shot time series anomaly analyzers. arXiv preprint arXiv:2411.02465 (2024)

work page arXiv 2024
[68]

Yufan Zhuang, Chandan Singh, Liyuan Liu, Yelong Shen, Dinghuai Zhang, Jingbo Shang, Jianfeng Gao, and Weizhu Chen. 2026. Test-time Recursive Thinking: Self-Improvement without External Feedback. arXiv preprint arXiv:2602.03094 (2026). 11 Preprint, 2026, Li et al. A Appendix Overview This appendix is organized as follows. We first describe additional imple...

work page arXiv 2026
[69]

- section: the section to add the new bullet to

ADD: Create new bullet points with fresh IDs. - section: the section to add the new bullet to. - content: the new content of the bullet

work page
[70]

- target_id: the exact ID of the bullet to modify

MODIFY: Update an existing bullet point. - target_id: the exact ID of the bullet to modify. - content: the fully updated content of the bullet

work page
[71]

- target_id: index of the bullet point you want to remove

DELETE: Remove an existing bullet point. - target_id: index of the bullet point you want to remove. ### Query Sample: Below is the Query Image to be classified by the predictor.: { visualized time series } C Extended Experiments C.1 Analysis of Few-shot Train Samples To further evaluate the robustness ofMarsTSCunder few-shot sampling randomness and varyin...

work page arXiv 2026