TS-Reasoner: Domain-Oriented Time Series Inference Agents for Reasoning and Automated Analysis
Pith reviewed 2026-05-23 19:43 UTC · model grok-4.3
The pith
TS-Reasoner integrates LLM reasoning with time-series tools and feedback loops to outperform general models on multi-step inference tasks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
TS-Reasoner is a domain-specialized agent that integrates LLM reasoning with domain-specific computational tools and an error feedback loop, enabling domain-informed, constraint-aware analytical workflows that combine symbolic reasoning with precise numerical analysis. Experiments on TimeSeriesExam and a new multi-step inference dataset show that this approach outperforms standalone general-purpose LLMs in both fundamental time series concept understanding and complex inference tasks.
What carries the argument
TS-Reasoner agent that fuses LLM reasoning with domain-specific computational tools and an error feedback loop.
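The tool-plus-feedback design described above can be pictured with a small sketch. This is an illustration under stated assumptions, not the paper's implementation: the names `run_with_feedback`, `TOOLS`, `moving_average`, and `stub_planner` are hypothetical, and a stub function stands in for the LLM that would propose each plan.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical domain tool: a plain function standing in for a time-series operator.
def moving_average(series: List[float], window: int) -> List[float]:
    if window <= 0 or window > len(series):
        raise ValueError(f"window must be in 1..{len(series)}, got {window}")
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]

# Hypothetical tool registry the planner may call by name.
TOOLS: Dict[str, Callable] = {"moving_average": moving_average}

Plan = Tuple[str, dict]  # (tool name, keyword arguments)

def run_with_feedback(propose: Callable[[List[str]], Plan],
                      series: List[float],
                      max_attempts: int = 3):
    """Request a plan, execute it, and return error text to the planner on failure."""
    feedback: List[str] = []
    for _ in range(max_attempts):
        tool_name, kwargs = propose(feedback)
        try:
            return TOOLS[tool_name](series, **kwargs), feedback
        except (KeyError, ValueError, TypeError) as exc:
            # The error message becomes context for the next proposal.
            feedback.append(f"{type(exc).__name__}: {exc}")
    raise RuntimeError(f"no valid plan after {max_attempts} attempts: {feedback}")

def stub_planner(feedback: List[str]) -> Plan:
    # Stand-in for the LLM: the first plan is invalid, the second corrects it.
    return ("moving_average", {"window": 10 if not feedback else 2})

result, log = run_with_feedback(stub_planner, [1.0, 2.0, 3.0, 4.0])
print(result)
print(log)
```

The point of the sketch is the control flow: numerical work is delegated to deterministic tools, and a failed call produces a message that conditions the next attempt instead of ending the run.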
If this is right
- The agent achieves higher accuracy on basic time series concept questions than general LLMs.
- It completes multi-step inference tasks that require both compositional logic and exact numerical computation more reliably.
- The resulting workflows stay within domain constraints while mixing symbolic steps and numerical evaluation.
- The design supports automated real-world time series reasoning without manual intervention at each step.
Where Pith is reading between the lines
- Similar tool-plus-feedback structures could be applied to other data modalities that mix language and precise calculation.
- The approach suggests a template for agents in scientific domains where general models currently fall short on numerical fidelity.
- Performance gains may depend on how well the feedback loop identifies and corrects specific classes of numerical or logical errors.
Load-bearing premise
Combining language-model reasoning with domain tools and feedback loops produces genuinely better constraint-aware workflows than general models alone.
What would settle it
An experiment in which TS-Reasoner achieves equal or lower accuracy than a general LLM on the same multi-step time series tasks and datasets.
Original abstract
Time series analysis is crucial in real-world applications, yet traditional methods focus on isolated tasks only, and recent studies on time series reasoning remain limited to either single-step inference or are constrained to natural language answers. In this work, we introduce TS-Reasoner, a domain-specialized agent designed for multi-step time series inference. By integrating large language model (LLM) reasoning with domain-specific computational tools and an error feedback loop, TS-Reasoner enables domain-informed, constraint-aware analytical workflows that combine symbolic reasoning with precise numerical analysis. We assess the system's capabilities along two axes: (1) fundamental time series understanding assessed by TimeSeriesExam and (2) complex, multi-step inference evaluated by a newly proposed dataset designed to test both compositional reasoning and computational precision in time series analysis. Experiments show that our approach outperforms standalone general-purpose LLMs in both basic time series concept understanding as well as the multi-step time series inference task, highlighting the promise of domain-specialized agents for automating real-world time series reasoning and analysis.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces TS-Reasoner, a domain-specialized agent that integrates LLM reasoning with domain-specific computational tools and an error feedback loop to enable multi-step time series inference. It evaluates performance on TimeSeriesExam for basic concept understanding and on a newly introduced dataset for compositional multi-step reasoning, claiming consistent outperformance over standalone general-purpose LLMs.
Significance. If the performance gains can be attributed to the agent architecture rather than tool access alone, the work would provide evidence that hybrid LLM-tool systems with feedback can improve automated analysis on tasks requiring both symbolic and numerical precision, with potential implications for domain-specific agent design in scientific ML.
major comments (3)
- [Experiments] Experiments section (and abstract): the baselines are described only as 'standalone general-purpose LLMs' with no indication that they receive access to the same domain-specific computational tools used by TS-Reasoner. Because the central claim attributes gains to the combination of LLM reasoning, tools, and error feedback loop, the absence of tool-augmented LLM baselines (or component ablations) means the results do not isolate whether the reported improvements require the agent scaffolding.
- [§4] §4 (evaluation on new dataset): the manuscript provides insufficient detail on dataset construction, task distribution, and metrics for compositional reasoning versus computational precision, making it difficult to verify that the new benchmark genuinely stresses multi-step inference beyond what single-step tool use would achieve.
- [Results tables] Table 1 / results tables: no statistical significance tests, confidence intervals, or variance across runs are reported for the claimed outperformance, which is load-bearing for the assertion that the approach 'outperforms' on both axes.
minor comments (2)
- [§3] Notation for the error feedback loop and tool interfaces is introduced without a clear diagram or pseudocode, reducing reproducibility.
- [Abstract] The abstract states outperformance but the full experimental design details (prompt templates, tool definitions, number of trials) appear only later; moving a concise summary of controls to the abstract would improve clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We agree that additional baselines, expanded dataset details, and statistical reporting are needed to strengthen the claims, and we will revise the manuscript accordingly.
Point-by-point responses
- Referee: [Experiments] Experiments section (and abstract): the baselines are described only as 'standalone general-purpose LLMs' with no indication that they receive access to the same domain-specific computational tools used by TS-Reasoner. Because the central claim attributes gains to the combination of LLM reasoning, tools, and error feedback loop, the absence of tool-augmented LLM baselines (or component ablations) means the results do not isolate whether the reported improvements require the agent scaffolding.
  Authors: We agree that the current evaluation does not fully isolate the contribution of the agent scaffolding from tool access alone. In the revision we will add tool-augmented LLM baselines (general-purpose LLMs given the same computational tools but without the multi-step agent loop or error feedback) as well as component ablations. These new results will be reported in the experiments section and referenced in the abstract.
  revision: yes
- Referee: [§4] §4 (evaluation on new dataset): the manuscript provides insufficient detail on dataset construction, task distribution, and metrics for compositional reasoning versus computational precision, making it difficult to verify that the new benchmark genuinely stresses multi-step inference beyond what single-step tool use would achieve.
  Authors: We will substantially expand §4 to include the dataset construction methodology, the breakdown of task types (compositional reasoning vs. computational precision), and the precise metrics used for each axis. This will clarify how the benchmark evaluates multi-step inference beyond single-step tool calls.
  revision: yes
- Referee: [Results tables] Table 1 / results tables: no statistical significance tests, confidence intervals, or variance across runs are reported for the claimed outperformance, which is load-bearing for the assertion that the approach 'outperforms' on both axes.
  Authors: We acknowledge the omission. We will rerun the key experiments across multiple random seeds, compute confidence intervals and standard deviations, and add statistical significance tests (e.g., paired t-tests) to all reported performance differences in the revised tables.
  revision: yes
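The statistical reporting promised in the last response (standard deviations, confidence intervals, and paired t-tests across seeds) can be sketched with the Python standard library. The per-seed scores below are made-up placeholders, not results from the paper. The helper names are hypothetical, and 2.776 is the two-sided 95% critical value of Student's t distribution with 4 degrees of freedom, which matches the five seeds assumed here.

```python
import math
import statistics

def paired_summary(scores_a, scores_b):
    """Mean difference, sample std of differences, paired t statistic, degrees of freedom."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length samples with at least 2 runs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    sd = statistics.stdev(diffs)           # sample standard deviation (n - 1)
    se = sd / math.sqrt(len(diffs))        # standard error of the mean difference
    t_stat = mean_diff / se if se > 0 else float("nan")
    return mean_diff, sd, t_stat, len(diffs) - 1

def confidence_interval(mean_diff, sd, n, t_crit):
    """Two-sided interval for the mean paired difference."""
    half_width = t_crit * sd / math.sqrt(n)
    return mean_diff - half_width, mean_diff + half_width

# Placeholder accuracies for five seeds (illustrative only, not from the paper).
agent_scores = [0.80, 0.82, 0.78, 0.84, 0.81]
baseline_scores = [0.70, 0.75, 0.72, 0.74, 0.69]

mean_diff, sd, t_stat, df = paired_summary(agent_scores, baseline_scores)
low, high = confidence_interval(mean_diff, sd, len(agent_scores), t_crit=2.776)
print(f"mean diff={mean_diff:.3f}, sd={sd:.3f}, t={t_stat:.2f}, df={df}")
print(f"95% CI for the difference: [{low:.3f}, {high:.3f}]")
```

Pairing by seed matters because both systems are run on the same tasks. The test is therefore applied to per-seed differences, not to two independent samples.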
Circularity Check
No circularity: empirical claims rest on independent benchmarks
full rationale
The paper introduces an agent architecture (LLM + domain tools + error feedback) and evaluates it on TimeSeriesExam plus a new compositional dataset. No mathematical derivation chain, fitted parameters renamed as predictions, or self-citation load-bearing steps appear in the provided abstract or description. The central claim is experimental outperformance over standalone LLMs; the benchmarks are described as external and independent, with no indication that results reduce to the inputs by construction. This is a standard system paper whose validity hinges on experimental controls rather than definitional equivalence.
Axiom & Free-Parameter Ledger
axioms (1)
- [domain assumption] Domain-specific tools can be effectively integrated with LLMs for precise numerical analysis in time series tasks
Forward citations
Cited by 4 Pith papers
- TimeClaw: A Time-Series AI Agent with Exploratory Execution Learning
  TimeClaw is an exploratory execution learning system that turns multiple valid tool-use paths into hierarchical distilled experience for improved time-series reasoning without test-time adaptation.
- TimeSeriesExamAgent: Creating Time Series Reasoning Benchmarks at Scale
  TimeSeriesExamAgent combines templates and LLM agents to generate scalable time series reasoning benchmarks, demonstrating that current LLMs have limited performance on both abstract and domain-specific tasks.
- TS-Agent: Understanding and Reasoning Over Raw Time Series via Iterative Insight Gathering
  TS-Agent is an agentic framework that uses LLMs only for evidence-based reasoning while delegating extraction to raw time series tools, matching or exceeding baselines on four benchmarks with largest gains on reasoning tasks.
- TimeMM: Time-as-Operator Spectral Filtering for Dynamic Multimodal Recommendation
  TimeMM proposes a time-as-operator spectral filtering framework with adaptive mixing and modality routing to model non-stationary multimodal user preferences in recommendation systems.
A Dataset Compilation
Since complex time series reasoning remains largely underexplored, we construct a compl...
2. I require that the system load is maintained above a minimum of {load value} MW.
3. I must monitor the load ramp rate to ensure it does not exceed {constraint value} MW for each time step.
4. I need to manage the load variability so that it does not exceed {constraint value} MW over the given period.]
Think about how {influence variables} influence {targe...
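The three constraint templates that survive in the excerpt above (minimum load, per-step ramp rate, variability over the period) lend themselves to a mechanical check. The sketch below is a minimal illustration, assuming that "ramp rate" means the absolute change between consecutive time steps and that "variability" means the max-minus-min range over the period. The excerpt does not confirm either definition. The function name `check_load_constraints` and the sample values are hypothetical.

```python
from typing import Dict, List

def check_load_constraints(load: List[float],
                           min_load: float,
                           max_ramp: float,
                           max_variability: float) -> Dict[str, bool]:
    """Return a pass/fail flag per constraint for a load series in MW."""
    if len(load) < 2:
        raise ValueError("need at least two time steps to compute a ramp rate")
    # Assumed definition: ramp = absolute change between consecutive steps.
    ramps = [abs(later - earlier) for earlier, later in zip(load, load[1:])]
    return {
        "min_load": min(load) >= min_load,
        "ramp_rate": max(ramps) <= max_ramp,
        # Assumed definition: variability = range (max - min) over the period.
        "variability": (max(load) - min(load)) <= max_variability,
    }

forecast = [100.0, 104.0, 110.0, 107.0]  # illustrative values only
report = check_load_constraints(forecast, min_load=95.0,
                                max_ramp=5.0, max_variability=12.0)
violations = [name for name, ok in report.items() if not ok]
print(report)
print(violations)
```

A report of this kind is the sort of signal a feedback loop could pass back to the planner when a candidate forecast breaks a constraint.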
- data correlation: the multi variable should be correlated, sample: which A first influence B, then B have influence on C or D, there should be some time delay, as the influence on other staff needs time
- data trend: there should be some trend in the data, like the data is increasing or decreasing
- data seasonality: there should be some seasonality in the data, like the data is periodic
- data noise: the noise should be added to the data, as the real world data is not perfect
- data background: the data should have some real world background, you should first think about different real world data, and provide a description for the variable and time series data, then generate the data using the code.
CoT Sample:
Q: Approximate Relation Ratio: 0.5
Relation Matrix:
  A B C D
A 1 1 0 1
B 0 1 0 1
C 0 1 1 1
D 0 0 0 1
• A influences B an...
1. Advertising (A): The level of advertising spend directly impacts the sales of each store. After a delay, this starts influencing sales.
2. Sales (B): The sales numbers for each store are influenced by both the advertising and local seasonal events.
3. Economic Factors (C): Broader economic trends, like GDP growth or unemployment rates, also impact sales. ...
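The data requirements listed above (delayed influence between variables, trend, seasonality, and noise) can be combined in a short generator. This is a sketch under stated assumptions and does not reproduce the paper's generation code. It reads the relation matrix as "row variable influences column variable", uses a single fixed lag, and the function name `generate` plus all coefficients are hypothetical choices.

```python
import math
import random

NAMES = ["A", "B", "C", "D"]
# Assumed reading: RELATION[i][j] == 1 means variable i influences variable j.
RELATION = [
    [1, 1, 0, 1],
    [0, 1, 0, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
]

def generate(n_steps=200, lag=3, period=24, coupling=0.5, noise_sd=0.1, seed=0):
    """Build correlated series with trend, seasonality, noise, and lagged influence."""
    if lag < 1:
        raise ValueError("lag must be at least 1 so that causes precede effects")
    rng = random.Random(seed)
    k = len(NAMES)
    # Per-variable base signal: linear trend + sinusoidal seasonality + Gaussian noise.
    base = [
        [0.01 * (i + 1) * t
         + math.sin(2 * math.pi * t / period)
         + rng.gauss(0.0, noise_sd)
         for t in range(n_steps)]
        for i in range(k)
    ]
    data = [[0.0] * n_steps for _ in range(k)]
    for t in range(n_steps):
        for dst in range(k):
            value = base[dst][t]
            if t >= lag:
                # Add the delayed contribution of every variable that influences dst.
                for src in range(k):
                    if src != dst and RELATION[src][dst]:
                        value += coupling * data[src][t - lag]
            data[dst][t] = value
    return dict(zip(NAMES, data))

series = generate()
print({name: round(values[-1], 3) for name, values in series.items()})
```

Because each value depends only on earlier time steps of its parents, influence propagates along chains such as A to B to D with an accumulated delay, in line with the requirement that influence takes time.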