Harmonizing Real-Time Constraints and Long-Horizon Reasoning: An Asynchronous Agentic Framework for Dynamic Scheduling

Jing Liu; Shijie Cao; Yuan Yuan

arxiv: 2605.29262 · v1 · pith:XNEE6K3Gnew · submitted 2026-05-28 · 💻 cs.AI

Harmonizing Real-Time Constraints and Long-Horizon Reasoning: An Asynchronous Agentic Framework for Dynamic Scheduling

Shijie Cao , Yuan Yuan , Jing Liu This is my paper

Pith reviewed 2026-06-29 07:36 UTC · model grok-4.3

classification 💻 cs.AI

keywords dynamic flexible job shop schedulingLLM agentsasynchronous frameworkreal-time constraintsrule evolutionscheduling heuristicsdual-stream architecture

0 comments

The pith

RACE-Sched separates real-time rule execution from LLM reasoning to meet millisecond constraints while evolving better scheduling policies.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an asynchronous framework for dynamic flexible job shop scheduling that runs two streams in parallel. A reactive stream applies fast symbolic heuristics to dispatch jobs instantly when disruptions occur. A deliberative stream uses an LLM to create candidate rules, tests them in a sandbox, and deploys only validated improvements through atomic updates that do not interrupt the live loop. A semantic repository stores successful rules for quick reuse on new problem instances. This structure targets the tension between needing immediate responses in industrial control and wanting long-term optimization that adapts to changing conditions.

Core claim

RACE-Sched is an asynchronous agent-based framework for the Dynamic Flexible Job Shop Scheduling Problem that decouples policy execution from logical reasoning via a dual-stream architecture. The Reactive Stream executes low-latency symbolic heuristics for real-time dispatching. The parallel Deliberative Stream leverages an LLM to synthesize, validate, and evolve these rules, with candidate rules undergoing rigorous testing in a sandbox before deployment via atomic updates. A semantic rule repository indexes validated heuristics for retrieval-based initialization to enhance transferability across problem scales.

What carries the argument

The dual-stream architecture, where the Reactive Stream handles immediate low-latency symbolic dispatching while the Deliberative Stream uses an LLM for sandbox-validated rule synthesis and atomic deployment, plus a semantic rule repository for cross-scale reuse.

If this is right

RACE-Sched achieves higher solution quality than leading deep reinforcement learning methods and other LLM-based approaches on GEN-Bench, MK-Bench, and JMS-Bench.
The framework maintains millisecond-level decision cycles while incorporating long-horizon reasoning.
The semantic rule repository enables improved transfer of validated heuristics to problem instances of different scales.
Atomic updates after sandbox validation allow rule evolution without blocking the real-time control loop.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar dual-stream separation could be applied to other latency-sensitive control tasks such as traffic signal management or robotic motion planning where reasoning must not delay physical actions.
The sandbox validation step could be extended with formal verification methods to further reduce the risk of unsafe rule deployment.
The approach might support incremental addition of new data sources into the deliberative stream without retraining an entire model.

Load-bearing premise

LLM-generated candidate rules can be tested and validated in a sandbox such that their deployment via atomic updates is guaranteed to preserve safety and improve performance in the live control loop.

What would settle it

A documented case in which an LLM-proposed rule passes all sandbox tests yet produces either a safety violation or lower overall performance once atomically inserted into the running reactive stream on any of the three benchmarks.

Figures

Figures reproduced from arXiv: 2605.29262 by Jing Liu, Shijie Cao, Yuan Yuan.

**Figure 1.** Figure 1: Overview of RACE-Sched. The Reactive Stream executes an active symbolic rule for real-time scheduling, while the Deliberative [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗

**Figure 2.** Figure 2: Rolling throughput after a machine failure. [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗

read the original abstract

The Dynamic Flexible Job Shop Scheduling Problem (DFJSP) necessitates a trade-off between instant reaction to stochastic disturbances and global optimization of production goals. Conventional priority rules are insufficiently flexible to handle complex disruptions, whereas learning-based approaches often compromise interpretability or fail to generalize across problem scales. Although Large Language Models (LLMs) offer advanced reasoning capabilities to bridge this gap, their substantial inference latency is incompatible with the millisecond-level decision cycles of industrial control systems. To resolve this conflict, we introduce RACE-Sched, an asynchronous agent-based framework that decouples policy execution from logical reasoning via a dual-stream architecture. The Reactive Stream executes low-latency symbolic heuristics to enable real-time dispatching, while the parallel Deliberative Stream leverages an LLM to synthesize, validate, and evolve these rules. Candidate rules undergo rigorous testing in a sandbox and are deployed via atomic updates, ensuring safety without blocking the control loop. Additionally, a semantic rule repository indexes validated heuristics for retrieval-based initialization which enhances transferability across problem scales. Extensive evaluations on GEN-Bench, MK-Bench, and JMS-Bench demonstrate that RACE-Sched outperforms leading Deep Reinforcement Learning and other LLM-based baselines. This approach harmonizes real-time constraints with long-horizon reasoning to achieve superior solution quality and robust adaptation to dynamic events.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

RACE-Sched's dual-stream split is a reasonable way to keep millisecond dispatching while letting an LLM evolve rules, but the abstract gives no data, protocol, or metrics on the sandbox step or the claimed benchmark wins.

read the letter

The paper's core idea is an asynchronous dual-stream agent: a reactive stream runs fast symbolic heuristics for real-time DFJSP dispatching, while a parallel deliberative stream uses an LLM to generate candidate rules, tests them in a sandbox, and pushes validated ones via atomic updates. A semantic repository helps reuse across scales. This specific combination of elements is new for the problem.

It does a clear job naming the practical tension between instant reaction to disturbances and longer-horizon optimization, and why neither standard priority rules nor current DRL or LLM approaches fully solve it.

The soft spots are the ones the stress-test note flags. The abstract asserts that sandbox testing guarantees safety and improvement before deployment, and that RACE-Sched beats DRL and other LLM baselines on GEN-Bench, MK-Bench, and JMS-Bench, but supplies no protocol details, rejection criteria, disturbance coverage, empirical rates, latency measurements, or any quantitative results. Without those, the safety guarantee and the outperformance claim cannot be checked. The assumption that LLM rules can be reliably validated this way is load-bearing and unshown here.

The work is aimed at people building hybrid AI-control systems for manufacturing or logistics. A reader already thinking about real-time constraints plus reasoning agents could extract the architecture sketch even if the results section is still needed.

It deserves peer review because the problem is real and the high-level design is coherent, though any review would have to focus on whether the sandbox and experiments actually deliver what the abstract promises.

Referee Report

2 major / 2 minor

Summary. The paper introduces RACE-Sched, an asynchronous agentic framework for the Dynamic Flexible Job Shop Scheduling Problem (DFJSP). It decouples real-time policy execution from long-horizon reasoning via a dual-stream architecture: a Reactive Stream runs low-latency symbolic heuristics for dispatching, while a parallel Deliberative Stream uses an LLM to synthesize candidate rules, validate them in a sandbox, and deploy validated rules via atomic updates. A semantic rule repository supports retrieval-based initialization for cross-scale transfer. The central claim is that this design outperforms leading DRL and LLM-based baselines on GEN-Bench, MK-Bench, and JMS-Bench while preserving millisecond-level real-time constraints.

Significance. If the sandbox validation protocol and empirical results hold, the work would offer a concrete mechanism for safely injecting LLM-derived heuristics into hard real-time control loops, addressing a recognized tension between reasoning depth and latency in industrial scheduling. The dual-stream separation and atomic-update mechanism are load-bearing innovations; the semantic repository could improve generalization. Credit is due for framing the problem around verifiable safety predicates rather than post-hoc explanation.

major comments (2)

[§4.2] §4.2 (Sandbox Validation Protocol): The description supplies no concrete safety predicates, coverage metrics over DFJSP disturbance distributions, rejection criteria, or empirical rejection rates for LLM-generated rules. Without these, the guarantee that sandbox-certified rules can be atomically inserted without violating real-time safety or degrading performance is not established, directly undermining the dual-stream separation claim.
[§5] §5 (Experimental Results): The outperformance statements on GEN-Bench, MK-Bench, and JMS-Bench are presented without reported statistical tests, variance across runs, or explicit latency-measurement methodology (wall-clock vs. simulated). These omissions make it impossible to assess whether the reactive stream truly meets millisecond constraints under the reported disturbance regimes.

minor comments (2)

[§3.3] Notation for the rule repository indexing and retrieval is introduced without a formal definition or pseudocode, making the transferability mechanism difficult to reproduce from the text alone.
[Figure 2] Figure 2 (architecture diagram) labels the sandbox as a black box; adding a flowchart of the validation loop with explicit inputs/outputs would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback highlighting areas where additional detail would strengthen the presentation of the sandbox protocol and experimental results. We address each major comment below and will revise the manuscript to incorporate the requested clarifications.

read point-by-point responses

Referee: [§4.2] §4.2 (Sandbox Validation Protocol): The description supplies no concrete safety predicates, coverage metrics over DFJSP disturbance distributions, rejection criteria, or empirical rejection rates for LLM-generated rules. Without these, the guarantee that sandbox-certified rules can be atomically inserted without violating real-time safety or degrading performance is not established, directly undermining the dual-stream separation claim.

Authors: We agree that the current description of the sandbox validation protocol in §4.2 is insufficiently detailed. In the revised manuscript we will expand this section to specify concrete safety predicates (deadline adherence, no negative slack, and machine capacity feasibility), coverage metrics that sample from the disturbance distributions used in GEN-Bench/MK-Bench/JMS-Bench, explicit rejection criteria (any safety violation or performance degradation exceeding a 5 % threshold relative to the reactive baseline), and the empirical rejection rates observed across our rule-synthesis runs. These additions will directly support the claim that atomically inserted rules preserve real-time safety. revision: yes
Referee: [§5] §5 (Experimental Results): The outperformance statements on GEN-Bench, MK-Bench, and JMS-Bench are presented without reported statistical tests, variance across runs, or explicit latency-measurement methodology (wall-clock vs. simulated). These omissions make it impossible to assess whether the reactive stream truly meets millisecond constraints under the reported disturbance regimes.

Authors: We acknowledge that the experimental reporting in §5 lacks the statistical rigor and latency methodology details needed for full assessment. The revised version will report mean and standard deviation of performance metrics over at least five independent random seeds, include paired statistical tests (t-tests or Wilcoxon signed-rank) against the DRL and LLM baselines, and provide an explicit latency protocol stating that all timings are wall-clock measurements on the target hardware under the same simulated disturbance regimes used for the benchmarks. This will confirm that the reactive stream satisfies the millisecond constraint. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical framework claims rest on benchmark evaluations

full rationale

The paper introduces RACE-Sched as a dual-stream asynchronous framework for DFJSP, with claims of outperformance supported solely by evaluations on GEN-Bench, MK-Bench, and JMS-Bench. No equations, parameter fits, self-definitional constructs, or load-bearing self-citations appear in the abstract or described structure. The deliberative stream's sandbox validation and atomic updates are presented as design choices rather than derivations that reduce to inputs by construction. The central performance assertions are therefore independent empirical statements, not tautological reductions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Review is based solely on the abstract; no explicit free parameters, axioms, or invented entities beyond the framework itself are stated. The central claim rests on unexamined assumptions about LLM sandbox validation and atomic deployment safety.

axioms (1)

domain assumption LLM-generated scheduling rules can be validated for safety and improvement inside a sandbox without introducing unacceptable latency or risk when deployed atomically.
This premise is required for the dual-stream architecture to function as described.

invented entities (1)

RACE-Sched dual-stream architecture no independent evidence
purpose: Decouple real-time policy execution from long-horizon LLM reasoning
New framework introduced in the abstract; no independent evidence outside the paper is mentioned.

pith-pipeline@v0.9.1-grok · 5764 in / 1319 out tokens · 24927 ms · 2026-06-29T07:36:06.144886+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

33 extracted references · 8 canonical work pages · 6 internal anchors

[1]

Routing and scheduling in a flexible job shop by tabu search.Ann

[Brandimarte, 1993] Paolo Brandimarte. Routing and scheduling in a flexible job shop by tabu search.Ann. Oper . Res., 41(3):157–183,

1993
[2]

Re- flecsched: Solving dynamic flexible job-shop schedul- ing via LLM-powered hierarchical reflection.CoRR, abs/2508.01724,

[Cao and Yuan, 2025] Shijie Cao and Yuan Yuan. Re- flecsched: Solving dynamic flexible job-shop schedul- ing via LLM-powered hierarchical reflection.CoRR, abs/2508.01724,

work page arXiv 2025
[3]

Inverse model and adaptive neighborhood search based cooperative optimizer for energy-efficient distributed flexible job shop scheduling.Swarm Evol

[Caoet al., 2023 ] Shijie Cao, Rui Li, Wenyin Gong, and Chao Lu. Inverse model and adaptive neighborhood search based cooperative optimizer for energy-efficient distributed flexible job shop scheduling.Swarm Evol. Comput., 83:101419,

2023
[4]

Deep reinforcement learning for dy- namic flexible job shop scheduling with random job ar- rival.Processes, 10(4):760,

[Changet al., 2022 ] Jingru Chang, Dong Yu, Yi Hu, Wuwei He, and Haoyu Yu. Deep reinforcement learning for dy- namic flexible job shop scheduling with random job ar- rival.Processes, 10(4):760,

2022
[5]

Fast-in-Slow: A dual-system VLA model uni- fying fast manipulation within slow reasoning

[Chenet al., 2026 ] Hao Chen, Jiaming Liu, Chenyang Gu, Zhuoyang Liu, Renrui Zhang, Xiaoqi Li, Xiao He, Yan- dong Guo, Chi-Wing Fu, Shanghang Zhang, and Pheng- Ann Heng. Fast-in-Slow: A dual-system VLA model uni- fying fast manipulation within slow reasoning. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems,

2026
[6]

DeepSeek-V3 Technical Report

[DeepSeek-AI, 2024] DeepSeek-AI. DeepSeek-V3 techni- cal report.CoRR, abs/2412.19437,

work page internal anchor Pith review Pith/arXiv arXiv 2024
[7]

DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

[DeepSeek-AI, 2025] DeepSeek-AI. DeepSeek-V3.2: Push- ing the frontier of open large language models.CoRR, abs/2512.02556,

work page internal anchor Pith review Pith/arXiv arXiv 2025
[8]

[Dinget al., 2025 ] Linshan Ding, Zailin Guan, Dan Luo, and Lei Yue. Data-driven hierarchical multi-policy deep rein- forcement learning framework for multi-objective multi- plicity dynamic flexible job shop scheduling.Journal of Manufacturing Systems, 80:536–562,

2025
[9]

Effective and interpretable dispatch- ing rules for dynamic job shops via guided empirical learn- ing.Omega, 111:102643,

[Ferreiraet al., 2022 ] Cristiane Ferreira, Gonc ¸alo Figueira, and Pedro Amorim. Effective and interpretable dispatch- ing rules for dynamic job shops via guided empirical learn- ing.Omega, 111:102643,

2022
[10]

A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence

[Gaoet al., 2025 ] Huan-ang Gao, Jiayi Geng, Wenyue Hua, Mengkang Hu, Xinzhe Juan, Hongzhang Liu, Shilong Liu, Jiahao Qiu, Xuan Qi, Yiran Wu, Hongru Wang, Han Xiao, Yuhang Zhou, Shaokun Zhang, Jiayi Zhang, Jinyu Xi- ang, Yixiong Fang, Qiwen Zhao, Dongrui Liu, Qihan Ren, Cheng Qian, Zhenhailong Wang, Minda Hu, Huazheng Wang, Qingyun Wu, Heng Ji, and Mengdi...

work page internal anchor Pith review Pith/arXiv arXiv 2025
[11]

Efficient jobshop dispatching rules: Further developments.Production Planning & Con- trol, 11(2):171–178,

[Holthaus and Rajendran, 2000] Oliver Holthaus and Chan- drasekharan Rajendran. Efficient jobshop dispatching rules: Further developments.Production Planning & Con- trol, 11(2):171–178,

2000
[12]

GPT-4o System Card

[Hurstet al., 2024 ] Aaron Hurst, Adam Lerer, Adam P. Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, et al. GPT-4o system card.CoRR, abs/2410.21276,

work page internal anchor Pith review Pith/arXiv arXiv 2024
[13]

Exploring the dy- namic scheduling space of real-time generative AI ap- plications on emerging heterogeneous systems.CoRR, abs/2507.14715,

[Karamiet al., 2025 ] Rachid Karami, Rajeev Patwari, Hy- oukjun Kwon, and Ashish Sirasao. Exploring the dy- namic scheduling space of real-time generative AI ap- plications on emerging heterogeneous systems.CoRR, abs/2507.14715,

work page arXiv 2025
[14]

Efficient memory management for large language model serving with PagedAttention

[Kwonet al., 2023 ] Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph Gonzalez, Hao Zhang, and Ion Stoica. Efficient memory management for large language model serving with PagedAttention. In Jason Flinn, Margo I. Seltzer, Peter Druschel, Antoine Kaufmann, and Jonathan Mace, editors,Proceedings of the 29th Symposium on...

2023
[15]

Multi-objective dynamic flexible job shop scheduling using multi-head network-based deep re- inforcement learning.Expert Systems with Applications, 298:129542,

[Liet al., 2026 ] Kai Li, Bao Zheng, Liping Xu, Fulong Xie, and Zhicheng Wang. Multi-objective dynamic flexible job shop scheduling using multi-head network-based deep re- inforcement learning.Expert Systems with Applications, 298:129542,

2026
[16]

Code as policies: Language model programs for embodied control

[Lianget al., 2023 ] Jacky Liang, Wenlong Huang, Fei Xia, Peng Xu, Karol Hausman, Brian Ichter, Pete Florence, and Andy Zeng. Code as policies: Language model programs for embodied control. InIEEE International Conference on Robotics and Automation, ICRA 2023, London, UK, May 29 - June 2, 2023, pages 9493–9500. IEEE,

2023
[17]

Is your code generated by ChatGPT really correct? rigorous evaluation of large lan- guage models for code generation

[Liuet al., 2023 ] Jiawei Liu, Chunqiu Steven Xia, Yuyao Wang, and Lingming Zhang. Is your code generated by ChatGPT really correct? rigorous evaluation of large lan- guage models for code generation. In Alice Oh, Tristan Naumann, Amir Globerson, Kate Saenko, Moritz Hardt, and Sergey Levine, editors,Advances in Neural Informa- tion Processing Systems 36: ...

2023
[18]

Feature selection in evolving job shop dispatching rules with genetic programming

[Meiet al., 2016 ] Yi Mei, Mengjie Zhang, and Su Nguyen. Feature selection in evolving job shop dispatching rules with genetic programming. In Tobias Friedrich, Frank Neumann, and Andrew M. Sutton, editors,Proceedings of the 2016 on Genetic and Evolutionary Computation Con- ference, Denver , CO, USA, July 20 - 24, 2016, pages 365–

2016
[19]

A survey of dynamic scheduling in manufac- turing systems.J

[Ouelhadj and Petrovic, 2009] Djamila Ouelhadj and Sanja Petrovic. A survey of dynamic scheduling in manufac- turing systems.J. of Scheduling, 12(4):417–431, August

2009
[20]

Dynamic scheduling of man- ufacturing systems using machine learning: An updated review.Ai Edam, 28(1):83–97,

[Prioreet al., 2014 ] Paolo Priore, Alberto G ´omez, Ra ´ul Pino, and Rafael Rosillo. Dynamic scheduling of man- ufacturing systems using machine learning: An updated review.Ai Edam, 28(1):83–97,

2014
[21]

Small language models: Architecture, evolution, and the future of artificial intelligence.Preprints, January

[Shahet al., 2026 ] Ankit Parag Shah, Mohammad-Parsa Hosseini, Su Min Park, Connie Miao, and Wei Wei. Small language models: Architecture, evolution, and the future of artificial intelligence.Preprints, January

2026
[22]

OpenAI GPT-5 System Card

[Singhet al., 2025 ] Aaditya Singh, Adam Fry, Adam Perel- man, Adam Tart, Adi Ganesh, Ahmed El-Kishky, Aidan McLaughlin, Aiden Low, AJ Ostrow, Akhila Ananthram, et al. OpenAI GPT-5 System Card.arXiv preprint arXiv:2601.03267,

work page internal anchor Pith review Pith/arXiv arXiv 2025
[23]

A review of production scheduling with artificial intelligence and digital twins

[Singhet al., 2026 ] Punit Singh, Krishna Krishnan, and Enkhsaikhan Boldsaikhan. A review of production scheduling with artificial intelligence and digital twins. Journal of Manufacturing and Materials Processing, 10(1):6,

2026
[24]

Chi, Quoc V

[Weiet al., 2022 ] Jason Wei, Xuezhi Wang, Dale Schuur- mans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V . Le, and Denny Zhou. Chain-of-thought prompt- ing elicits reasoning in large language models. In Sanmi Koyejo, S. Mohamed, A. Agarwal, Danielle Belgrave, K. Cho, and A. Oh, editors,Advances in Neural Informa- tion Processing Systems 35: A...

2022
[25]

Dynamic scheduling for flexible job shop under machine breakdown using improved double deep q-network.Expert Syst

[Wuet al., 2025 ] Rui Wu, Jianxin Zheng, Xixing Li, Hong- tao Tang, Xi Vincent Wang, and Yibing Li. Dynamic scheduling for flexible job shop under machine breakdown using improved double deep q-network.Expert Syst. Appl., 288:128280,

2025
[26]

Learn to optimise for job shop scheduling: a survey with comparison between genetic programming and reinforcement learning.Artif

[Xuet al., 2025 ] Meng Xu, Yi Mei, Fangfang Zhang, and Mengjie Zhang. Learn to optimise for job shop scheduling: a survey with comparison between genetic programming and reinforcement learning.Artif. Intell. Rev., 58(6):160,

2025
[27]

Large language models as optimizers

[Yanget al., 2024 ] Chengrun Yang, Xuezhi Wang, Yifeng Lu, Hanxiao Liu, Quoc V Le, Denny Zhou, and Xinyun Chen. Large language models as optimizers. InThe Twelfth International Conference on Learning Represen- tations,

2024
[28]

Qwen3 Technical Report

[Yanget al., 2025 ] An Yang, Anfeng Li, Baosong Yang, Be- ichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. Qwen3 technical report.arXiv preprint arXiv:2505.09388,

work page internal anchor Pith review Pith/arXiv arXiv 2025
[29]

Tree of thoughts: Deliberate problem solv- ing with large language models

[Yaoet al., 2023 ] Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Tom Griffiths, Yuan Cao, and Karthik Narasimhan. Tree of thoughts: Deliberate problem solv- ing with large language models. In Alice Oh, Tristan Nau- mann, Amir Globerson, Kate Saenko, Moritz Hardt, and Sergey Levine, editors,Advances in Neural Information Processing Systems 36: Annual C...

2023
[30]

Deep rein- forcement learning based proximal policy optimization al- gorithm for dynamic job shop scheduling.Comput

[Yuanet al., 2025 ] Minghai Yuan, Qi Yu, Lizhi Zhang, Songwei Lu, Zichen Li, and Fengque Pei. Deep rein- forcement learning based proximal policy optimization al- gorithm for dynamic job shop scheduling.Comput. Oper . Res., 183:107149,

2025
[31]

Learning to dispatch for job shop scheduling via deep reinforcement learning

[Zhanget al., 2020 ] Cong Zhang, Wen Song, Zhiguang Cao, Jie Zhang, Puay Siew Tan, and Chi Xu. Learning to dispatch for job shop scheduling via deep reinforcement learning. In Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin, edi- tors,Advances in Neural Information Processing Systems 33: Annual Conference on N...

2020
[32]

Multitask multiobjective genetic pro- gramming for automated scheduling heuristic learning in dynamic flexible job-shop scheduling.IEEE Trans

[Zhanget al., 2023 ] Fangfang Zhang, Yi Mei, Su Nguyen, and Mengjie Zhang. Multitask multiobjective genetic pro- gramming for automated scheduling heuristic learning in dynamic flexible job-shop scheduling.IEEE Trans. Cy- bern., 53(7):4473–4486,

2023
[33]

Meta- relation-based heterogeneous graph neural network with deep reinforcement learning for flexible job shop schedul- ing.Expert Systems with Applications, 291:128411, 2025

[Zhanget al., 2025 ] Yuzhi Zhang, Shidu Dong, Zhenfang Yuan, Ting Wen, Jianfeng Xiao, and Zhuo Diao. Meta- relation-based heterogeneous graph neural network with deep reinforcement learning for flexible job shop schedul- ing.Expert Systems with Applications, 291:128411, 2025

2025

[1] [1]

Routing and scheduling in a flexible job shop by tabu search.Ann

[Brandimarte, 1993] Paolo Brandimarte. Routing and scheduling in a flexible job shop by tabu search.Ann. Oper . Res., 41(3):157–183,

1993

[2] [2]

Re- flecsched: Solving dynamic flexible job-shop schedul- ing via LLM-powered hierarchical reflection.CoRR, abs/2508.01724,

[Cao and Yuan, 2025] Shijie Cao and Yuan Yuan. Re- flecsched: Solving dynamic flexible job-shop schedul- ing via LLM-powered hierarchical reflection.CoRR, abs/2508.01724,

work page arXiv 2025

[3] [3]

Inverse model and adaptive neighborhood search based cooperative optimizer for energy-efficient distributed flexible job shop scheduling.Swarm Evol

[Caoet al., 2023 ] Shijie Cao, Rui Li, Wenyin Gong, and Chao Lu. Inverse model and adaptive neighborhood search based cooperative optimizer for energy-efficient distributed flexible job shop scheduling.Swarm Evol. Comput., 83:101419,

2023

[4] [4]

Deep reinforcement learning for dy- namic flexible job shop scheduling with random job ar- rival.Processes, 10(4):760,

[Changet al., 2022 ] Jingru Chang, Dong Yu, Yi Hu, Wuwei He, and Haoyu Yu. Deep reinforcement learning for dy- namic flexible job shop scheduling with random job ar- rival.Processes, 10(4):760,

2022

[5] [5]

Fast-in-Slow: A dual-system VLA model uni- fying fast manipulation within slow reasoning

[Chenet al., 2026 ] Hao Chen, Jiaming Liu, Chenyang Gu, Zhuoyang Liu, Renrui Zhang, Xiaoqi Li, Xiao He, Yan- dong Guo, Chi-Wing Fu, Shanghang Zhang, and Pheng- Ann Heng. Fast-in-Slow: A dual-system VLA model uni- fying fast manipulation within slow reasoning. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems,

2026

[6] [6]

DeepSeek-V3 Technical Report

[DeepSeek-AI, 2024] DeepSeek-AI. DeepSeek-V3 techni- cal report.CoRR, abs/2412.19437,

work page internal anchor Pith review Pith/arXiv arXiv 2024

[7] [7]

DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

[DeepSeek-AI, 2025] DeepSeek-AI. DeepSeek-V3.2: Push- ing the frontier of open large language models.CoRR, abs/2512.02556,

work page internal anchor Pith review Pith/arXiv arXiv 2025

[8] [8]

[Dinget al., 2025 ] Linshan Ding, Zailin Guan, Dan Luo, and Lei Yue. Data-driven hierarchical multi-policy deep rein- forcement learning framework for multi-objective multi- plicity dynamic flexible job shop scheduling.Journal of Manufacturing Systems, 80:536–562,

2025

[9] [9]

Effective and interpretable dispatch- ing rules for dynamic job shops via guided empirical learn- ing.Omega, 111:102643,

[Ferreiraet al., 2022 ] Cristiane Ferreira, Gonc ¸alo Figueira, and Pedro Amorim. Effective and interpretable dispatch- ing rules for dynamic job shops via guided empirical learn- ing.Omega, 111:102643,

2022

[10] [10]

A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence

[Gaoet al., 2025 ] Huan-ang Gao, Jiayi Geng, Wenyue Hua, Mengkang Hu, Xinzhe Juan, Hongzhang Liu, Shilong Liu, Jiahao Qiu, Xuan Qi, Yiran Wu, Hongru Wang, Han Xiao, Yuhang Zhou, Shaokun Zhang, Jiayi Zhang, Jinyu Xi- ang, Yixiong Fang, Qiwen Zhao, Dongrui Liu, Qihan Ren, Cheng Qian, Zhenhailong Wang, Minda Hu, Huazheng Wang, Qingyun Wu, Heng Ji, and Mengdi...

work page internal anchor Pith review Pith/arXiv arXiv 2025

[11] [11]

Efficient jobshop dispatching rules: Further developments.Production Planning & Con- trol, 11(2):171–178,

[Holthaus and Rajendran, 2000] Oliver Holthaus and Chan- drasekharan Rajendran. Efficient jobshop dispatching rules: Further developments.Production Planning & Con- trol, 11(2):171–178,

2000

[12] [12]

GPT-4o System Card

[Hurstet al., 2024 ] Aaron Hurst, Adam Lerer, Adam P. Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, et al. GPT-4o system card.CoRR, abs/2410.21276,

work page internal anchor Pith review Pith/arXiv arXiv 2024

[13] [13]

Exploring the dy- namic scheduling space of real-time generative AI ap- plications on emerging heterogeneous systems.CoRR, abs/2507.14715,

[Karamiet al., 2025 ] Rachid Karami, Rajeev Patwari, Hy- oukjun Kwon, and Ashish Sirasao. Exploring the dy- namic scheduling space of real-time generative AI ap- plications on emerging heterogeneous systems.CoRR, abs/2507.14715,

work page arXiv 2025

[14] [14]

Efficient memory management for large language model serving with PagedAttention

[Kwonet al., 2023 ] Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph Gonzalez, Hao Zhang, and Ion Stoica. Efficient memory management for large language model serving with PagedAttention. In Jason Flinn, Margo I. Seltzer, Peter Druschel, Antoine Kaufmann, and Jonathan Mace, editors,Proceedings of the 29th Symposium on...

2023

[15] [15]

Multi-objective dynamic flexible job shop scheduling using multi-head network-based deep re- inforcement learning.Expert Systems with Applications, 298:129542,

[Liet al., 2026 ] Kai Li, Bao Zheng, Liping Xu, Fulong Xie, and Zhicheng Wang. Multi-objective dynamic flexible job shop scheduling using multi-head network-based deep re- inforcement learning.Expert Systems with Applications, 298:129542,

2026

[16] [16]

Code as policies: Language model programs for embodied control

[Lianget al., 2023 ] Jacky Liang, Wenlong Huang, Fei Xia, Peng Xu, Karol Hausman, Brian Ichter, Pete Florence, and Andy Zeng. Code as policies: Language model programs for embodied control. InIEEE International Conference on Robotics and Automation, ICRA 2023, London, UK, May 29 - June 2, 2023, pages 9493–9500. IEEE,

2023

[17] [17]

Is your code generated by ChatGPT really correct? rigorous evaluation of large lan- guage models for code generation

[Liuet al., 2023 ] Jiawei Liu, Chunqiu Steven Xia, Yuyao Wang, and Lingming Zhang. Is your code generated by ChatGPT really correct? rigorous evaluation of large lan- guage models for code generation. In Alice Oh, Tristan Naumann, Amir Globerson, Kate Saenko, Moritz Hardt, and Sergey Levine, editors,Advances in Neural Informa- tion Processing Systems 36: ...

2023

[18] [18]

Feature selection in evolving job shop dispatching rules with genetic programming

[Meiet al., 2016 ] Yi Mei, Mengjie Zhang, and Su Nguyen. Feature selection in evolving job shop dispatching rules with genetic programming. In Tobias Friedrich, Frank Neumann, and Andrew M. Sutton, editors,Proceedings of the 2016 on Genetic and Evolutionary Computation Con- ference, Denver , CO, USA, July 20 - 24, 2016, pages 365–

2016

[19] [19]

A survey of dynamic scheduling in manufac- turing systems.J

[Ouelhadj and Petrovic, 2009] Djamila Ouelhadj and Sanja Petrovic. A survey of dynamic scheduling in manufac- turing systems.J. of Scheduling, 12(4):417–431, August

2009

[20] [20]

Dynamic scheduling of man- ufacturing systems using machine learning: An updated review.Ai Edam, 28(1):83–97,

[Prioreet al., 2014 ] Paolo Priore, Alberto G ´omez, Ra ´ul Pino, and Rafael Rosillo. Dynamic scheduling of man- ufacturing systems using machine learning: An updated review.Ai Edam, 28(1):83–97,

2014

[21] [21]

Small language models: Architecture, evolution, and the future of artificial intelligence.Preprints, January

[Shahet al., 2026 ] Ankit Parag Shah, Mohammad-Parsa Hosseini, Su Min Park, Connie Miao, and Wei Wei. Small language models: Architecture, evolution, and the future of artificial intelligence.Preprints, January

2026

[22] [22]

OpenAI GPT-5 System Card

[Singhet al., 2025 ] Aaditya Singh, Adam Fry, Adam Perel- man, Adam Tart, Adi Ganesh, Ahmed El-Kishky, Aidan McLaughlin, Aiden Low, AJ Ostrow, Akhila Ananthram, et al. OpenAI GPT-5 System Card.arXiv preprint arXiv:2601.03267,

work page internal anchor Pith review Pith/arXiv arXiv 2025

[23] [23]

A review of production scheduling with artificial intelligence and digital twins

[Singhet al., 2026 ] Punit Singh, Krishna Krishnan, and Enkhsaikhan Boldsaikhan. A review of production scheduling with artificial intelligence and digital twins. Journal of Manufacturing and Materials Processing, 10(1):6,

2026

[24] [24]

Chi, Quoc V

[Weiet al., 2022 ] Jason Wei, Xuezhi Wang, Dale Schuur- mans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V . Le, and Denny Zhou. Chain-of-thought prompt- ing elicits reasoning in large language models. In Sanmi Koyejo, S. Mohamed, A. Agarwal, Danielle Belgrave, K. Cho, and A. Oh, editors,Advances in Neural Informa- tion Processing Systems 35: A...

2022

[25] [25]

Dynamic scheduling for flexible job shop under machine breakdown using improved double deep q-network.Expert Syst

[Wuet al., 2025 ] Rui Wu, Jianxin Zheng, Xixing Li, Hong- tao Tang, Xi Vincent Wang, and Yibing Li. Dynamic scheduling for flexible job shop under machine breakdown using improved double deep q-network.Expert Syst. Appl., 288:128280,

2025

[26] [26]

Learn to optimise for job shop scheduling: a survey with comparison between genetic programming and reinforcement learning.Artif

[Xuet al., 2025 ] Meng Xu, Yi Mei, Fangfang Zhang, and Mengjie Zhang. Learn to optimise for job shop scheduling: a survey with comparison between genetic programming and reinforcement learning.Artif. Intell. Rev., 58(6):160,

2025

[27] [27]

Large language models as optimizers

[Yanget al., 2024 ] Chengrun Yang, Xuezhi Wang, Yifeng Lu, Hanxiao Liu, Quoc V Le, Denny Zhou, and Xinyun Chen. Large language models as optimizers. InThe Twelfth International Conference on Learning Represen- tations,

2024

[28] [28]

Qwen3 Technical Report

[Yanget al., 2025 ] An Yang, Anfeng Li, Baosong Yang, Be- ichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. Qwen3 technical report.arXiv preprint arXiv:2505.09388,

work page internal anchor Pith review Pith/arXiv arXiv 2025

[29] [29]

Tree of thoughts: Deliberate problem solv- ing with large language models

[Yaoet al., 2023 ] Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Tom Griffiths, Yuan Cao, and Karthik Narasimhan. Tree of thoughts: Deliberate problem solv- ing with large language models. In Alice Oh, Tristan Nau- mann, Amir Globerson, Kate Saenko, Moritz Hardt, and Sergey Levine, editors,Advances in Neural Information Processing Systems 36: Annual C...

2023

[30] [30]

Deep rein- forcement learning based proximal policy optimization al- gorithm for dynamic job shop scheduling.Comput

[Yuanet al., 2025 ] Minghai Yuan, Qi Yu, Lizhi Zhang, Songwei Lu, Zichen Li, and Fengque Pei. Deep rein- forcement learning based proximal policy optimization al- gorithm for dynamic job shop scheduling.Comput. Oper . Res., 183:107149,

2025

[31] [31]

Learning to dispatch for job shop scheduling via deep reinforcement learning

[Zhanget al., 2020 ] Cong Zhang, Wen Song, Zhiguang Cao, Jie Zhang, Puay Siew Tan, and Chi Xu. Learning to dispatch for job shop scheduling via deep reinforcement learning. In Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin, edi- tors,Advances in Neural Information Processing Systems 33: Annual Conference on N...

2020

[32] [32]

Multitask multiobjective genetic pro- gramming for automated scheduling heuristic learning in dynamic flexible job-shop scheduling.IEEE Trans

[Zhanget al., 2023 ] Fangfang Zhang, Yi Mei, Su Nguyen, and Mengjie Zhang. Multitask multiobjective genetic pro- gramming for automated scheduling heuristic learning in dynamic flexible job-shop scheduling.IEEE Trans. Cy- bern., 53(7):4473–4486,

2023

[33] [33]

Meta- relation-based heterogeneous graph neural network with deep reinforcement learning for flexible job shop schedul- ing.Expert Systems with Applications, 291:128411, 2025

[Zhanget al., 2025 ] Yuzhi Zhang, Shidu Dong, Zhenfang Yuan, Ting Wen, Jianfeng Xiao, and Zhuo Diao. Meta- relation-based heterogeneous graph neural network with deep reinforcement learning for flexible job shop schedul- ing.Expert Systems with Applications, 291:128411, 2025

2025