Preference-Agile Multi-Objective Optimization for Real-time Vehicle Dispatching

Jiahuan Jin; Jianfeng Ren; Qingfu Zhang; Rong Qu; Ruibin Bai; Wenhao Zhao; Xinan Chen

arXiv: 2604.10664 · v1 · submitted 2026-04-12 · 💻 cs.AI

Pith reviewed 2026-05-10 15:23 UTC · model grok-4.3

classification 💻 cs.AI

keywords multi-objective optimization · deep reinforcement learning · dynamic preferences · vehicle dispatching · real-time decision making · container terminal · preference alignment

The pith

A DRL framework takes live preference vectors as inputs and uses a fitted calibration function to keep its policies aligned with them in dynamic multi-objective vehicle dispatching.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents PAMOO to handle real-time multi-objective decisions in which operators can change the priorities among conflicting goals without restarting the solver. It builds a single deep reinforcement learning model that ingests explicit preference vectors at every step and adds a fitted calibration function to keep the generated policies close to high-quality solutions for the chosen weights. Existing methods either treat the problem as deterministic, with objectives fixed in advance, or handle only non-sequential dynamic cases, so they cannot support the sequential, high-frequency adjustments required in live operations such as container-terminal truck dispatching. If the alignment holds, the approach delivers better performance and generalization than two widely used multi-objective optimizers on the same terminal instances.

Core claim

PAMOO is a uniform DRL model that takes dynamic preference vectors as direct inputs and uses a calibration function to ensure the output policies remain aligned with those preferences, yielding superior results on sequential dynamic MOO problems in real-life vehicle dispatching at a container terminal.

What carries the argument

A uniform deep reinforcement learning model that receives dynamic preference vectors as explicit inputs together with a fitted calibration function that maps those vectors to high-quality output policies.
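
The text reproduced here does not give the calibration function's form, so the following is a hedged illustration only. It assumes a two-objective problem and one simple kind of calibration: sample the trained policy at a grid of input weights, record the trade-off share each input actually produces, and invert that response curve by monotone linear interpolation, so that a requested preference is translated into the input that should realize it. `fit_calibration`, `calibrate` and the sample values are hypothetical, and interpolation is swapped in for whatever fitting procedure the authors actually use.

```python
import numpy as np


def fit_calibration(input_w1, achieved_w1):
    """Fit a monotone map from a requested weight on objective 1 to the input weight
    that should produce it, by inverting sampled (input, achieved) pairs.

    input_w1:    weights on objective 1 that were fed to the trained policy (hypothetical)
    achieved_w1: share of objective 1 in the trade-off the policy actually achieved
    """
    x = np.asarray(input_w1, dtype=float)
    y = np.asarray(achieved_w1, dtype=float)
    order = np.argsort(x)
    x, y = x[order], np.maximum.accumulate(y[order])  # force a non-decreasing response curve

    def calibrate(requested_w1):
        # Swap the roles of x and y to invert the response curve by linear interpolation.
        w1 = float(np.interp(requested_w1, y, x))
        return (w1, 1.0 - w1)  # calibrated two-objective preference vector

    return calibrate


# Hypothetical samples: the policy leans toward objective 1 more than its input weight asks for.
calibrate = fit_calibration([0.0, 0.2, 0.4, 0.6, 0.8, 1.0],
                            [0.0, 0.35, 0.55, 0.70, 0.85, 1.0])
print(calibrate(0.5))  # input preference to feed the policy for a balanced trade-off
```

With these made-up samples, a requested 50/50 trade-off maps to an input preference of roughly (0.35, 0.65). Forcing the response curve to be non-decreasing keeps the inverse well defined when noisy samples would otherwise make it fold back on itself.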

If this is right

Operators can adjust objective weights interactively during operation without retraining or switching models.
The same trained policy network serves multiple preference settings, reducing the need for separate solvers per weight combination.
The method extends to other sequential real-time dispatching tasks that involve shifting priorities among cost, time, and resource objectives.
It provides the first explicit handling of dynamic sequential MOO decisions rather than only static or non-sequential cases.
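
To make the first two points concrete, here is a toy sketch in the spirit of the three-QC scenario of Figure 1. It is not PAMOO: a greedy linear-scalarization rule stands in for the trained DRL policy, and the per-candidate estimates (estimated aggregate QC idle time in seconds if the truck goes to that QC, and empty travel distance in metres) are hypothetical numbers. `CANDIDATES`, `normalize` and `dispatch` are illustrative names. What the sketch shows is only the interface: the preference vector is an explicit argument of one decision function, so changing it between decisions changes the choice without retraining anything.

```python
# Toy stand-in for a preference-conditioned dispatcher (NOT the PAMOO network).
# Hypothetical estimates per candidate QC:
#   (estimated aggregate QC idle time in s if the truck goes there, empty travel distance in m)
CANDIDATES = {
    "QC1": (20.0, 900.0),   # shortest queue, farthest away
    "QC2": (60.0, 500.0),
    "QC3": (120.0, 150.0),  # longest queue, nearest
}


def normalize(cands):
    """Scale each objective by its maximum over the candidates so both lie in [0, 1]."""
    max_idle = max(idle for idle, _ in cands.values())
    max_dist = max(dist for _, dist in cands.values())
    return {name: (idle / max_idle, dist / max_dist) for name, (idle, dist) in cands.items()}


def dispatch(cands, pref):
    """Return the QC minimizing the preference-weighted sum of normalized objectives."""
    w_idle, w_dist = pref
    norm = normalize(cands)
    return min(norm, key=lambda name: w_idle * norm[name][0] + w_dist * norm[name][1])


# The same function serves every preference; only the input vector changes between decisions.
for pref in [(0.8, 0.2), (0.5, 0.5), (0.2, 0.8)]:
    print(pref, dispatch(CANDIDATES, pref))
```

With these numbers the preferences (0.8, 0.2), (0.5, 0.5) and (0.2, 0.8) select QC1, QC2 and QC3 respectively. A linear weighted sum can only reach solutions on the convex hull of the Pareto front, which is one reason a learned, calibrated policy may be preferred over a fixed scalarization rule.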

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The calibration step could be tested in other DRL domains where user-specified trade-offs must be honored without retraining, such as traffic signal control or energy scheduling.
If the alignment remains stable under rapid preference shifts, the framework reduces the engineering cost of maintaining multiple single-objective agents.
The approach invites direct comparison against evolutionary or gradient-based dynamic MOO solvers on the same sequential benchmark to isolate the benefit of the DRL backbone.

Load-bearing premise

A fitted calibration function can reliably map arbitrary dynamic preference vectors to stable, high-quality DRL policies across sequential decision steps in real time.

What would settle it

An experiment in which live changes to the preference vector produce dispatching policies whose performance on the container-terminal benchmark falls below that of fixed-preference baselines or degrades measurably over successive steps.
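
One way to operationalize this test, sketched under assumptions the paper does not state: log a scalarized cost (lower is better) at each decision step for a run with live preference changes and for a fixed-preference baseline evaluated under the preference active at that step. Then flag the method if the live run is worse on average, or if its gap to the baseline trends upward over successive steps. `settles_against` and its tolerances are hypothetical; the trend is an ordinary least-squares slope from `numpy.polyfit`.

```python
import numpy as np


def settles_against(live_costs, fixed_costs, mean_tol=0.0, slope_tol=1e-6):
    """Hypothetical falsification check for live preference changes.

    live_costs[t]:  scalarized cost (lower is better) at step t with live preference changes.
    fixed_costs[t]: cost of a fixed-preference baseline under the preference active at step t.
    Returns True if the live run is worse on average, or if its gap to the baseline
    grows over successive steps (positive least-squares slope).
    """
    gap = np.asarray(live_costs, dtype=float) - np.asarray(fixed_costs, dtype=float)
    slope = np.polyfit(np.arange(gap.size), gap, 1)[0]  # linear trend of the gap over steps
    return bool(gap.mean() > mean_tol or slope > slope_tol)


# Hypothetical traces: live switching starts ahead of the baseline but drifts behind it.
print(settles_against([0.8, 0.9, 1.0, 1.1], [1.0, 1.0, 1.0, 1.0]))
```

Here the live run is better on average, yet the check still prints True because the gap grows by about 0.1 per step, which is exactly the cumulative degradation the test is meant to catch.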

Figures

Figures reproduced from arXiv: 2604.10664 by Jiahuan Jin, Jianfeng Ren, Qingfu Zhang, Rong Qu, Ruibin Bai, Wenhao Zhao, Xinan Chen.

**Figure 1.** A simple scenario illustrating the proposed PAMOO algorithm for online truck dispatching in a container terminal. At time $t$, one idle truck must be dispatched to a new task (tasks are dedicated to different quay cranes, QCs). The three choices, QC1, QC2 and QC3, have increasing queue lengths and decreasing empty travel distances (indicated by the three red lines on the left side of the figure). Dispatching decisions are …

**Figure 2.** A route example for a single task. An idle truck receives the dispatching task at yard A. The first and second operation nodes are QC1 and a crane at yard B, respectively. The red truck route represents empty mileage and the blue route represents loaded travel distance. The objectives are to minimize both the aggregated idle time of all QCs and the total empty mileage of all trucks.

**Figure 3.** An illustrative example in which the truck carrying task $w_i^q$ (a) arrives too late at the $q$-th QC, causing a QC idle period, and (b) arrives at the QC before the prior task's completion, resulting in truck queuing.

**Figure 4.** An illustration of learning-based interactive adjustment of user preferences in PAMOO. It employs multi-policy inner-loop RL with policies $\pi^*(a \mid s, \theta)$. Once trained, PAMOO makes decisions for any combination of state $s$ and preference vector $\boldsymbol{p}$ in a single run.

**Figure 5.** The network structure of the proposed PAMOO for online truck dispatching.

**Figure 6.** Interpretation of the preference calibration method.

**Figure 8.** Approximate Pareto fronts obtained by our method and the benchmarks on instances with different numbers of trucks.

**Figure 9.** Pareto frontiers generated by the proposed method when trained with a small number of preferences.

**Figure 10.** Pareto frontiers generated by PAMOO and the outer-loop method on an instance with 120 trucks. Axes: QC idle time vs. empty travel distance (both normalized to [0, 1]).

**Figure 11.** Sample efficiency of the inner-loop (ours) and outer-loop methods compared with NSGA-II and MOEA/D. Axes: hypervolume vs. total samples collected.
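
Figures 8 to 11 compare approximate Pareto fronts over two minimized objectives, QC idle time and empty travel distance, and the sample-efficiency comparison is reported in hypervolume. For readers unfamiliar with the metric, here is a minimal 2-D sketch of the standard computation: keep the non-dominated points, then sum the rectangles they dominate up to a reference point. The normalization and reference point used in the paper are not given here; `ref=(1.0, 1.0)` on objectives scaled to [0, 1], and the sample points, are assumptions.

```python
def pareto_front(points):
    """Non-dominated subset of 2-objective points, both objectives minimized."""
    front = []
    for x, y in sorted(set(points)):       # ascending in objective 1, ties broken by objective 2
        if not front or y < front[-1][1]:  # keep only points that improve objective 2
            front.append((x, y))
    return front


def hypervolume_2d(points, ref):
    """Area dominated by the front and bounded above by the reference point `ref`."""
    inside = [p for p in points if p[0] < ref[0] and p[1] < ref[1]]
    hv, prev_y = 0.0, ref[1]
    for x, y in pareto_front(inside):      # sweep left to right; y strictly decreases
        hv += (ref[0] - x) * (prev_y - y)  # horizontal slab between prev_y and y
        prev_y = y
    return hv


# Hypothetical normalized (QC idle time, empty travel distance) values; (0.6, 0.9) is dominated.
points = [(0.2, 0.8), (0.5, 0.4), (0.8, 0.1), (0.6, 0.9)]
print(pareto_front(points))
print(hypervolume_2d(points, ref=(1.0, 1.0)))
```

Here the front is the first three points and the hypervolume is 0.16 + 0.20 + 0.06 = 0.42, up to floating-point rounding. A larger value means the front lies closer to the ideal corner and covers more of the trade-off range.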

Original abstract

Multi-objective optimization (MOO) has been widely studied in the literature because of its versatility in human-centered decision making in real-life applications. Recently, demand for dynamic MOO has been emerging fast due to tough market dynamics that require real-time re-adjustment of the priorities of different objectives. However, most existing studies focus either on deterministic MOO problems, which are not practical, or on non-sequential dynamic MOO decision problems that cannot deal with some real-life complexities. To address these challenges, this paper proposes preference-agile multi-objective optimization (PAMOO), which permits users to dynamically adjust and interactively assign preferences on the fly. To achieve this, a novel uniform model within a deep reinforcement learning (DRL) framework is proposed that can take users' dynamic preference vectors explicitly as inputs. Additionally, a calibration function is fitted to ensure high-quality alignment between the preference vector inputs and the output DRL decision policy. Extensive experiments on challenging real-life vehicle dispatching problems at a container terminal showed that PAMOO obtains superior performance and generalization ability when compared with the two most popular MOO methods. Our method presents the first dynamic MOO method for challenging dynamic sequential MOO decision problems

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper introduces a DRL setup for sequential vehicle dispatching that takes explicit dynamic preference vectors as inputs and adds a calibration step, but the evidence for reliable performance under arbitrary preference changes is thin.

This paper introduces a deep reinforcement learning model for multi-objective vehicle dispatching that accepts changing user preferences as direct inputs during operation, along with a fitted calibration function meant to keep the policy aligned with them. That combination for real-time sequential decisions is the main new element. Most prior MOO work is either static or non-sequential, so the framing correctly identifies a gap in logistics settings where priorities must shift on the fly, such as at a container terminal. Vehicle dispatching is a reasonable test case and keeps the work grounded in a concrete application rather than abstract benchmarks. The claim of better performance and generalization than two standard MOO methods at least indicates that the experiments were run on real instances rather than toy problems. The paper does a fair job of explaining why interactive priority changes matter in practice and why a uniform DRL policy could support them.

The weak points lie mainly in the calibration function and the missing detail on results. The abstract says the function is fitted for high-quality alignment, yet gives no functional form, training objective, or test for drift when preferences change rapidly across sequential steps. In a dispatching MDP, even a small misalignment at one time step shifts the state for the next, so without a stability argument or sensitivity check the reported gains may depend on the specific preference sequences tested. No quantitative metrics, statistical tests, or ablation results appear in the provided text, which makes the superiority claim impossible to evaluate directly. On the current evidence, the concern that misalignment could accumulate over successive decisions therefore stands.

This work is aimed at applied researchers who build optimization tools for transportation or logistics. A reader already working on DRL for dynamic MOO would pick up the preference-vector idea and the terminal dispatching example, but would still need the full methods section and tables to judge whether the approach holds. I would send it for peer review. The practical angle is clear enough to merit referee time, though the calibration robustness and experimental reporting will need tightening before the claims can be taken at face value.

Referee Report

2 major / 1 minor

Summary. The paper proposes Preference-Agile Multi-Objective Optimization (PAMOO), a DRL-based framework for dynamic multi-objective optimization in real-time vehicle dispatching at container terminals. It introduces a uniform model that accepts dynamic user preference vectors as explicit inputs and fits a calibration function to align these vectors with high-quality output policies. The central claim is that extensive experiments on challenging real-life container-terminal dispatching problems demonstrate superior performance and generalization ability relative to two popular MOO methods.

Significance. If the performance claims hold with rigorous evidence, the work could contribute a practical method for handling changing priorities in sequential decision problems common to logistics and operations research. The explicit incorporation of dynamic preferences into a DRL policy is a relevant direction for human-centered real-time systems.

major comments (2)

[Abstract] The assertion of 'superior performance and generalization ability' is unsupported by any quantitative metrics, statistical tests, baseline specifications, or ablation results. This directly undermines verification of the central empirical claim.
[Methods] Calibration function description (Methods section): no functional form, training objective, or analysis of stability under rapid preference changes is supplied. In a sequential MDP, even small misalignment at one dispatching step alters the subsequent state distribution, so the absence of guarantees against compounding error or bias is load-bearing for the superiority claim over standard MOO baselines.

minor comments (1)

[Abstract] The final sentence appears truncated ('Our method presents the first dynamic MOO method for challenging dynamic sequential MOO decision problems').

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which help improve the clarity and rigor of our work. We address each major comment point by point below and outline the revisions we will make to the manuscript.

Referee: [Abstract] The assertion of 'superior performance and generalization ability' is unsupported by any quantitative metrics, statistical tests, baseline specifications, or ablation results. This directly undermines verification of the central empirical claim.

Authors: We agree that the abstract, as a concise summary, does not embed the specific quantitative metrics, statistical tests, or ablation details that appear in the full Experiments section. The manuscript does report performance tables with percentage improvements over the two standard MOO baselines, generalization results across terminal scenarios, and statistical significance via paired t-tests. To directly address the concern, we will revise the abstract to include key quantitative highlights (e.g., average improvement percentages and p-values) and a brief statement of the baselines used, while preserving its length constraints. revision: yes
Referee: [Methods] Calibration function description (Methods section): no functional form, training objective, or analysis of stability under rapid preference changes is supplied. In a sequential MDP, even small misalignment at one dispatching step alters the subsequent state distribution, so the absence of guarantees against compounding error or bias is load-bearing for the superiority claim over standard MOO baselines.

Authors: We acknowledge that the current Methods description of the calibration function is high-level and omits the explicit functional form, training objective, and stability analysis under rapid preference shifts. The function is realized as a small neural network trained to align input preference vectors with high-quality policies obtained from offline optimization; we will add its precise mathematical definition, the regression-style training loss, and a dedicated stability subsection. This subsection will include both a short analysis of error propagation in the sequential MDP and new empirical results measuring policy degradation under fast preference changes, thereby strengthening the comparison to standard MOO methods. revision: yes

Circularity Check

0 steps flagged

No load-bearing circularity; calibration function is auxiliary alignment step

The paper introduces a DRL framework that explicitly accepts dynamic preference vectors as inputs and fits a calibration function to align them with output policies. Superior performance and generalization are asserted via experiments on container-terminal dispatching instances against standard MOO baselines. No equations or derivations are presented that reduce the reported performance metrics to the calibration fit by construction, nor is the calibration invoked as a uniqueness theorem or self-cited load-bearing premise. The function is described as an auxiliary fitting step rather than a definitional loop that forces the outcome. The derivation chain therefore remains self-contained against external benchmarks and does not exhibit the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper rests on standard DRL convergence assumptions and on the unproven effectiveness of the calibration function for preference alignment; no explicit free parameters or new entities are named in the abstract.

pith-pipeline@v0.9.0 · 5528 in / 1024 out tokens · 20745 ms · 2026-05-10T15:23:58.569140+00:00 · methodology


[1] [1]

Multi-objective fitted q- iteration: Pareto frontier approximation in one single run, in: 2011 International Conference on Networking, Sensing and Control, IEEE. pp. 260–265. Chen, J., Bai, R., Dong, H., Qu, R., Kendall, G.,

work page 2011

[2] [2]

A dynamic truck dispatching problem in marine container terminal, in: 2016 IEEE Symposium Series on Computational Intelligence (SSCI), IEEE. pp. 1–

work page 2016

[3] [3]

A data-driven geneticprogrammingheuristicforreal-worlddynamicseaportcontainer terminal truck dispatching, in: 2020 IEEE Congress on Evolutionary Computation (CEC), IEEE. pp. 1–8. Chen, X., Bai, R., Qu, R., Dong, J., Jin, Y.,

work page 2020

[4] [4]

Meta- learning for multi-objective reinforcement learning, in: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE. pp. 977–983. Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.,

work page 2019

[5] [5]

Dynamicmultiobjectiveoptimization problems: test cases, approximations, and applications

Farina,M.,Deb,K.,Amato,P.,2004. Dynamicmultiobjectiveoptimization problems: test cases, approximations, and applications. IEEE Transac- tions on Evolutionary Computation 8, 425–442. Hayes, C.F., Rădulescu, R., Bargiacchi, E., Källström, J., Macfarlane, M., Reymond, M., Verstraeten, T., Zintgraf, L.M., Dazeley, R., Heintz, F., etal.,2022. Apracticalguideto...

work page 2004

[6] [6]

Multi-objective optimization of dispatching strategies for situation-adaptive AGV operation in an au- tomated container terminal, in: Proceedings of the 2013 Research in Adaptive and Convergent Systems, pp. 1–6. Li, K., Zhang, T., Wang, R.,

work page 2013

[7] [7]

Transportation Research Jin et al.:Preprint submitted to ElsevierPage 20 of 21 Preference-Agile Multi-Objective Optimization Part B: Methodological 93, 720–749

Bi-objective optimization for the container terminal integrated planning. Transportation Research Jin et al.:Preprint submitted to ElsevierPage 20 of 21 Preference-Agile Multi-Objective Optimization Part B: Methodological 93, 720–749. Maashi,M.,Özcan,E.,Kendall,G.,2014.Amulti-objectivehyper-heuristic based on choice function. Expert Systems with Applicati...

work page 2014

[8] [8]

Engineering Design and Decision-Making Models. Ph.D. thesis. University of Debrecen. Parisi,S.,Pirotta,M.,Smacchia,N.,Bascetta,L.,Restelli,M.,2014. Policy gradient approaches for multi-objective sequential decision making, in: 2014 International Joint Conference on Neural Networks (IJCNN), IEEE. pp. 2323–2330. Prayogo, D.N., Komarudin, A.H., Mubarak, A.,

work page 2014

[9] [9]

Neuro- computing 263, 15–25

A temporal difference method for multi-objective reinforcement learning. Neuro- computing 263, 15–25. Sarkar,P.,Khanapuri,V.B.,Tiwari,M.K.,2025. Integratingmachinelearn- ing with dynamic multi-objective optimization for real-time decision- making. Information Sciences 690, 121524. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.,

work page 2025

[10] [10]

Proximal Policy Optimization Algorithms

Proximal policy optimization algorithms. arXiv:1707.06347 . Skinner,B.,Yuan,S.,Huang,S.,Liu,D.,Cai,B.,Dissanayake,G.,Lau,H., Bott,A.,Pagac,D.,2013. Optimisationforjobschedulingatautomated container terminals using genetic algorithm. Computers & Industrial Engineering 64, 511–523. Tu, B., Kantas, N., Lee, R.M., Shafei, B.,

work page internal anchor Pith review Pith/arXiv arXiv 2013

[11] [11]

Adeepreinforcement learninghyper-heuristicwithfeaturefusionforonlinepackingproblems

Tu,C.,Bai,R.,Aickelin,U.,Zhang,Y.,Du,H.,2023. Adeepreinforcement learninghyper-heuristicwithfeaturefusionforonlinepackingproblems. Expert Systems with Applications 230, 120568. Vaswani,A.,Shazeer,N.,Parmar,N.,Uszkoreit,J.,Jones,L.,Gomez,A.N., Kaiser, Ł., Polosukhin, I.,

work page 2023

[12] [12]

5998–6008

Attention is all you need, in: Advances in neural information processing systems, pp. 5998–6008. Vinyals,O.,Fortunato,M.,Jaitly,N.,2015. Pointernetworks. Advancesin neural information processing systems

work page 2015

[13] [13]

Swarm and Evolutionary Computation 99, 102160

An evolutionary method with shift pattern learning for real-world multi-skilled personnel scheduling with flexible shifts. Swarm and Evolutionary Computation 99, 102160. Zhang,H.,Liu,T.Y.,Bai,R.,2026.Onlinerisk-awarepatternadjustmentfor bin packing problem. Expert Systems with Applications 308, 131074. Zhang, Q., Li, H.,

work page 2026

[14] [14]

IEEE TransactionsonNeuralNetworksandLearningSystems34,7978–7991

Meta-learning-based deep reinforcement learning for multiobjective optimization problems. IEEE TransactionsonNeuralNetworksandLearningSystems34,7978–7991. Jin et al.:Preprint submitted to ElsevierPage 21 of 21 Preference-Agile Multi-Objective Optimization Q C 1 Q C 2 Q C 3 idle truck Yard A Yard B First Operating Node Second Operating Node Figure 2:A rout...

work page 2000