Deep Learning for Sequential Decision Making under Uncertainty: Foundations, Frameworks, and Frontiers
Pith reviewed 2026-05-10 15:36 UTC · model grok-4.3
The pith
Deep learning complements optimization for sequential decisions under uncertainty rather than replacing it.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Deep learning is valuable not as a replacement for optimization, but as a complement to it. Deep learning brings adaptability and scalable approximation, whereas OR/MS provides the structural rigor needed to represent constraints, recourse, and uncertainty. The tutorial reviews key decision-making foundations, connects them to major neural architectures, and discusses leading approaches to integrating learning and optimization, while highlighting applications in supply chains, healthcare, agriculture, energy, and autonomous operations as part of a shift from predictive to decision-capable AI.
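To make "recourse" concrete: in OR/MS the term usually refers to the second stage of a stochastic program, where corrective decisions are taken after the uncertainty is revealed. The generic textbook two-stage form below is an illustration only, not a formulation taken from the tutorial. The symbols (first-stage decision x, recourse decision y, random outcome ξ, and data c, q, A, b, W, T, h) are standard notation assumed here.

```latex
\min_{x \ge 0,\; Ax = b} \; c^\top x + \mathbb{E}_{\xi}\big[\, Q(x, \xi) \,\big],
\qquad
Q(x, \xi) = \min_{y \ge 0} \big\{\, q(\xi)^\top y \;:\; W y = h(\xi) - T(\xi)\, x \,\big\}
```

One common hybrid pattern, offered as an assumption about how the pieces fit rather than a claim about the tutorial's specific methods, is to let a neural network approximate an expensive component such as the recourse function Q(x, ξ) or a policy, while the constraints Ax = b and Wy = h(ξ) − T(ξ)x stay explicit in the optimization model.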
What carries the argument
Hybrid integration approaches that pair neural architectures for approximation with optimization models that enforce constraints and model uncertainty in sequential decisions.
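A minimal sketch of this pairing, under loudly stated assumptions: a small neural network supplies demand forecasts, and an explicit optimization model turns them into order quantities that respect a budget. Everything here is hypothetical and chosen for brevity, not taken from the tutorial. The network is a one-hidden-layer model with random, fixed hidden weights whose output layer is fit by least squares (a stand-in for a trained deep model), and the optimization step is a budget-constrained linear program that happens to be a fractional knapsack, so a greedy pass solves it exactly. The names `RandomFeatureNet` and `constrained_orders` and all numbers are illustrative.

```python
import numpy as np

# --- Learning component (hypothetical stand-in for a deep model):
# one hidden tanh layer with random, fixed weights; only the linear
# output layer is fit, by least squares.
class RandomFeatureNet:
    def __init__(self, n_in, n_hidden, n_out, seed=0):
        r = np.random.default_rng(seed)
        self.W = r.normal(size=(n_in, n_hidden))
        self.b = r.normal(size=n_hidden)
        self.V = np.zeros((n_hidden, n_out))

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, Y):
        self.V, *_ = np.linalg.lstsq(self._hidden(X), Y, rcond=None)
        return self

    def predict(self, X):
        return self._hidden(X) @ self.V

# --- Optimization component: maximize sum((price - cost) * q) subject to
# sum(cost * q) <= budget and 0 <= q <= predicted demand (costs assumed > 0).
# This LP is a fractional knapsack, so greedy by margin per unit cost is exact.
def constrained_orders(demand_hat, price, cost, budget):
    demand_hat = np.maximum(demand_hat, 0.0)  # forecasts below zero are clipped
    q = np.zeros_like(demand_hat)
    ratio = (price - cost) / cost
    remaining = budget
    for i in np.argsort(-ratio):
        if ratio[i] <= 0 or remaining <= 0:
            break  # unprofitable items or exhausted budget
        q[i] = min(demand_hat[i], remaining / cost[i])
        remaining -= cost[i] * q[i]
    return q

# Synthetic data: 3 context features, demand for 4 products.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_W = np.array([[2.0, 1.0, 0.5, 1.5],
                   [0.5, 2.0, 1.0, 0.0],
                   [1.0, 0.0, 2.0, 1.0]])
Y = 10.0 + X @ true_W + rng.normal(scale=0.5, size=(200, 4))

net = RandomFeatureNet(n_in=3, n_hidden=50, n_out=4).fit(X, Y)
d_hat = net.predict(np.array([[0.2, -0.1, 0.4]]))[0]

price = np.array([5.0, 4.0, 6.0, 3.0])
cost = np.array([3.0, 2.5, 4.0, 2.0])
budget = 60.0
q = constrained_orders(d_hat, price, cost, budget)
print("forecast:", d_hat.round(2), "orders:", q.round(2))
```

The split of responsibilities is the point of the sketch: the network can be swapped for any predictor without weakening feasibility, because the budget and the bounds 0 ≤ q ≤ predicted demand are enforced by the optimization step rather than learned.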
If this is right
- Hybrid systems scale to large decision problems while respecting domain constraints and uncertainty structures.
- Applications in supply chains, healthcare and epidemic response, agriculture, energy, and autonomous operations see concrete improvements.
- Operations research helps shape integrated learning-optimization systems during the move from predictive to decision-capable AI.
- Neural architectures such as transformers and deep reinforcement learning become practical tools inside optimization frameworks.
Where Pith is reading between the lines
- Hybrid methods may require new training procedures that embed optimization constraints directly into neural loss functions.
- Domain-specific uncertainty models from operations research could be used to generate training scenarios for neural networks.
- The perspective suggests testing whether certain neural architectures align better with particular classes of stochastic programs.
- Educational curricula in operations research and machine learning may converge around shared hybrid case studies.
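One simple way to embed a constraint in a training loss, sketched here as an assumption rather than a method from the tutorial, is a quadratic penalty on violations. The hypothetical example below fits a linear allocation policy Y = XW by gradient descent on a squared-error fit term plus `lam` times the squared excess of total allocation over a capacity. The function names, the synthetic data, and the use of a linear model in place of a deep network are all illustrative.

```python
import numpy as np

def train_penalized_policy(X, T, capacity, lam, steps=5000):
    """Fit Y = X @ W by gradient descent on
    mean over samples of ||y - t||^2 + lam * max(0, sum(y) - capacity)^2."""
    n, m = X.shape[0], T.shape[1]
    W = np.zeros((X.shape[1], m))
    # Step size 1/L, with L an upper bound on the gradient's Lipschitz constant.
    L = 2.0 / n * np.linalg.norm(X, 2) ** 2 * (1.0 + lam * m)
    lr = 1.0 / L
    for _ in range(steps):
        Y = X @ W
        excess = np.maximum(0.0, Y.sum(axis=1) - capacity)  # per-sample violation
        G = 2.0 / n * ((Y - T) + lam * excess[:, None])     # d(loss)/dY
        W -= lr * (X.T @ G)                                 # chain rule to W
    return W

def mean_violation(X, W, capacity):
    return float(np.maximum(0.0, (X @ W).sum(axis=1) - capacity).mean())

# Synthetic targets whose totals (about 20) mostly exceed a capacity of 15.
rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
B = np.array([[5.0, 5.0, 5.0, 5.0],
              [1.0, 0.5, -0.5, 1.0],
              [0.5, 1.0, 1.0, -1.0]])
T = X @ B + rng.normal(scale=0.3, size=(n, 4))
capacity = 15.0

W_plain = train_penalized_policy(X, T, capacity, lam=0.0)
W_pen = train_penalized_policy(X, T, capacity, lam=50.0)
print("mean violation, no penalty:", mean_violation(X, W_plain, capacity))
print("mean violation, lam=50:   ", mean_violation(X, W_pen, capacity))
```

A penalty only discourages violations; it does not guarantee feasibility, which is one reason hybrid designs often keep an explicit optimization or projection step downstream of the network.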
Load-bearing premise
That deep learning and optimization can be integrated effectively at scale while retaining the structural benefits of optimization models for constraints and uncertainty.
What would settle it
A head-to-head empirical comparison on standard benchmark problems showing that either pure deep learning or pure optimization methods achieve equal or better performance and scalability than the reviewed hybrid approaches.
Original abstract
Artificial intelligence (AI) is moving increasingly beyond prediction to support decisions in complex, uncertain, and dynamic environments. This shift creates a natural intersection with operations research and management sciences (OR/MS), which have long offered conceptual and methodological foundations for sequential decision-making under uncertainty. At the same time, recent advances in deep learning, including feedforward neural networks, LSTMs, transformers, and deep reinforcement learning, have expanded the scope of data-driven modeling and opened new possibilities for large-scale decision systems. This tutorial presents an OR/MS-centered perspective on deep learning for sequential decision-making under uncertainty. Its central premise is that deep learning is valuable not as a replacement for optimization, but as a complement to it. Deep learning brings adaptability and scalable approximation, whereas OR/MS provides the structural rigor needed to represent constraints, recourse, and uncertainty. The tutorial reviews key decision-making foundations, connects them to the major neural architectures in modern AI, and discusses leading approaches to integrating learning and optimization. It also highlights emerging impact in domains such as supply chains, healthcare and epidemic response, agriculture, energy, and autonomous operations. More broadly, it frames these developments as part of a wider transition from predictive AI toward decision-capable AI and highlights the role of OR/MS in shaping the next generation of integrated learning-optimization systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript is a tutorial presenting an OR/MS-centered perspective on deep learning for sequential decision-making under uncertainty. Its central premise is that deep learning complements rather than replaces optimization: DL provides adaptability and scalable approximation while OR/MS supplies structural rigor for constraints, recourse, and uncertainty. It reviews decision-making foundations, connects them to neural architectures including feedforward networks, LSTMs, transformers, and deep reinforcement learning, discusses leading integration approaches, and highlights applications in supply chains, healthcare, agriculture, energy, and autonomous operations, framing a broader shift from predictive to decision-capable AI.
Significance. If the described integrations prove effective, the tutorial could meaningfully advance interdisciplinary research by offering a balanced conceptual framework that bridges the AI and OR/MS communities. Its review of leading neural architectures and application domains lays a foundation for hybrid learning-optimization systems and may guide the design of scalable decision systems that preserve the structural benefits of OR/MS models.
minor comments (2)
- [Abstract] The abstract and introduction could more explicitly list the specific integration frameworks reviewed (e.g., end-to-end learning, hybrid models) to help readers navigate the tutorial's structure.
- [Applications] Application sections would benefit from brief pointers to key references or case studies for each domain (supply chains, healthcare, etc.) to strengthen the illustrative value without altering the tutorial format.
Simulated Author's Rebuttal
We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. The referee accurately captures the tutorial's OR/MS-centered framing of deep learning as a complement to optimization for sequential decision-making under uncertainty, including its coverage of neural architectures, integration approaches, and applications. No major comments were raised, so we will incorporate any minor editorial suggestions from the editor.
Circularity Check
No significant circularity: review paper with no derivations
full rationale
This is a tutorial/review manuscript that frames deep learning as a complement to OR/MS for sequential decision-making under uncertainty. It contains no original mathematical derivations, equations, fitted parameters, predictions, or uniqueness theorems. All content is descriptive synthesis of prior literature, with the central premise stated at a high level without any reduction to self-referential inputs or self-citation chains. No load-bearing steps exist that could be circular by construction.