GRAIL: Autonomous Concept Grounding for Neuro-Symbolic Reinforcement Learning
Pith reviewed 2026-05-10 07:16 UTC · model grok-4.3
The pith
GRAIL enables neuro-symbolic RL agents to autonomously ground relational concepts by refining LLM-provided generic representations through environmental interaction.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
GRAIL autonomously grounds relational concepts through environmental interaction. It leverages large language models to provide generic concept representations as weak supervision, then refines them to capture environment-specific semantics. This addresses both sparse reward signals and concept misalignment prevalent in underdetermined environments.
What carries the argument
The refinement process: GRAIL takes the LLM-provided generic representation of each concept and updates it through environmental interaction until it matches that environment's specific semantics.
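To fix intuitions, here is a minimal sketch of what such a refinement loop could look like, assuming concepts are parameterized as soft predicates over object positions and refined with a REINFORCE-style surrogate. The class, the close_by initialization ("within roughly 30 pixels"), and the update rule are all hypothetical stand-ins, not GRAIL's published mechanics.

```python
# Hypothetical sketch of the interaction-driven refinement the review describes.
# All names, the parameterization, and the REINFORCE-style surrogate are
# illustrative assumptions, not GRAIL's actual implementation.
import torch

class RelationalConcept(torch.nn.Module):
    """A soft relational predicate, e.g. close_by(a, b), over object positions."""

    def __init__(self, init_threshold: float, init_scale: float):
        super().__init__()
        # Generic initialization (the LLM-provided weak supervision) to be refined.
        self.threshold = torch.nn.Parameter(torch.tensor(init_threshold))
        self.scale = torch.nn.Parameter(torch.tensor(init_scale))

    def forward(self, pos_a: torch.Tensor, pos_b: torch.Tensor) -> torch.Tensor:
        # Soft truth value in (0, 1): high when the objects are within `threshold`.
        dist = torch.linalg.vector_norm(pos_a - pos_b, dim=-1)
        return torch.sigmoid(self.scale * (self.threshold - dist))

# A generic prior such as "close by ~ within 30 pixels", supplied by the LLM.
close_by = RelationalConcept(init_threshold=30.0, init_scale=0.2)
optimizer = torch.optim.Adam(close_by.parameters(), lr=1e-3)

def refine_step(pos_a: torch.Tensor, pos_b: torch.Tensor,
                advantage: torch.Tensor) -> float:
    """One reward-driven refinement step: if acting on the concept paid off
    (positive advantage), reinforce its current grounding; otherwise weaken it."""
    truth = close_by(pos_a, pos_b)
    loss = -(advantage * torch.log(truth + 1e-8)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```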
If this is right
- Neuro-symbolic agents no longer need manual concept definitions for each environment.
- Better performance in sparse reward settings through grounded concepts.
- Agents can balance reward maximization with high-level goal completion.
- Adaptability to different environments without expert intervention.
Where Pith is reading between the lines
- This could make neuro-symbolic RL more practical for real-world applications where concepts vary.
- Future work might explore combining this with other forms of weak supervision beyond LLMs.
- The trade-offs observed suggest new ways to design reward functions that incorporate concept alignment.
Load-bearing premise
Generic representations supplied by LLMs can be reliably refined through environmental interaction into accurate, stable, environment-specific concept semantics without introducing persistent misalignment or harming policy learning.
What would settle it
Observing that agents using GRAIL consistently fail to learn correct environment-specific meanings for concepts, with policy performance falling below that of manually grounded baselines, would refute the core claim.
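One way such a test could be operationalized, assuming a manually crafted predicate is available as ground truth (this reuses the hypothetical close_by sketch above; the 30-pixel manual rule is likewise an assumed stand-in for an expert definition):

```python
# Illustrative falsification check: measure how often the refined soft predicate
# agrees with a manually defined boolean one on sampled object pairs. A score
# collapsing toward chance would indicate the refinement drifted away from the
# intended semantics. `close_by` refers to the hypothetical sketch above.
import torch

def semantic_agreement(concept, pos_a: torch.Tensor, pos_b: torch.Tensor,
                       manual_threshold: float = 30.0) -> float:
    """Fraction of object pairs where the refined predicate (thresholded at 0.5)
    matches the manual expert rule `distance < manual_threshold`."""
    with torch.no_grad():
        refined = concept(pos_a, pos_b) > 0.5
        manual = torch.linalg.vector_norm(pos_a - pos_b, dim=-1) < manual_threshold
        return (refined == manual).float().mean().item()
```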
Original abstract
Neuro-symbolic Reinforcement Learning (NeSy-RL) combines symbolic reasoning with gradient-based optimization to achieve interpretable and generalizable policies. Relational concepts, such as "left of" or "close by", serve as foundational building blocks that structure how agents perceive and act. However, conventional approaches require human experts to manually define these concepts, limiting adaptability since concept semantics vary across environments. We propose GRAIL (Grounding Relational Agents through Interactive Learning), a framework that autonomously grounds relational concepts through environmental interaction. GRAIL leverages large language models (LLMs) to provide generic concept representations as weak supervision, then refines them to capture environment-specific semantics. This approach addresses both sparse reward signals and concept misalignment prevalent in underdetermined environments. Experiments on the Atari games Kangaroo, Seaquest, and Skiing demonstrate that GRAIL matches or outperforms agents with manually crafted concepts in simplified settings, and reveals informative trade-offs between reward maximization and high-level goal completion in the full environment.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes GRAIL, a neuro-symbolic RL framework that uses LLMs to supply generic relational concept representations (e.g., 'left of', 'close by') as weak supervision, then refines these representations through environmental interaction to capture environment-specific semantics. It claims this addresses sparse rewards and concept misalignment in underdetermined settings, with experiments on Atari games (Kangaroo, Seaquest, Skiing) showing that GRAIL matches or outperforms agents using manually crafted concepts in simplified settings while revealing reward vs. goal-completion trade-offs in full environments.
Significance. If the refinement process and empirical results hold, GRAIL would reduce reliance on human experts for concept definition in NeSy-RL and improve adaptability across environments. The approach of leveraging LLM priors followed by interaction-based grounding is a potentially useful direction for handling concept misalignment, but the absence of quantitative results, baselines, implementation details, or stability analysis in the manuscript limits its assessed significance.
major comments (3)
- [Abstract] The central empirical claim that 'experiments on the Atari games Kangaroo, Seaquest, and Skiing demonstrate that GRAIL matches or outperforms agents with manually crafted concepts' is unsupported: the text provides no quantitative results, baselines, implementation details, error analysis, or performance metrics.
- [GRAIL framework description] The claim that LLM generic representations can be reliably refined into accurate, stable, environment-specific semantics rests on an update rule that appears purely reward-driven; no anchoring loss, regularization term, semantic-fidelity constraint, or convergence criterion is described that would prevent drift or persistent misalignment in sparse-reward Atari settings.
- [Experiments] Without details on how concepts are represented (e.g., as embeddings or predicates), how the refinement update is implemented, or how semantic accuracy is measured post-refinement, it is impossible to assess whether any reported performance gains stem from improved concept grounding or from other factors.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our submission. The comments identify important areas for improvement in supporting our claims with details and evidence. We will perform a major revision to incorporate quantitative results, detailed methodological descriptions, and implementation specifics. We respond to each major comment below.
read point-by-point responses
Referee: [Abstract] The central empirical claim that 'experiments on the Atari games Kangaroo, Seaquest, and Skiing demonstrate that GRAIL matches or outperforms agents with manually crafted concepts' is unsupported: the text provides no quantitative results, baselines, implementation details, error analysis, or performance metrics.
Authors: We acknowledge that the abstract's empirical claim is not supported by quantitative data in the current manuscript text. The full paper describes the experiments but does not include specific metrics or baselines in the provided sections. To address this, we will revise the abstract to either qualify the claim or include key results (e.g., average rewards or success rates), and add a results table comparing against manually crafted concepts, including error analysis where available. revision: yes
Referee: [GRAIL framework description] The claim that LLM generic representations can be reliably refined into accurate, stable, environment-specific semantics rests on an update rule that appears purely reward-driven; no anchoring loss, regularization term, semantic-fidelity constraint, or convergence criterion is described that would prevent drift or persistent misalignment in sparse-reward Atari settings.
Authors: The referee correctly notes that the description of the refinement process is incomplete regarding safeguards against drift. The current manuscript presents the refinement as interaction-based but does not detail the update mechanism beyond reward signals. In the revision, we will provide the precise update rule, introduce regularization terms for semantic fidelity if not already present, and specify convergence criteria. We will also analyze the risk of misalignment in sparse-reward settings and explain how the LLM weak supervision helps anchor the process. revision: yes
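The regularization the authors promise is undescribed in the manuscript, so the following is only an illustrative sketch of what such a safeguard could look like: an L2 anchor pulling the refined concept parameters back toward their LLM-provided initialization, so that sparse or noisy rewards cannot drag a concept arbitrarily far from its generic meaning. The function name, the penalty weight lam, and the quadratic form are all assumptions, not GRAIL's actual loss.

```python
# Illustrative anchoring term of the kind the response promises: an L2 penalty
# toward the LLM-provided initialization. `lam` and the quadratic form are assumed.
import torch

def anchored_loss(reward_loss: torch.Tensor,
                  params: list,
                  priors: list,
                  lam: float = 0.1) -> torch.Tensor:
    """reward_loss + lam * sum_i ||theta_i - theta_i_prior||^2."""
    anchor = sum(((p - p0) ** 2).sum() for p, p0 in zip(params, priors))
    return reward_loss + lam * anchor

# Usage: capture the prior once, at initialization, before any refinement, e.g.
# priors = [p.detach().clone() for p in close_by.parameters()]
```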
Referee: [Experiments] Without details on how concepts are represented (e.g., as embeddings or predicates), how the refinement update is implemented, or how semantic accuracy is measured post-refinement, it is impossible to assess whether any reported performance gains stem from improved concept grounding or from other factors.
Authors: We agree that the experimental section lacks sufficient implementation detail for a full assessment. We will expand it to explain that relational concepts are represented as differentiable embeddings in the neuro-symbolic policy, provide pseudocode and equations for the refinement update rule, and describe the post-refinement evaluation of semantic accuracy using environment-specific proxies or human annotations. This will allow readers to determine how much the grounding process contributes. revision: yes
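To make the rebuttal's "differentiable embeddings" concrete, here is a minimal sketch of one way a relational concept such as left_of could be expressed as a differentiable soft predicate over object-centric state. The sigmoid parameterization and the refinable sharpness beta are hypothetical choices for illustration; the paper may represent concepts differently.

```python
# Hypothetical differentiable predicate for left_of(a, b) over object-centric
# (x, y) positions; the sigmoid form and the refinable sharpness `beta` are
# assumptions for illustration, not the paper's representation.
import torch

def left_of(obj_a_xy: torch.Tensor, obj_b_xy: torch.Tensor,
            beta: torch.Tensor) -> torch.Tensor:
    """Soft truth value approaching 1 as a's x-coordinate falls below b's.
    `beta` sets how crisp the boundary is and can itself be refined by
    gradient descent alongside the policy."""
    return torch.sigmoid(beta * (obj_b_xy[..., 0] - obj_a_xy[..., 0]))

# A valuation the downstream logic/policy layer could consume:
a = torch.tensor([10.0, 50.0])  # e.g. the player at x = 10
b = torch.tensor([40.0, 50.0])  # e.g. a platform at x = 40
print(left_of(a, b, beta=torch.tensor(0.5)))  # ~1.0: a is left of b
```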
Circularity Check
No circularity: the framework is a descriptive method validated by experiments, not a self-referential derivation.
full rationale
The paper describes GRAIL as a framework that supplies generic LLM concept representations as weak supervision and then refines them via environmental interaction to produce environment-specific semantics. No equations, derivations, or first-principles predictions are presented in the abstract or described claims that reduce by construction to fitted inputs, self-citations, or renamed known results. The central claims rest on empirical validation in Atari environments rather than any tautological loop where outputs are presupposed by the inputs. This matches the reader's assessment of no evident circular dependence.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: LLMs can supply generic concept representations that serve as useful weak supervision for relational concepts in RL environments.