Task-specific Subnetwork Discovery in Reinforcement Learning for Autonomous Underwater Navigation

Frank Kirchner; Mariela De Lucas Alvarez; Melvin Laux; Rebecca Adam; Yi-Ling Liu

arXiv: 2604.21640 · v1 · submitted 2026-04-23 · 💻 cs.LG · cs.AI · cs.RO

Yi-Ling Liu, Melvin Laux, Mariela De Lucas Alvarez, Frank Kirchner, Rebecca Adam

Pith reviewed 2026-05-09 22:45 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.RO

keywords multi-task reinforcement learning · subnetwork discovery · underwater navigation · contextual RL · task-specific weights · autonomous underwater vehicles · explainable AI

The pith

A multi-task RL network for underwater navigation differentiates tasks using only 1.5% of its weights, with 85% of those linking context inputs to the first hidden layer.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines the internal structure of a pretrained multi-task reinforcement learning policy for autonomous underwater vehicles navigating toward different species in the HoloOcean simulator. By identifying task-specific subnetworks, the authors show that related tasks share most of the network while a tiny fraction of weights handles differentiation. This small set is overwhelmingly concentrated in connections from explicit context variables at the input layer. The result suggests that context injection can produce highly localized specialization without disrupting shared representations. Such structure could support targeted edits to the policy for new tasks or environments.

Core claim

In a contextual multi-task reinforcement learning setting with related tasks, the network uses only about 1.5% of its weights to differentiate between tasks. Of these, approximately 85% connect the context-variable nodes in the input layer to the next hidden layer.
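To make the two percentages concrete, here is a minimal sketch of how such statistics could be computed. It assumes, hypothetically, that each task's subnetwork is given as a binary mask per weight matrix, that a weight counts as task-specific when it is kept by at least one task but not by all, and that the context inputs occupy the last columns of the first layer. The function name `mask_statistics` and the toy masks are illustrative only and are not taken from the paper.

```python
def mask_statistics(masks, n_context):
    """Count total, task-specific, and context-linked task-specific weights.

    masks: dict mapping task name -> list of layers, each layer a 2D list
           [out_unit][in_unit] of 0/1 entries (assumed mask format).
    n_context: number of context inputs, assumed to be the LAST columns
               of the first layer.
    """
    tasks = list(masks)
    reference = masks[tasks[0]]
    total = specific = specific_ctx = 0
    for layer_idx, layer in enumerate(reference):
        n_cols = len(layer[0])
        for i in range(len(layer)):
            for j in range(n_cols):
                votes = sum(masks[t][layer_idx][i][j] for t in tasks)
                total += 1
                # Task-specific (assumed definition): kept by some tasks, not all.
                if 0 < votes < len(tasks):
                    specific += 1
                    if layer_idx == 0 and j >= n_cols - n_context:
                        specific_ctx += 1
    return total, specific, specific_ctx


# Toy masks: 2 observation inputs + 2 context inputs, 2 hidden units, 1 output.
masks = {
    "task_a": [[[1, 1, 1, 0], [1, 0, 0, 1]], [[1, 1]]],
    "task_b": [[[1, 1, 0, 1], [1, 0, 0, 1]], [[1, 0]]],
}
total, specific, specific_ctx = mask_statistics(masks, n_context=2)
print(f"task-specific share of all weights: {specific / total:.3f}")
print(f"context-linked share of task-specific weights: {specific_ctx / specific:.3f}")

# Combining the reported figures: 85% of 1.5% is about 1.3% of all weights.
share_of_all = 0.015 * 0.85
print(f"context-linked task-specific share of all weights: {share_of_all:.5f}")
```

Taking the reported numbers at face value, roughly 1.3% of all weights would be task-specific connections from context inputs, and about 0.2% would be task-specific weights elsewhere in the network.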

What carries the argument

A task-specific subnetwork identification procedure that extracts the minimal set of weights responsible for distinguishing navigation targets.

If this is right

Shared representations across related underwater navigation tasks can remain intact while only a sparse set of connections is updated for each new target species.
Context variables should be placed at the input layer and given direct access to early hidden layers to maximize efficient task specialization.
Model editing for continual learning becomes practical by modifying or freezing only the small task-specific subnetworks.
Transfer to new but related environments can reuse the bulk of the network and retrain only the 1.5% specialized weights.
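The freeze-and-retrain idea in the points above can be sketched as a masked gradient step: weights flagged as task-specific are updated, everything else stays fixed. This is only an illustration under stated assumptions. The function `masked_sgd_step`, the plain SGD rule, and the toy numbers are hypothetical, and the paper does not describe how such editing would be implemented.

```python
def masked_sgd_step(weights, grads, trainable, lr=0.1):
    """Apply one SGD step only where trainable == 1.

    weights, grads, trainable: 2D lists of the same shape.
    Entries with trainable == 0 (assumed shared weights) are returned unchanged.
    """
    return [
        [w - lr * g if m else w for w, g, m in zip(w_row, g_row, m_row)]
        for w_row, g_row, m_row in zip(weights, grads, trainable)
    ]


weights = [[1.0, 2.0], [3.0, 4.0]]
grads = [[1.0, 1.0], [1.0, 1.0]]
trainable = [[0, 1], [0, 0]]  # only one weight is treated as task-specific

updated = masked_sgd_step(weights, grads, trainable, lr=0.5)
print(updated)
```

In an autodiff framework the same effect is usually obtained by multiplying the gradient by the mask before the optimizer step, so the shared weights never move.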

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the same sparsity pattern holds in other domains with explicit context, multi-task policies could be pruned to a small active subnetwork at inference time for computational savings.
One could test whether dynamically routing inputs through only the relevant subnetwork (selected by context) yields the same performance as the full network.
The finding raises the question of whether similar subnetwork sparsity emerges when context is learned implicitly rather than supplied explicitly.
Extending the approach to real ocean data would reveal whether simulator-derived subnetworks remain valid under sensor noise and unmodeled dynamics.
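The routing test suggested above (running only the subnetwork selected by the context) can be sketched on a toy network. The sketch assumes a one-hot context vector appended to the observation, one binary mask per task and layer, and a small ReLU network without biases. All names, including `routed_forward`, and all weights are hypothetical.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def forward(layers, x):
    """ReLU MLP without biases; the last layer is linear."""
    for W in layers[:-1]:
        x = relu(matvec(W, x))
    return matvec(layers[-1], x)


def apply_mask(W, M):
    return [[w * m for w, m in zip(w_row, m_row)] for w_row, m_row in zip(W, M)]


def routed_forward(layers, masks_by_task, obs, context):
    """Pick the task mask from the one-hot context, then run the masked net."""
    task = context.index(max(context))
    masked = [apply_mask(W, M) for W, M in zip(layers, masks_by_task[task])]
    return forward(masked, obs + context)


# Toy network: 1 observation input + 2 context inputs, 2 hidden units, 1 output.
layers = [[[1.0, 2.0, 0.0], [1.0, 0.0, 3.0]], [[1.0, 1.0]]]
masks_by_task = [
    [[[1, 1, 0], [1, 0, 0]], [[1, 1]]],  # task 0 keeps its own context weight
    [[[1, 0, 0], [1, 0, 1]], [[1, 1]]],  # task 1 keeps its own context weight
]

obs = [1.0]
for context in ([1.0, 0.0], [0.0, 1.0]):
    full = forward(layers, obs + context)
    routed = routed_forward(layers, masks_by_task, obs, context)
    print(context, full, routed)
```

In this toy case the routed and full outputs agree because each mask drops only weights whose input is zero for that task. Whether that holds for the trained policy is exactly what the proposed test would measure.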

Load-bearing premise

The subnetwork identification procedure accurately isolates causal task-specific components rather than training artifacts or spurious correlations in the HoloOcean simulator data.

What would settle it

Remove the identified 1.5% task-specific weights, retrain or evaluate the remaining network on the same set of navigation tasks, and check whether task differentiation accuracy collapses or stays near original levels.
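A toy version of this check is sketched below. It zeroes the weights flagged as task-specific and measures how much the output still varies across contexts. The output gap is a stand-in chosen for illustration; a real test would compare per-task success rates in the simulator. The network, the flags, and the helper names `ablate` and `context_gap` are hypothetical.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def forward(layers, x):
    """ReLU MLP without biases; the last layer is linear."""
    for W in layers[:-1]:
        x = relu(matvec(W, x))
    return matvec(layers[-1], x)


def ablate(layers, specific):
    """Zero every weight whose flag in `specific` is 1."""
    return [
        [[0.0 if s else w for w, s in zip(w_row, s_row)]
         for w_row, s_row in zip(W, S)]
        for W, S in zip(layers, specific)
    ]


def context_gap(layers, obs, contexts):
    """Spread of the scalar output across contexts for a fixed observation."""
    outs = [forward(layers, obs + c)[0] for c in contexts]
    return max(outs) - min(outs)


# Toy network: 1 observation input + 2 context inputs, 2 hidden units, 1 output.
layers = [[[1.0, 2.0, 0.0], [1.0, 0.0, 3.0]], [[1.0, 1.0]]]
# Assumed task-specific flags: only the two context-to-hidden weights.
specific = [[[0, 1, 0], [0, 0, 1]], [[0, 0]]]
contexts = [[1.0, 0.0], [0.0, 1.0]]

gap_before = context_gap(layers, [1.0], contexts)
gap_after = context_gap(ablate(layers, specific), [1.0], contexts)
print(gap_before, gap_after)
```

If the gap collapses after ablation while the shared behaviour remains, the flagged weights carry the task distinction. If it stays near the original value, the identification procedure has missed the relevant weights.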

Figures

Figures reproduced from arXiv: 2604.21640 by Frank Kirchner, Mariela De Lucas Alvarez, Melvin Laux, Rebecca Adam, Yi-Ling Liu.

**Figure 2.** Task-specific subnetworks specialized for navigation to …

**Figure 3.** In the Minigrid environment, the agent represented by …

**Figure 4.** In the HoloOcean environment, the yellow AUV …

**Figure 6.** The analysis of shared and task-specific weights across …

Original abstract

Autonomous underwater vehicles are required to perform multiple tasks adaptively and in an explainable manner under dynamic, uncertain conditions and limited sensing, challenges that classical controllers struggle to address. This demands robust, generalizable, and inherently interpretable control policies for reliable long-term monitoring. Reinforcement learning, particularly multi-task RL, overcomes these limitations by leveraging shared representations to enable efficient adaptation across tasks and environments. However, while such policies show promising results in simulation and controlled experiments, they yet remain opaque and offer limited insight into the agent's internal decision-making, creating gaps in transparency, trust, and safety that hinder real-world deployment. The internal policy structure and task-specific specialization remain poorly understood. To address these gaps, we analyze the internal structure of a pretrained multi-task reinforcement learning network in the HoloOcean simulator for underwater navigation by identifying and comparing task-specific subnetworks responsible for navigating toward different species. We find that in a contextual multi-task reinforcement learning setting with related tasks, the network uses only about 1.5% of its weights to differentiate between tasks. Of these, approximately 85% connect the context-variable nodes in the input layer to the next hidden layer, highlighting the importance of context variables in such settings. Our approach provides insights into shared and specialized network components, useful for efficient model editing, transfer learning, and continual learning for underwater monitoring through a contextual multi-task reinforcement learning method.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: a private letter to a colleague

The paper finds that task differentiation in a multi-task RL policy for underwater navigation uses only about 1.5% of the weights, mostly context connections, but the subnetwork identification method lacks the validation needed to rule out artifacts.

The main thing to know is that in a contextual multi-task RL policy trained for AUV navigation in HoloOcean, the network appears to use just 1.5% of its weights to handle differences across related tasks, with roughly 85% of those weights linking the context input nodes to the first hidden layer. This gives a concrete split between shared and specialized capacity for this domain. They apply subnetwork dissection to a practical robotics setting and show how little extra capacity is needed when tasks share structure, which could inform model editing or continual learning for underwater monitoring. That empirical breakdown on a real simulator task is the useful part.

The soft spot is the identification procedure itself. The abstract states the percentages without laying out the algorithm, any ablation tests, sensitivity checks, or controls for whether removing those weights selectively hurts task performance while leaving others intact. The stress-test point is fair: without causal validation, the result could simply trace back to how context is explicitly fed into the network or to simulator-specific patterns rather than genuine learned specialization. If the method is purely post-hoc scoring, the 1.5% figure is harder to trust as a general insight.

This paper is for people working on interpretable multi-task RL in robotics or safety-critical control. A reader focused on AUVs or capacity analysis in RL would get value from the numbers, even if the scope stays narrow. It deserves peer review because the core observation is new for this setting and the approach is straightforward, though the methods section will need expansion to make the claims solid.

Referee Report

3 major / 1 minor

Summary. The paper analyzes the internal structure of a pretrained contextual multi-task reinforcement learning policy for autonomous underwater vehicle navigation in the HoloOcean simulator. It identifies task-specific subnetworks responsible for navigating toward different species and reports that only about 1.5% of the network weights differentiate between tasks, with approximately 85% of those weights being connections from explicit context-variable nodes in the input layer to the first hidden layer. The work aims to improve explainability for model editing, transfer learning, and continual learning in underwater monitoring.

Significance. If the subnetwork discovery procedure is shown to be robust and the reported percentages reflect genuine learned specialization rather than input encoding artifacts or simulator-specific correlations, the result would offer concrete, quantitative insight into how contextual multi-task RL networks allocate capacity across related tasks. This could directly support more efficient policy editing and adaptation in resource-constrained AUV applications, addressing a recognized barrier to deployment of opaque RL controllers.

major comments (3)

[Abstract] The headline quantitative claims (1.5% of weights differentiate tasks; 85% of those connect context nodes to the first hidden layer) are stated without any description of the subnetwork discovery algorithm, importance scoring method, statistical controls, or sensitivity checks. This absence makes it impossible to determine whether the percentages are robust or arise from post-hoc choices.
[Abstract] The claim that context connections dominate the task-specific subnetwork is at risk of being tautological given the explicit context-variable encoding in the input layer; no control experiment (e.g., comparison to a non-contextual baseline or ablation of context inputs while measuring task differentiation) is described to establish that the 85% figure reflects learned task structure rather than architectural input structure.
[Abstract] No causal validation of the identified subnetworks is provided, such as targeted ablation of the discovered task-specific weights (while freezing shared weights) and measurement of selective performance degradation on individual navigation tasks. Without such checks, the percentages may reflect simulator correlations or spurious input patterns rather than causal task differentiation.

minor comments (1)

[Abstract] The abstract refers to the HoloOcean simulator and 'different species' without providing a reference, brief description of the simulation environment, or the precise task definitions.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major point below and will revise the paper accordingly to improve methodological transparency and validation.

Referee: [Abstract] The headline quantitative claims (1.5% of weights differentiate tasks; 85% of those connect context nodes to the first hidden layer) are stated without any description of the subnetwork discovery algorithm, importance scoring method, statistical controls, or sensitivity checks. This absence makes it impossible to determine whether the percentages are robust or arise from post-hoc choices.

Authors: We agree that the abstract omits critical methodological details. In the revision we will expand the abstract with a concise description of the subnetwork discovery procedure and importance scoring. We will also add a dedicated methods subsection that fully specifies the algorithm, the importance metric, statistical controls (e.g., multiple random seeds and threshold sensitivity), and robustness checks.
revision: yes

Referee: [Abstract] The claim that context connections dominate the task-specific subnetwork is at risk of being tautological given the explicit context-variable encoding in the input layer; no control experiment (e.g., comparison to a non-contextual baseline or ablation of context inputs while measuring task differentiation) is described to establish that the 85% figure reflects learned task structure rather than architectural input structure.

Authors: The concern is valid: the explicit context inputs make the concentration of differentiating weights in the first-layer connections partly architectural. However, the discovery procedure still isolates only those weights whose values differ meaningfully across tasks. To address the point directly we will add a control comparison in the revision: we will train and analyze an otherwise identical non-contextual multi-task baseline on the same navigation tasks and show that the contextual model exhibits substantially higher task-specific weight concentration in the context-to-hidden connections.
revision: partial

Referee: [Abstract] No causal validation of the identified subnetworks is provided, such as targeted ablation of the discovered task-specific weights (while freezing shared weights) and measurement of selective performance degradation on individual navigation tasks. Without such checks, the percentages may reflect simulator correlations or spurious input patterns rather than causal task differentiation.

Authors: We agree that causal evidence is needed. In the revised manuscript we will report targeted ablation experiments: for each task we will zero the discovered task-specific weights while freezing all shared weights, then quantify the selective drop in success rate on that task versus the others. These results will be added to the experimental section to demonstrate that the identified subnetworks are causally responsible for task differentiation.
revision: yes
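The threshold-sensitivity check mentioned in the first response could take roughly the following shape. The sketch assumes, hypothetically, that each task assigns a continuous importance score to every weight and that a weight belongs to a task's subnetwork when its score exceeds a threshold. The paper's actual scoring method is not described on this page, and the names `specific_fraction` and `sweep` are illustrative.

```python
def specific_fraction(scores_by_task, threshold):
    """Share of weights kept by at least one task but not by all of them."""
    per_task = list(scores_by_task.values())
    n_weights = len(per_task[0])
    count = 0
    for i in range(n_weights):
        kept = sum(1 for scores in per_task if scores[i] > threshold)
        if 0 < kept < len(per_task):
            count += 1
    return count / n_weights


def sweep(scores_by_task, thresholds):
    """Recompute the task-specific share for each candidate threshold."""
    return {t: specific_fraction(scores_by_task, t) for t in thresholds}


# Toy importance scores for four weights under two tasks (made-up numbers).
scores_by_task = {
    "task_a": [0.9, 0.8, 0.2, 0.6],
    "task_b": [0.9, 0.3, 0.1, 0.4],
}
result = sweep(scores_by_task, [0.25, 0.5, 0.85])
print(result)
```

In this toy example the task-specific share moves from 0 to 0.5 and back to 0 across thresholds, which is the kind of dependence a sensitivity report would need to rule out, or quantify, for the 1.5% figure.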

Circularity Check

0 steps flagged

No circularity: empirical subnetwork analysis on fixed pretrained network

The paper reports an observational analysis of task-specific subnetworks within a pretrained contextual multi-task RL policy for underwater navigation in HoloOcean. The central quantitative claims (approximately 1.5% of weights differentiate tasks, with 85% of those connecting context nodes to the first hidden layer) are presented as direct measurements from the fixed network rather than predictions, derivations, or fitted parameters that loop back to the identification procedure. No equations, self-citations, or ansatzes are invoked to justify the percentages; the method is applied post-training to an existing model. This is a standard empirical inspection with no reduction of outputs to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical model or new theoretical entities are introduced; the work is an empirical dissection of an existing trained network.

