Recognition: no theorem link
Agent-Centric Observation Adaptation for Robust Visual Control under Dynamic Perturbations
Pith reviewed 2026-05-11 00:43 UTC · model grok-4.3
The pith
A plug-and-play adapter with mixture-of-experts restoration and foreground masking recovers 95.3% of clean visual control performance under dynamic perturbations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
From an information-bottleneck view, the work establishes that restoration-based representations force encoding of nuisance corruption information, and that instead anchoring to the clean foreground via masks avoids this while preserving task-critical content. The proposed ACO-MoE adapter implements this by combining a routed bank of restoration experts with a foreground-mask branch, pretrained solely on synthetic rendered data with automatic degradation pairs and masks, then deployed at inference on corrupted RGB alone without any labels or references.
What carries the argument
ACO-MoE, an agent-centric observation adapter that routes inputs through a mixture of restoration experts conditioned on a foreground mask branch to produce task-preserving cleaned observations.
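The routed-expert design can be illustrated with a minimal numpy sketch. Everything here is a hypothetical stand-in: the two-feature router, the brightness-threshold mask, and the expert callables are our own illustrative choices, not the paper's architecture.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class ACOMoESketch:
    """Toy sketch of a routed bank of restoration experts conditioned on a
    foreground mask. Expert count, routing features, and mask fusion are
    illustrative assumptions, not the actual ACO-MoE design."""

    def __init__(self, experts, gate_weights):
        self.experts = experts            # list of image -> image callables
        self.gate_weights = gate_weights  # (n_experts, n_features) router matrix

    def foreground_mask(self, obs):
        # Placeholder mask branch: threshold on brightness.
        # In the paper, the mask branch is learned from simulation-derived masks.
        return (obs > obs.mean()).astype(obs.dtype)

    def __call__(self, obs):
        feats = np.array([obs.mean(), obs.std()])   # toy routing features
        gates = softmax(self.gate_weights @ feats)  # soft routing over experts
        restored = sum(g * e(obs) for g, e in zip(gates, self.experts))
        # Anchor the output to the foreground, as the mask branch motivates.
        return restored * self.foreground_mask(obs)
```

A frozen adapter like this would sit between the corrupted observation and the downstream policy, taking only RGB at inference.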
Load-bearing premise
That the foreground masks derived from simulation accurately capture the task-relevant information and that the synthetic degradation model sufficiently represents real-world non-stationary corruptions for the adapter to transfer effectively.
What would settle it
Demonstrating that control performance with ACO-MoE falls to or below baseline levels when evaluated on real-world robot footage with actual dynamic perturbations not replicable in the synthetic benchmark.
original abstract
Real-world visual systems face time-varying perturbations, including weather, sensor noise, compression artifacts, and background distractions. Existing image restoration methods are typically designed for fixed corruption types and optimized for pixel-level fidelity, leaving open two questions: how restoration behaves under non-stationary corruption switching, and whether pixel-level fidelity preserves the task-relevant information needed by downstream models. To study this setting, we introduce the Visual Degraded Control Suite (VDCS), a benchmark that injects Markov-switching physical degradations into rendered scenes. We further identify a fundamental failure mode of reconstruction-based representations: faithfully reconstructing corrupted observations forces the latent state to encode corruption-specific nuisance information, thereby contaminating downstream models. From an information-bottleneck perspective, anchoring the representation to the clean foreground eliminates this contamination. Motivated by this analysis, we propose Agent-Centric Observations with Mixture-of-Experts (ACO-MoE), a frozen, plug-and-play observation adapter that combines a routed bank of restoration experts with a foreground-mask branch. ACO-MoE is pretrained entirely offline on synthetic rendered data with automatically generated degradation pairs and simulation-derived foreground masks, requiring no manual annotation. At inference time, it takes only corrupted RGB as input without corruption labels, clean reference frames, or foreground masks. Across VDCS, DMC-GB, and RoboSuite, ACO-MoE consistently improves downstream control with both model-free and model-based backbones, recovering 95.3% of clean-input performance under challenging Markov-switching corruptions. It also generalizes zero-shot to unseen visual perturbations excluded from adapter pretraining.
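The Markov-switching injection described in the abstract amounts to sampling a per-frame corruption schedule from a transition matrix. A minimal stdlib sketch follows; the mode set and transition probabilities are illustrative assumptions, not VDCS's actual values:

```python
import random

# Hypothetical corruption modes and transition matrix; VDCS's actual
# degradation set and switching dynamics are not given in the abstract.
MODES = ["clean", "noise", "blur", "compression"]
TRANSITIONS = {
    "clean":       [0.7, 0.1, 0.1, 0.1],
    "noise":       [0.2, 0.6, 0.1, 0.1],
    "blur":        [0.2, 0.1, 0.6, 0.1],
    "compression": [0.2, 0.1, 0.1, 0.6],
}

def markov_switching_schedule(n_steps, start="clean", seed=0):
    """Sample a per-frame corruption schedule from the Markov chain above."""
    rng = random.Random(seed)
    mode, schedule = start, []
    for _ in range(n_steps):
        schedule.append(mode)
        mode = rng.choices(MODES, weights=TRANSITIONS[mode])[0]
    return schedule
```

Each frame of a rendered episode would then be corrupted according to its scheduled mode, producing the non-stationary switching the benchmark studies.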
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces the Visual Degraded Control Suite (VDCS) benchmark for non-stationary visual degradations and proposes ACO-MoE, a frozen plug-and-play observation adapter that combines a routed mixture-of-experts restoration bank with a simulation-derived foreground-mask branch. Pretrained offline on synthetic rendered pairs, ACO-MoE is claimed to eliminate nuisance contamination in latent representations per an information-bottleneck analysis, recovering 95.3% of clean-input performance across VDCS, DMC-GB, and RoboSuite for both model-free and model-based controllers while generalizing zero-shot to unseen perturbations.
Significance. If the results and the foreground-anchoring justification hold, the work offers a practical, label-free adapter for robust visual control under dynamic real-world corruptions, potentially reducing the need for policy retraining or online adaptation in robotics applications.
major comments (3)
- [§3 (Information-Bottleneck Analysis)] The central motivation, that anchoring latents to clean foreground masks eliminates nuisance contamination without discarding task-critical information, is load-bearing for the performance claims, yet the analysis includes no quantitative bound or ablation demonstrating that policy-relevant cues (e.g., peripheral dynamics or shadows in RoboSuite manipulation) are retained; if the masks remove such context, downstream controllers would lose performance even with nuisances restored.
- [§5 (Experiments)] The reported 95.3% recovery and zero-shot generalization are the primary empirical support, but the results section gives insufficient detail on run counts, error bars, statistical tests, and component ablations (e.g., MoE routing vs. a single expert, mask branch vs. full-image input); without these, readers cannot verify that the improvements are not artifacts of the synthetic pretraining distribution or of weak baselines.
- [§4.2 (ACO-MoE Architecture)] The transfer assumption, that offline synthetic Markov-switching degradations plus simulation masks will handle real non-stationary corruptions at inference, is central, yet no analysis or cross-domain experiment quantifies the domain gap between rendered degradations and actual sensor or weather effects, risking an overstated robustness claim.
minor comments (2)
- [Notation] The notation for the information-bottleneck objective and expert routing could be made more explicit with a single equation block defining all mutual-information terms and gating weights.
- [Figures] Figure captions for mask visualizations should include quantitative metrics (e.g., IoU with clean foreground) to allow readers to assess information preservation directly.
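As a sketch of what the requested single equation block could contain (the symbols below are our own notation, not the paper's): an IB-style objective that keeps task information while penalizing mutual information with the corruption index, alongside the gated expert combination.

```latex
% IB objective: retain task information I(z; y), suppress corruption information I(z; c)
\min_{\phi}\ \Bigl[-I\bigl(z_\phi(x);\,y\bigr) + \beta\, I\bigl(z_\phi(x);\,c\bigr)\Bigr]
% Gated expert combination with foreground-mask anchoring
\hat{x} = m(x)\odot \sum_{k=1}^{K} g_k(x)\,E_k(x),
\qquad
g_k(x) = \frac{\exp\!\bigl(w_k^{\top} h(x)\bigr)}{\sum_{j=1}^{K}\exp\!\bigl(w_j^{\top} h(x)\bigr)}
```

where $x$ is the corrupted observation, $y$ the task signal, $c$ the corruption index, $z_\phi$ the latent, $m(x)$ the foreground-mask branch, $E_k$ the restoration experts, and $g_k$ the softmax gating weights over router features $h(x)$.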
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback. We address each major comment point by point below, providing clarifications and committing to revisions that strengthen the manuscript without misrepresenting our contributions.
point-by-point responses
-
Referee: [§3 (Information-Bottleneck Analysis)] The central motivation that anchoring latents to clean foreground masks eliminates nuisance contamination without discarding task-critical information is load-bearing for the performance claims, yet the analysis does not include a quantitative bound or ablation demonstrating that policy-relevant cues (e.g., peripheral dynamics or shadows in RoboSuite manipulation) are retained; if masks remove such context, downstream controllers would lose performance even with restored nuisances.
Authors: We agree that a more explicit demonstration of retained task-relevant information would strengthen the information-bottleneck argument in Section 3. The current analysis shows that foreground anchoring reduces mutual information with nuisance factors while the empirical recovery of 95.3% clean performance across environments (including RoboSuite) indicates that critical cues such as peripheral dynamics are preserved in practice. To directly address the concern, we will add a targeted ablation in the revised manuscript that isolates the mask branch's effect on control performance in tasks with prominent peripheral elements, quantifying any information loss. revision: yes
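The mask-quality metric raised in the minor comments (IoU against the clean foreground) is straightforward to compute; a minimal sketch, with the function name and empty-mask convention as our own choices:

```python
import numpy as np

def mask_iou(pred_mask, gt_mask):
    """Intersection-over-union between a predicted and a reference
    binary foreground mask, the metric suggested for figure captions."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:  # both masks empty: define IoU as 1
        return 1.0
    return np.logical_and(pred, gt).sum() / union
```

Reporting this per-task alongside the proposed ablation would quantify how much foreground information the mask branch preserves.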
-
Referee: [§5 (Experiments)] The reported 95.3% recovery and zero-shot generalization are the primary empirical support, but the results section provides insufficient detail on run counts, error bars, statistical tests, and component ablations (e.g., MoE routing vs. single expert, mask branch vs. full-image input); without these, it is impossible to verify that improvements are not due to the synthetic pretraining distribution or baseline weaknesses.
Authors: We acknowledge that the experimental reporting in Section 5 lacks sufficient statistical rigor and component-level ablations. In the revised manuscript we will report the exact number of independent runs (5 seeds per setting), include error bars on all performance plots, add statistical significance tests comparing ACO-MoE against baselines, and expand the ablation study to explicitly compare full ACO-MoE against a single-expert restoration variant and a mask-free full-image input variant. These additions will allow readers to verify that gains arise from the routed experts and foreground anchoring rather than pretraining artifacts. revision: yes
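The committed reporting protocol (per-seed runs with error bars) could be aggregated along these lines; a stdlib sketch under our own assumptions about bootstrap settings, not the authors' actual analysis code:

```python
import random

def mean_and_bootstrap_ci(returns, n_boot=10_000, alpha=0.05, seed=0):
    """Aggregate per-seed returns (e.g., 5 seeds per setting) into a mean
    and a bootstrap confidence interval for error bars."""
    rng = random.Random(seed)
    mean = sum(returns) / len(returns)
    # Resample seeds with replacement and record each bootstrap mean.
    boots = sorted(
        sum(rng.choice(returns) for _ in returns) / len(returns)
        for _ in range(n_boot)
    )
    lo = boots[int((alpha / 2) * n_boot)]
    hi = boots[int((1 - alpha / 2) * n_boot) - 1]
    return mean, (lo, hi)
```

Significance between ACO-MoE and a baseline could then be checked by whether their intervals (or a paired test over seeds) separate.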
-
Referee: [§4.2 (ACO-MoE Architecture)] The transfer assumption that offline synthetic Markov-switching degradations plus simulation masks will handle real non-stationary corruptions at inference is central, but no analysis or cross-domain experiment quantifies the domain gap between rendered degradations and actual sensor/weather effects, risking overstatement of robustness.
Authors: The referee correctly notes that our evaluation remains within synthetic domains and does not quantify the synthetic-to-real domain gap. While zero-shot generalization to unseen synthetic perturbations provides evidence of robustness inside the simulated distribution, we do not claim direct equivalence to real sensor or weather effects. In the revised manuscript we will add an explicit limitations paragraph in Section 4.2 and the conclusion discussing this gap and suggesting future real-robot validation protocols. revision: partial
Circularity Check
No circularity in derivation chain; performance claims are empirical
full rationale
The paper's chain begins with an information-bottleneck analysis identifying a failure mode in reconstruction-based latents, then motivates the ACO-MoE architecture (frozen adapter with routed experts and foreground-mask branch) as a plug-and-play solution pretrained offline on synthetic degradation pairs and simulation-derived masks. No equations, derivations, or fitted parameters are presented that reduce the reported 95.3% recovery or zero-shot generalization to inputs by construction. The central claims rest on downstream empirical evaluations across VDCS, DMC-GB, and RoboSuite with model-free and model-based controllers, without self-citations serving as load-bearing uniqueness theorems or ansatzes. The method is self-contained as an empirical architecture whose validity is tested externally on benchmarks rather than tautologically derived from its own definitions or prior author work.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Anchoring the representation to the clean foreground eliminates contamination from corruption-specific nuisance information.
Reference graph
Works this paper leans on
-
[1]
Look where you look! Saliency-guided Q-networks for generalization in visual reinforcement learning
David Bertoin, Adil Zouitine, Mehdi Zouitine, and Emmanuel Rachelson. Look where you look! Saliency-guided Q-networks for generalization in visual reinforcement learning. Advances in Neural Information Processing Systems, 35:30693–30706, 2022
2022
-
[2]
Parameter-free online test-time adaptation
Malik Boudiaf, Romain Mueller, Ismail Ben Ayed, and Luca Bertinetto. Parameter-free online test-time adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8344–8353, 2022
2022
-
[3]
Simple baselines for image restoration
Liangyu Chen, Xiaojie Chu, Xiangyu Zhang, and Jian Sun. Simple baselines for image restoration. In European Conference on Computer Vision (ECCV), volume 13667, pages 17–33, 2022
2022
-
[4]
InstructIR: High-quality image restoration following human instructions
Marcos V Conde, Gregor Geigle, and Radu Timofte. InstructIR: High-quality image restoration following human instructions. In European Conference on Computer Vision, pages 1–21. Springer, 2024
2024
-
[5]
RobustBench: A standardized adversarial robustness benchmark
Francesco Croce, Maksym Andriushchenko, Vikash Sehwag, Edoardo Debenedetti, Nicolas Flammarion, Mung Chiang, Prateek Mittal, and Matthias Hein. RobustBench: A standardized adversarial robustness benchmark. arXiv preprint arXiv:2010.09670, 2020
-
[6]
Image denoising by sparse 3-D transform-domain collaborative filtering
Kostadin Dabov, Alessandro Foi, Vladimir Katkovnik, and Karen Egiazarian. Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Transactions on Image Processing, 16(8):2080–2095, 2007
2007
-
[7]
Bryan LM de Oliveira, Luana GB Martins, Bruno Brandão, Murilo L da Luz, Telma W de L Soares, and Luckeciano C Melo. Sliding puzzles gym: A scalable benchmark for state representation in visual reinforcement learning. arXiv preprint arXiv:2410.14038, 2024
-
[8]
MambaIR: A simple baseline for image restoration with state-space model
Hang Guo, Jinmin Li, Tao Dai, Zhihao Ouyang, Xudong Ren, and Shu-Tao Xia. MambaIR: A simple baseline for image restoration with state-space model. In European Conference on Computer Vision (ECCV), pages 222–241. Springer, 2024
2024
-
[9]
Onerestore: A universal restoration framework for composite degradation
Yu Guo, Yuan Gao, Yuxu Lu, Huilin Zhu, Ryan Wen Liu, and Shengfeng He. Onerestore: A universal restoration framework for composite degradation. In European Conference on Computer Vision, pages 255–272. Springer, 2024
2024
-
[10]
Yu Guo, Shengfeng He, Yuxu Lu, Haonan An, Yihang Tao, Huilin Zhu, Jingxian Liu, and Yuguang Fang. Neptune-x: Active x-to-maritime generation for universal maritime object detection. arXiv preprint arXiv:2509.20745, 2025
David Ha and Jürgen Schmidhuber. World models. arXiv preprint arXiv:1803.10122, 2018
-
[11]
Learning latent dynamics for planning from pixels
Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, and James Davidson. Learning latent dynamics for planning from pixels. In International Conference on Machine Learning, volume 97, pages 2555–2565. PMLR, 2019
2019
-
[12]
Mastering Diverse Domains through World Models
Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, and Timothy Lillicrap. Mastering diverse domains through world models. arXiv preprint arXiv:2301.04104, 2023
2023
-
[13]
Dsp-reg: Domain-sensitive parameter regularization for robust domain generalization, 2026
Xudong Han, Senkang Hu, Yihang Tao, Yu Guo, Philip Birch, Sam Tak Wu Kwong, and Yuguang Fang. Dsp-reg: Domain-sensitive parameter regularization for robust domain generalization, 2026
2026
-
[14]
Generalization in reinforcement learning by soft data augmentation
Nicklas Hansen and Xiaolong Wang. Generalization in reinforcement learning by soft data augmentation. In 2021 IEEE International Conference on Robotics and Automation (ICRA), pages 13611–13617, 2021
2021
-
[15]
Self-supervised policy adaptation during deployment
Nicklas Hansen, Rishabh Jangir, Yu Sun, Guillem Alenyà, Pieter Abbeel, Alexei A Efros, Lerrel Pinto, and Xiaolong Wang. Self-supervised policy adaptation during deployment. In International Conference on Learning Representations, 2021
2021
-
[16]
Stabilizing deep Q-learning with ConvNets and vision transformers under data augmentation
Nicklas Hansen, Hao Su, and Xiaolong Wang. Stabilizing deep Q-learning with ConvNets and vision transformers under data augmentation. In Advances in Neural Information Processing Systems, volume 34, pages 3680–3693, 2021
2021
-
[17]
TD-MPC2: Scalable, Robust World Models for Continuous Control
Nicklas Hansen, Hao Su, and Xiaolong Wang. TD-MPC2: Scalable, robust world models for continuous control. arXiv preprint arXiv:2310.16828, 2023
2023
-
[18]
Benchmarking Neural Network Robustness to Common Corruptions and Perturbations
Dan Hendrycks and Thomas G. Dietterich. Benchmarking neural network robustness to common corruptions and perturbations. arXiv preprint arXiv:1903.12261, 2019
2019
-
[19]
Agentscodriver: Large language model empowered collaborative driving with lifelong learning, 2024
Senkang Hu, Zhengru Fang, Zihan Fang, Yiqin Deng, Xianhao Chen, and Yuguang Fang. Agentscodriver: Large language model empowered collaborative driving with lifelong learning, 2024
2024
-
[20]
Senkang Hu, Zhengru Fang, Yiqin Deng, Xianhao Chen, Yuguang Fang, and Sam Kwong. Toward full- scene domain generalization in multi-agent collaborative bird’s eye view segmentation for connected and autonomous driving.IEEE Transactions on Intelligent Transportation Systems, 26(2):1783–1796, 2025
2025
-
[21]
Agentscomerge: Large language model empowered collaborative decision making for ramp merging
Senkang Hu, Zhengru Fang, Zihan Fang, Yiqin Deng, Xianhao Chen, Yuguang Fang, and Sam Tak Wu Kwong. Agentscomerge: Large language model empowered collaborative decision making for ramp merging. IEEE Transactions on Mobile Computing, 24(10):9791–9805, 2025
2025
-
[22]
Senkang Hu, Yong Dai, Yuzhi Zhao, Yihang Tao, Yu Guo, Zhengru Fang, Sam Tak Wu Kwong, and Yuguang Fang. Optimizing agentic reasoning with retrieval via synthetic semantic information gain reward.arXiv preprint arXiv:2602.00845, 2026
-
[23]
Planning-oriented autonomous driving
Yihan Hu, Jiazhi Yang, Li Chen, Keyu Li, Chonghao Sima, Xizhou Zhu, Siqi Chai, Senyao Du, Tianwei Lin, Wenhai Wang, Lewei Lu, Xiaosong Jia, Qiang Liu, Jifeng Dai, Yu Qiao, and Hongyang Li. Planning-oriented autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 17853–17862, 2023
2023
-
[24]
Spectrum random masking for generalization in image-based reinforcement learning
Yangru Huang, Peixi Peng, Yifan Zhao, Guangyao Chen, and Yonghong Tian. Spectrum random masking for generalization in image-based reinforcement learning. In Advances in Neural Information Processing Systems, volume 35, pages 20393–20406, 2022
2022
-
[25]
Adaptive mixtures of local experts
Robert A Jacobs, Michael I Jordan, Steven J Nowlan, and Geoffrey E Hinton. Adaptive mixtures of local experts. Neural Computation, 3(1):79–87, 1991
1991
-
[26]
Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Lelio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Sandeep Subramanian, Sophia Yang, Szymon Antoniak, Teven...
2024
-
[27]
Hierarchical mixtures of experts and the EM algorithm
Michael I Jordan and Robert A Jacobs. Hierarchical mixtures of experts and the EM algorithm. Neural Computation, 6(2):181–214, 1994
1994
-
[28]
3D common corruptions and data augmentation
Oğuzhan Fatih Kar, Teresa Yeo, Andrei Atanov, and Amir Zamir. 3D common corruptions and data augmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 18963–18974, 2022
2022
-
[29]
Kyungmin Kim, JB Lanier, Pierre Baldi, Charless Fowlkes, and Roy Fox. Make the pertinent salient: Task-relevant reconstruction for visual control with distractions. arXiv preprint arXiv:2410.09972, 2024
-
[30]
Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning From Pixels
Ilya Kostrikov, Denis Yarats, and Rob Fergus. Image augmentation is all you need: Regularizing deep reinforcement learning from pixels. arXiv preprint arXiv:2004.13649, 2020
-
[31]
DeblurGAN: Blind motion deblurring using conditional adversarial networks
Orest Kupyn, Volodymyr Budzan, Mykola Mykhailych, Dmytro Mishkin, and Jiří Matas. DeblurGAN: Blind motion deblurring using conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8183–8192, 2018
2018
-
[32]
Reinforcement learning with augmented data
Michael Laskin, Kimin Lee, Adam Stooke, Lerrel Pinto, Pieter Abbeel, and Aravind Srinivas. Reinforcement learning with augmented data. In Advances in Neural Information Processing Systems, volume 33, pages 19884–19895, 2020
2020
-
[33]
CURL: Contrastive unsupervised representations for reinforcement learning
Michael Laskin, Aravind Srinivas, and Pieter Abbeel. CURL: Contrastive unsupervised representations for reinforcement learning. In International Conference on Machine Learning, volume 119, pages 5639–5650. PMLR, 2020
2020
-
[34]
All-in-one image restoration for unknown corruption
Boyun Li, Xiao Liu, Peng Hu, Zhongqin Wu, Jiancheng Lv, and Xi Peng. All-in-one image restoration for unknown corruption. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17452–17462, 2022
2022
-
[35]
Instruct2see: Learning to remove any obstructions across distributions
Junhang Li, Yu Guo, Chuhua Xian, and Shengfeng He. Instruct2see: Learning to remove any obstructions across distributions. In International Conference on Machine Learning, pages 34453–34470. PMLR, 2025
2025
-
[36]
Policy-independent behavioral metric-based representation for deep reinforcement learning
Weijian Liao, Zongzhang Zhang, and Yang Yu. Policy-independent behavioral metric-based representation for deep reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 8746–8754, 2023
2023
-
[37]
MoE-LLaVA: Mixture of experts for large vision-language models
Bin Lin, Zhenyu Tang, Yang Ye, Jinfa Huang, Junwu Zhang, Yatian Pang, Peng Jin, Munan Ning, Jiebo Luo, and Li Yuan. MoE-LLaVA: Mixture of experts for large vision-language models. IEEE Transactions on Multimedia, 2026
2026
-
[38]
TTT++: When does self-supervised test-time training fail or thrive?
Yuejiang Liu, Parth Kothari, Bastien van Delft, Baptiste Bellot-Gurlet, Taylor Mordan, and Alexandre Alahi. TTT++: When does self-supervised test-time training fail or thrive? In Advances in Neural Information Processing Systems (NeurIPS), volume 34, pages 21808–21820, 2021
2021
-
[39]
Controlling vision-language models for multi-task image restoration
Ziwei Luo, Fredrik K. Gustafsson, Zheng Zhao, Jens Sjölund, and Thomas B. Schön. Controlling vision-language models for multi-task image restoration. In International Conference on Learning Representations (ICLR), 2024
2024
-
[40]
Transformers are sample-efficient world models
Vincent Micheli, Eloi Alonso, and Francois Fleuret. Transformers are sample-efficient world models. arXiv preprint arXiv:2209.00588, 2022
-
[41]
Human-level control through deep reinforcement learning
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015
2015
-
[42]
Mitsuhiko Nakamoto, Oier Mees, Aviral Kumar, and Sergey Levine. Steering your generalists: Improving robotic foundation models via value guidance. arXiv preprint arXiv:2410.13816, 2024
-
[43]
DMC-VB: A benchmark for representation learning for control with visual distractors
Joseph Ortiz, Antoine Dedieu, Wolfgang Lehrach, J Swaroop Guntupalli, Carter Wendelken, Ahmad Humayun, Sivaramakrishnan Swaminathan, Guangyao Zhou, Miguel Lázaro-Gredilla, and Kevin P Murphy. DMC-VB: A benchmark for representation learning for control with visual distractors. Advances in Neural Information Processing Systems, 37:6574–6602, 2024
2024
-
[44]
Model-based reinforcement learning with isolated imaginations
Minting Pan, Xiangming Zhu, Yitao Zheng, Yunbo Wang, and Xiaokang Yang. Model-based reinforcement learning with isolated imaginations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(5):2788–2803, 2024
2024
-
[45]
PromptIR: Prompting for all-in-one blind image restoration
Vaishnav Potlapalli, Syed Waqas Zamir, Salman Khan, and Fahad Shahbaz Khan. PromptIR: Prompting for all-in-one blind image restoration. In Advances in Neural Information Processing Systems (NeurIPS), volume 36, 2023
2023
-
[46]
From sparse to soft mixtures of experts
Joan Puigcerver, Carlos Riquelme, Basil Mustafa, and Neil Houlsby. From sparse to soft mixtures of experts. arXiv preprint arXiv:2308.00951, 2023
-
[47]
The colosseum: A benchmark for evaluating generalization for robotic manipulation
Wilbert Pumacay, Ishika Singh, Jiafei Duan, Ranjay Krishna, Jesse Thomason, and Dieter Fox. The colosseum: A benchmark for evaluating generalization for robotic manipulation. arXiv preprint arXiv:2402.08191, 2024
-
[48]
MoE-DiffIR: Task-customized diffusion priors for universal compressed image restoration
Yulin Ren, Xin Li, Bingchen Li, Xingrui Wang, Mengxi Guo, Shijie Zhao, Li Zhang, and Zhibo Chen. MoE-DiffIR: Task-customized diffusion priors for universal compressed image restoration. In European Conference on Computer Vision (ECCV), volume 15067, pages 116–134, 2024
2024
-
[49]
Scaling vision with sparse mixture of experts
Carlos Riquelme, Joan Puigcerver, Basil Mustafa, Maxim Neumann, Rodolphe Jenatton, André Susano Pinto, Daniel Keysers, and Neil Houlsby. Scaling vision with sparse mixture of experts. In Advances in Neural Information Processing Systems, volume 34, pages 8583–8595, 2021
2021
-
[50]
Jan Robine, Marc Hoftmann, Tobias Uelwer, and Stefan Harmeling. Transformer-based world models are happy with 100k interactions.arXiv preprint arXiv:2303.07109, 2023
-
[51]
U-Net: Convolutional networks for biomedical image segmentation
Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, volume 9351, pages 234–241. Springer, 2015
2015
-
[52]
Outrageously large neural networks: The sparsely-gated mixture-of-experts layer
Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, and Jeff Dean. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. In International Conference on Learning Representations, 2017
2017
-
[53]
DriveX: Omni scene modeling for learning generalizable world knowledge in autonomous driving
Chen Shi, Shaoshuai Shi, Kehua Sheng, Bo Zhang, and Li Jiang. DriveX: Omni scene modeling for learning generalizable world knowledge in autonomous driving. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 28599–28609, 2025
2025
-
[54]
A simple framework for generalization in visual RL under dynamic scene perturbations
Wonil Song, Hyesong Choi, Kwanghoon Sohn, and Dongbo Min. A simple framework for generalization in visual RL under dynamic scene perturbations. volume 37, pages 121790–121826, 2024
2024
-
[55]
Austin Stone, Oscar Ramirez, Kurt Konolige, and Rico Jonschkowski. The distracting control suite: a challenging benchmark for reinforcement learning from pixels. arXiv preprint arXiv:2101.02722, 2021
-
[56]
Ruixiang Sun, Hongyu Zang, Xin Li, and Riashat Islam. Learning latent dynamic robust representations for world models. arXiv preprint arXiv:2405.06263, 2024
-
[57]
ProAgentBench: Evaluating LLM agents for proactive assistance with real-world data
Yuanbo Tang, Huaze Tang, Tingyu Cao, Lam Nguyen, Anping Zhang, Xinwen Cao, Chunkang Liu, Wenbo Ding, and Yang Li. ProAgentBench: Evaluating LLM agents for proactive assistance with real-world data. arXiv preprint arXiv:2602.04482, 2026
-
[58]
Yuval Tassa, Yotam Doron, Alistair Muldal, Tom Erez, Yazhe Li, Diego de Las Casas, David Budden, Abbas Abdolmaleki, Josh Merel, Andrew Lefrancq, et al. DeepMind control suite. arXiv preprint arXiv:1801.00690, 2018
2018
-
[59]
Focus-Then-Reuse: Fast adaptation in visual perturbation environments
Jiahui Wang, Chao Chen, Jiacheng Xu, Zongzhang Zhang, and Yang Yu. Focus-Then-Reuse: Fast adaptation in visual perturbation environments. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025
2025
-
[60]
GridFormer: Residual dense transformer with grid structure for image restoration in adverse weather conditions
Tao Wang, Kaihao Zhang, Ziqian Shao, Wenhan Luo, Bjorn Stenger, Tong Lu, Tae-Kyun Kim, Wei Liu, and Hongdong Li. GridFormer: Residual dense transformer with grid structure for image restoration in adverse weather conditions. International Journal of Computer Vision, 132(10):4541–4563, 2024
2024
-
[61]
DriveDreamer: Towards real-world-driven world models for autonomous driving
Xiaofeng Wang, Zheng Zhu, Guan Huang, Xinze Chen, Jiagang Zhu, and Jiwen Lu. DriveDreamer: Towards real-world-driven world models for autonomous driving. In European Conference on Computer Vision (ECCV), pages 55–72. Springer, 2024
2024
-
[62]
Ziyu Wang, Yanjie Ze, Yifei Sun, Zhecheng Yuan, and Huazhe Xu. Generalizable visual reinforcement learning with segment anything model. arXiv preprint arXiv:2312.17116, 2023
-
[63]
DiffIR: Efficient diffusion model for image restoration
Bin Xia, Yulun Zhang, Shiyin Wang, Yitong Wang, Xinglong Wu, Yapeng Tian, Wenming Yang, and Luc Van Gool. DiffIR: Efficient diffusion model for image restoration. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 13095–13105, 2023
2023
-
[64]
Image de-raining transformer
Jie Xiao, Xueyang Fu, Aiping Liu, Feng Wu, and Zheng-Jun Zha. Image de-raining transformer. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(11):12978–12995, 2022
2022
-
[65]
Guowei Xu, Ruijie Zheng, Yongyuan Liang, Xiyao Wang, Zhecheng Yuan, Tianying Ji, Yu Luo, Xiaoyu Liu, Jiaxin Yuan, Pu Hua, Shuzhen Li, Yanjie Ze, Hal Daume, Furong Huang, and Huazhe Xu. DrM: Mastering visual reinforcement learning through dormant ratio minimization. arXiv preprint arXiv:2310.19668, 2023
-
[66]
Learning interactive real-world simulators
Mengjiao Yang, Yilun Du, Kamyar Ghasemipour, Jonathan Tompson, Dale Schuurmans, and Pieter Abbeel. Learning interactive real-world simulators. arXiv preprint arXiv:2310.06114, 2023
-
[67]
Mastering visual continuous control: Improved data-augmented reinforcement learning
Denis Yarats, Rob Fergus, Alessandro Lazaric, and Lerrel Pinto. Mastering visual continuous control: Improved data-augmented reinforcement learning. arXiv preprint arXiv:2107.09645, 2021
-
[68]
Xinlei Yu, Zhangquan Chen, Yongbo He, Tianyu Fu, Cheng Yang, Chengming Xu, Yue Ma, Xiaobin Hu, Zhe Cao, Jie Xu, et al. The latent space: Foundation, evolution, mechanism, ability, and outlook. arXiv preprint arXiv:2604.02029, 2026
-
[69]
Zhecheng Yuan, Zhengrong Xue, Bo Yuan, Xueqian Wang, Yi Wu, Yang Gao, and Huazhe Xu. Pre-trained image encoder for generalizable visual reinforcement learning. Advances in Neural Information Processing Systems, 35:13022–13037, 2022.
[70]
Zhecheng Yuan, Sizhe Yang, Pu Hua, Can Chang, Kaizhe Hu, and Huazhe Xu. RL-ViGen: A reinforcement learning benchmark for visual generalization. Advances in Neural Information Processing Systems, 36:6720–6747, 2023.
[71]
Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, and Ming-Hsuan Yang. Restormer: Efficient transformer for high-resolution image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5728–5739, 2022.
[72]
Amy Zhang, Rowan McAllister, Roberto Calandra, Yarin Gal, and Sergey Levine. Learning invariant representations for reinforcement learning without reconstruction. In International Conference on Learning Representations, 2021.
[73]
Di Zhang, Bowen Lv, Hai Zhang, Feifan Yang, Junqiao Zhao, Hang Yu, Chang Huang, Hongtu Zhou, Chen Ye, et al. Focus On What Matters: Separated models for visual-based RL generalization. Advances in Neural Information Processing Systems, 37:116960–116986, 2024.
[74]
He Zhang, Vishwanath Sindagi, and Vishal M Patel. Image de-raining using a conditional generative adversarial network. IEEE Transactions on Circuits and Systems for Video Technology, 30(11):3943–3956, 2019.
[75]
Weipu Zhang, Gang Wang, Jian Sun, Yetian Yuan, and Gao Huang. STORM: Efficient stochastic transformer based world models for reinforcement learning. Advances in Neural Information Processing Systems, 36:27147–27166, 2023.
[76]
Xu Zhang, Jiaqi Ma, Guoli Wang, Qian Zhang, Huan Zhang, and Lefei Zhang. Perceive-IR: Learning to perceive degradation better for all-in-one image restoration. IEEE Transactions on Image Processing, 2025.
[77]
Yixian Zhang, Shu'ang Yu, Tonghe Zhang, Mo Guang, Haojia Hui, Kaiwen Long, Yu Wang, Chao Yu, and Wenbo Ding. SAC Flow: Sample-efficient reinforcement learning of flow-based policies via velocity-reparameterized sequential modeling. In International Conference on Learning Representations (ICLR), 2026.
[78]
Ruijie Zheng, Xiyao Wang, Yanchao Sun, Shuang Ma, Jieyu Zhao, Huazhe Xu, Hal Daumé III, and Furong Huang. TACO: Temporal latent action-driven contrastive loss for visual reinforcement learning. Advances in Neural Information Processing Systems, 36:48203–48225, 2023.
[79]
Wenzhao Zheng, Weiliang Chen, Yuanhui Huang, Borui Zhang, Yueqi Duan, and Jiwen Lu. OccWorld: Learning a 3D occupancy world model for autonomous driving. In European Conference on Computer Vision (ECCV), pages 55–72. Springer, 2024.
[80]
Gaoyue Zhou, Haizhou Pan, Yann LeCun, and Lerrel Pinto. DINO-WM: World models on pre-trained visual features enable zero-shot planning. arXiv preprint arXiv:2411.04983, 2024.