ASH: Agents that Self-Hone via Embodied Learning
Pith review · 2026-05-15 02:49 UTC · model grok-4.3
Recognition: 2 Lean theorem links
The pith
ASH learns long-horizon policies in complex games by training an inverse dynamics model on its own trajectories to label unlabeled internet videos.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ASH reaches an average of 11.2 of 12 milestones in Pokemon Emerald and 9.9 of 12 in The Legend of Zelda: The Minish Cap by repeatedly training an inverse dynamics model on its own noisy trajectories, using that model to derive supervision signals from unlabeled internet video, and storing unsupervised key moments as long-term memory; the strongest baselines stall at roughly 6 milestones in both environments.
What carries the argument
The self-improvement loop that trains an inverse dynamics model from the agent's own trajectories to label actions in internet video, paired with unsupervised extraction of key moments for long-term memory.
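The loop above can be sketched end to end. The following is a minimal illustrative toy, not the paper's implementation: observations are low-dimensional vectors, each action shifts them along a fixed axis, and a nearest-neighbour lookup stands in for the learned IDM. All names (`NearestNeighborIDM`, `rollout`) are ours, chosen for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout(n, dim=4, n_actions=3):
    """Self-generated trajectories: action a shifts the observation along axis a."""
    effects = np.eye(dim)[:n_actions]
    obs = rng.normal(size=(n, dim))
    actions = rng.integers(0, n_actions, size=n)
    next_obs = obs + effects[actions] + 0.01 * rng.normal(size=(n, dim))
    return obs, next_obs, actions

class NearestNeighborIDM:
    """Toy inverse dynamics model: match a transition to the closest seen transition."""
    def fit(self, obs, next_obs, actions):
        self.deltas = next_obs - obs          # observation changes seen in own rollouts
        self.labels = actions                 # ...and the actions that caused them
        return self

    def predict(self, obs, next_obs):
        delta = next_obs - obs
        dist = np.linalg.norm(delta[:, None] - self.deltas[None], axis=-1)
        return self.labels[dist.argmin(axis=1)]

# 1) Train the IDM on the agent's own (noisy) trajectories.
idm = NearestNeighborIDM().fit(*rollout(500))

# 2) Pseudo-label "internet video": transitions the agent never produced itself,
#    whose true actions are hidden and used here only to score the IDM.
video_obs, video_next, hidden_actions = rollout(200)
pseudo_labels = idm.predict(video_obs, video_next)
accuracy = (pseudo_labels == hidden_actions).mean()
```

In ASH the analogous role is played by a learned network labelling real video frames; the sketch only shows why transitions from an agent's own experience can transfer action labels to transitions it never generated.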
If this is right
- The same self-honing loop can be applied to other long-horizon embodied tasks that lack dense rewards or expert data.
- Agents can bootstrap policies from web-scale unlabeled video once they generate enough of their own trajectories to train a usable IDM.
- Unsupervised key-moment retention enables planning over multi-hour horizons without explicit state tracking.
- Performance gaps versus baselines widen as task length increases because self-generated labels keep the policy advancing.
Where Pith is reading between the lines
- If the IDM generalizes across visual domains, the method could transfer from game video to real-world robot footage without additional annotation.
- Scaling the volume of internet video or the number of self-improvement cycles could further raise the fraction of milestones reached.
- The approach suggests that internet video plus self-generated data forms a sufficient training signal for many sequential decision problems once an initial exploration policy exists.
Load-bearing premise
An inverse dynamics model trained only on the agent's own noisy self-generated trajectories will produce sufficiently accurate action labels when applied to unrelated low-quality internet video clips.
What would settle it
Train the IDM on ASH's own trajectories, run one or more cycles of video-derived supervision, and measure whether policy performance keeps improving across cycles, or whether milestone counts instead stall at levels comparable to the strongest baseline.
Original abstract
Long-horizon embodied tasks remain a fundamental challenge in AI, as current methods rely on hand-engineered rewards or action-labeled demonstrations, neither of which scales. We introduce ASH, an agentic system that learns an embodied policy from unlabeled, noisy internet video, without reward shaping or expert annotation. ASH follows a self-improvement loop; when it gets stuck, ASH learns an Inverse Dynamics Model (IDM) from its own trajectories, and uses its IDM to extract supervision from relevant internet video. ASH uses unsupervised learning to identify key moments from large-scale internet video and retains them as long-term memory -- allowing it to tackle long-horizon problems. We evaluate ASH on two complementary environments demanding multi-hour planning: Pokemon Emerald, a turn-based RPG, and The Legend of Zelda: The Minish Cap, a real-time action-adventure game. In both games, behavioral cloning, retrieval-augmented and zero-shot foundation-model baselines plateau, while ASH sustains progression across our 8-hour evaluation. ASH reaches an average of $11.2/12$ milestones in Pokemon Emerald and $9.9/12$ in Legend of Zelda, while the strongest baseline gets stuck in both environments at an average of $6.5/12$ and $6.0/12$ milestones, respectively. We demonstrate that self-improving agents are a scalable recipe for long-horizon embodied learning.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces ASH, an agentic system for long-horizon embodied learning that follows a self-improvement loop: when stuck, it trains an Inverse Dynamics Model (IDM) on its own trajectories and applies the IDM to extract action labels from unlabeled noisy internet video for supervision, while using unsupervised learning to identify key moments as long-term memory. Evaluated on Pokemon Emerald and Legend of Zelda, ASH achieves average milestone progress of 11.2/12 and 9.9/12 respectively, while baselines plateau at 6.5/12 and 6.0/12.
Significance. If the performance gains can be shown to stem from the IDM-based self-honing mechanism with proper validation, the work would represent a meaningful step toward scalable embodied agents that leverage abundant internet video without hand-engineered rewards or expert annotations, addressing a core limitation in current long-horizon task learning.
major comments (2)
- [Abstract] The central performance claims (11.2/12 milestones in Pokemon Emerald, 9.9/12 in Zelda) are reported without error bars, without ablations isolating the IDM-supervision component, and without details on how noisy video is filtered, making it impossible to determine whether the self-honing loop drives the gains over baselines.
- [Abstract] The method's validity hinges on the IDM, trained only on the agent's initially random or stuck self-trajectories, producing accurate action labels on unrelated noisy internet video despite domain shifts in quality, frame rate, perspective, and style; yet no quantitative IDM accuracy metrics on held-out external clips are provided.
minor comments (1)
- The abstract would benefit from a concise definition of the 12 milestones and how they are evaluated across the 8-hour runs to improve clarity and reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and commit to revisions that strengthen the validation of ASH's self-honing mechanism.
Point-by-point responses
-
Referee: [Abstract] The central performance claims (11.2/12 milestones in Pokemon Emerald, 9.9/12 in Zelda) are reported without error bars, ablation studies isolating the IDM supervision component, or details on filtering noisy video, making it impossible to determine whether the self-honing loop drives the gains over baselines.
Authors: We agree that error bars, targeted ablations, and filtering details are essential to substantiate the claims. In the revised manuscript we will report error bars over multiple independent runs for all milestone-progress metrics. We will add ablation studies that isolate the IDM-based internet-video supervision (comparing full ASH against variants without the IDM loop or without video labels) and will expand the methods section with the precise filtering criteria and preprocessing steps applied to noisy internet clips. These additions will directly demonstrate that the self-honing loop accounts for the observed gains over baselines. Revision: yes.
-
Referee: [Abstract] The method's validity hinges on the IDM, trained only on the agent's initially random or stuck self-trajectories, producing accurate action labels on unrelated noisy internet video despite domain shifts in quality, frame rate, perspective, and style; however, no quantitative IDM accuracy metrics on held-out external clips are provided.
Authors: We acknowledge the importance of quantifying IDM generalization. The revised manuscript will include new quantitative results measuring IDM action-prediction accuracy on held-out external video clips drawn from the same internet sources, explicitly reporting performance under the domain shifts in quality, frame rate, perspective, and visual style. These metrics will be presented alongside the end-to-end results to confirm that the IDM trained on agent trajectories can reliably label noisy video for supervision. Revision: yes.
Circularity Check
No significant circularity in ASH's procedural self-improvement loop
full rationale
The paper presents ASH as an agentic system following a self-improvement loop: it learns an IDM from its own trajectories to extract supervision from internet video, combined with unsupervised key-moment identification. This is described as a procedural algorithm, without mathematical derivations, equations, or fitted parameters that would reduce predictions to inputs by construction. Performance is evaluated empirically via milestone completion in games, not through self-referential claims. No load-bearing self-citations or uniqueness theorems are invoked. The central claim rests on empirical results rather than tautological definitions, so the argument is self-contained.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: internet videos contain extractable supervision for embodied actions when paired with an IDM trained on the agent's own trajectories.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear. The relation between the paper passage and the cited Recognition theorem is unclear. Passage: "ASH follows a self-improvement loop; when it gets stuck, ASH learns an Inverse Dynamics Model (IDM) from its own trajectories, and uses its IDM to extract supervision from relevant internet video."
- IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear. The relation between the paper passage and the cited Recognition theorem is unclear. Passage: "We use HDBSCAN clustering to discover recurring key moments... long-term memory of observations ρ that are the wl most recent key moments."
What do these tags mean?
- matches: the paper's claim is directly supported by a theorem in the formal canon.
- supports: the theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: the paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: the paper appears to rely on the theorem as machinery.
- contradicts: the paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Wei-Chieh Huang, Weizhi Zhang, Yueqing Liang, Yuanchen Bei, Yankai Chen, Tao Feng, Xinyu Pan, Zhen Tan, Yu Wang, Tianxin Wei, Shanglin Wu, Ruiyao Xu, Liangwei Yang, Rui Yang, Wooseong Yang, Chin-Yuan Yeh, Hanrong Zhang, Haozhen Zhang, Siqi Zhu, Henry Peng Zou, Wanjia Zhao, Song Wang, Wujiang Xu, Zixuan Ke, Zheng Hui, Dawei Li, Yaozu Wu, Langzhou He, Chen ...
- [2] Yafei Hu, Quanting Xie, Vidhi Jain, Jonathan Francis, Jay Patrikar, Nikhil Keetha, Seungchan Kim, Yaqi Xie, Tianyi Zhang, Hao-Shu Fang, Shibo Zhao, Shayegan Omidshafiei, Dong-Ki Kim, Ali akbar Agha-mohammadi, Katia Sycara, Matthew Johnson-Roberson, Dhruv Batra, Xiaolong Wang, Sebastian Scherer, Chen Wang, Zsolt Kira, Fei Xia, and Yonatan Bisk. Toward gene...
- [3] Faraz Torabi, Garrett Warnell, and Peter Stone. Behavioral cloning from observation. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18), pages 4950–4957. International Joint Conferences on Artificial Intelligence Organization, July 2018. doi: 10.24963/ijcai.2018/687.
- [4] https://doi.org/10.24963/ijcai.2018/687
- [5]
- [6] Stephane Ross and Drew Bagnell. Efficient reductions for imitation learning. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, volume 9 of Proceedings of Machine Learning Research, pages 661–668, Chia Laguna Resort, Sardinia, Italy, May 2010. PMLR.
- [7]
- [8] Marco Pleines, Daniel Addis, David Rubinstein, Frank Zimmer, Mike Preuss, and Peter Whidden. Pokemon red via reinforcement learning, 2025.
- [9] Danijar Hafner, Wilson Yan, and Timothy Lillicrap. Training agents inside of scalable world models, 2025. arXiv:2509.24527.
- [10] Yichen Zhu, Zhicai Ou, Xiaofeng Mou, and Jian Tang. Retrieval-augmented embodied agents. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 17985–17995, June 2024. doi: 10.1109/CVPR52733.2024.01703.
- [11] Qwen Team. Qwen3.5: Accelerating productivity with native multimodal agents, February 2026. https://qwen.ai/blog?id=qwen3.5
- [12] Zaijing Li, Yuquan Xie, Rui Shao, Gongwei Chen, Dongmei Jiang, and Liqiang Nie. Optimus-2: Multimodal Minecraft agent with goal-observation-action conditioned policy. In 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2025.
- [13] Faraz Torabi, Garrett Warnell, and Peter Stone. Behavioral cloning from observation, 2018. arXiv:1805.01954.
- [14] Ashley Edwards, Himanshu Sahni, Yannick Schroecker, and Charles Isbell. Imitating latent policies from observation. In Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 1755–1763. PMLR, June 2019.
- [15] Dominik Schmidt and Minqi Jiang. Learning to act without actions. In The Twelfth International Conference on Learning Representations (ICLR), 2024.
- [16] Jake Bruce, Michael D Dennis, Ashley Edwards, Jack Parker-Holder, Yuge Shi, Edward Hughes, Matthew Lai, Aditi Mavalankar, Richie Steigerwald, Chris Apps, Yusuf Aytar, Sarah Maria Elisabeth Bechtle, Feryal Behbahani, Stephanie C.Y. Chan, Nicolas Heess, Lucy Gonzalez, Simon Osindero, Sherjil Ozair, Scott Reed, Jingwei Zhang, Konrad Zolna, Jeff Clune, Nando... 2024.
- [17] Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. Retrieval-augmented generation for knowledge-intensive NLP tasks. In Proceedings of the 34th International Conference on Neural Information Processing Systems, 2020.
- [18] Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. ReAct: Synergizing reasoning and acting in language models. arXiv preprint arXiv:2210.03629, 2022.
- [19] Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik R Narasimhan, and Shunyu Yao. Reflexion: Language agents with verbal reinforcement learning. In Thirty-seventh Conference on Neural Information Processing Systems, 2023. https://openreview.net/forum?id=vAElhFcKW6
- [20] Peter Conway Humphreys, Arthur Guez, Olivier Tieleman, Laurent Sifre, Theophane Weber, and Timothy P Lillicrap. Large-scale retrieval for reinforcement learning. In Advances in Neural Information Processing Systems, 2022. https://openreview.net/forum?id=Ya9lATuQ3gg
- [21] Anirudh Goyal, Abram Friesen, Andrea Banino, Theophane Weber, Nan Rosemary Ke, Adrià Puigdomènech Badia, Arthur Guez, Mehdi Mirza, Peter C Humphreys, Ksenia Konyushova, Michal Valko, Simon Osindero, Timothy Lillicrap, Nicolas Heess, and Charles Blundell. Retrieval-augmented reinforcement learning. In Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csab... 2022.
- [22] Danyang Zhang, Lu Chen, Situo Zhang, Hongshen Xu, Zihan Zhao, and Kai Yu. Large language models are semi-parametric reinforcement learning agents. In Thirty-seventh Conference on Neural Information Processing Systems, 2023. https://openreview.net/forum?id=ZcJa1R6j3v
- [23] Eric Zelikman, Yuhuai Wu, Jesse Mu, and Noah Goodman. STaR: Bootstrapping reasoning with reasoning. In Advances in Neural Information Processing Systems, 2022. https://openreview.net/forum?id=_3ELRdg2sgI
- [24] Caglar Gulcehre, Tom Le Paine, Srivatsan Srinivasan, Ksenia Konyushkova, Lotte Weerts, Abhishek Sharma, Aditya Siddhant, Alex Ahern, Miaosen Wang, Chenjie Gu, Wolfgang Macherey, Arnaud Doucet, Orhan Firat, and Nando de Freitas. Reinforced self-training (ReST) for language modeling, 2023. arXiv:2308.08998.
- [25] Thomas Anthony, Zheng Tian, and David Barber. Thinking fast and slow with deep learning and tree search. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS'17), pages 5366–5376. Curran Associates Inc., 2017. ISBN 9781510860964.
- [26] David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, and Demis Hassabis. Mastering chess and shogi by self-play with a general reinforcement learning algorithm, 2017. arXiv:1712.01815.
- [27] Michael Ahn, Debidatta Dwibedi, Chelsea Finn, Montse Gonzalez Arenas, Keerthana Gopalakrishnan, Karol Hausman, Brian Ichter, Alex Irpan, Nikhil Joshi, Ryan Julian, Sean Kirmani, Isabel Leal, Edward Lee, Sergey Levine, Yao Lu, Isabel Leal, Sharath Maddineni, Kanishka Rao, Dorsa Sadigh, Pannag Sanketi, Pierre Sermanet, Quan Vuong, Stefan Welker, Fei Xia, Te... AutoRT: Embodied foundation models for large scale orchestration of robotic agents, 2024.
- [28] Linxi Fan, Guanzhi Wang, Yunfan Jiang, Ajay Mandlekar, Yuncong Yang, Haoyi Zhu, Andrew Tang, De-An Huang, Yuke Zhu, and Anima Anandkumar. MineDojo: Building open-ended embodied agents with internet-scale knowledge. In Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2022. https://openreview.net/forum?id=rc8...
- [29] Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, and Igor Mordatch. Decision transformer: Reinforcement learning via sequence modeling.
- [30]
- [31] Yuhuai Wu, Markus Norman Rabe, DeLesley Hutchins, and Christian Szegedy. Memorizing transformers. In International Conference on Learning Representations, 2022. https://openreview.net/forum?id=TrjbxzRcnf-
- [32] Aydar Bulatov, Yuri Kuratov, and Mikhail Burtsev. Recurrent memory transformer. In Advances in Neural Information Processing Systems, 2022. https://openreview.net/forum?id=Uynr3iPhksa
- [33] Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George Bm Van Den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, Diego De Las Casas, Aurelia Guy, Jacob Menick, Roman Ring, Tom Hennigan, Saffron Huang, Loren Maggiore, Chris Jones, Albin Cassirer, Andy Brock, Michela Paganini, Geoffrey Irving, ... Improving language models by retrieving from trillions of tokens, 2022.
- [34] Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, and Anima Anandkumar. Voyager: An open-ended embodied agent with large language models, 2023. arXiv:2305.16291.
- [35] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing Atari with deep reinforcement learning, 2013. arXiv:1312.5602.
- [36] Oriol Vinyals, Igor Babuschkin, Wojciech M. Czarnecki, Michaël Mathieu, Andrew Joseph Dudzik, Junyoung Chung, David Choi, Richard Powell, Timo Ewalds, Petko Georgiev, Junhyuk Oh, Dan Horgan, Manuel Kroiss, Ivo Danihelka, Aja Huang, L. Sifre, Trevor Cai, John P. Agapiou, Max Jaderberg, Alexander Sasha Vezhnevets, Rémi Leblond, Tobias Pohlen, Valentin D... 2019.
- [37] OpenAI: Christopher Berner, Greg Brockman, Brooke Chan, Vicki Cheung, Przemysław Dębiak, Christy Dennison, David Farhi, Quirin Fischer, Shariq Hashme, Chris Hesse, Rafal Józefowicz, Scott Gray, Catherine Olsson, Jakub Pachocki, Michael Petrov, Henrique P. d. O. Pinto, Jonathan Raiman, Tim Salimans, Jeremy Schlatter, Jonas Schneider, Szymon Sidor, Ily... 2019.
- [38] Seth Karten, Jake Grigsby, Stephanie Milani, Kiran Vodrahalli, Amy Zhang, Fei Fang, Yuke Zhu, and Chi Jin. The PokeAgent challenge: Competitive and long-context learning at scale. In NeurIPS Competition Track, April 2025.
- [39] ClaudePlaysPokemon. https://www.twitch.tv/claudeplayspokemon, 2026. [Accessed 01-05-2026]
- [40] SIMA team, Adrian Bolton, Alexander Lerchner, Alexandra Cordell, Alexandre Moufarek, Andrew Bolt, Andrew Lampinen, Anna Mitenkova, Arne Olav Hallingstad, Bojan Vujatovic, Bonnie Li, Cong Lu, Daan Wierstra, Daniel P. Sawyer, Daniel Slater, David Reichert, Davide Vercelli, Demis Hassabis, Drew A. Hudson, Duncan Williams, Ed Hirst, Fabio Pardo, Felix Hill, F...
- [41] Eric Hambro, Roberta Raileanu, Danielle Rothermel, Vegard Mella, Tim Rocktäschel, Heinrich Küttler, and Naila Murray. Dungeons and data: A large-scale NetHack dataset, 2023. arXiv:2211.00539.
- [42] Ricardo J. G. B. Campello, Davoud Moulavi, and Joerg Sander. Density-based clustering based on hierarchical density estimates. In Advances in Knowledge Discovery and Data Mining, pages 160–172, Berlin, Heidelberg, 2013. Springer Berlin Heidelberg. ISBN 978-3-642-37456-2.
- [43] Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov, and Lucas Beyer. Sigmoid loss for language image pre-training, 2023. arXiv:2303.15343.
- [44] An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, Chujie Zheng, Dayiheng Liu, Fan Zhou, Fei Huang, Feng Hu, Hao Ge, Haoran Wei, Huan Lin, Jialong Tang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jing Zhou, Jingren Zhou, Junyang Lin, Kai Dang, Keqin Bao, Kexin Yang, ... 2025.
- [45] Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, Patrick La... DINOv2: Learning robust visual features without supervision, 2024.
- [46] Bulbapedia. Walkthrough: Pokémon Emerald, June 2025. https://bulbapedia.bulbagarden.net/wiki/Walkthrough:Pok%C3%A9mon_Emerald
- [47] Zelda Dungeon. The Minish Cap walkthrough. https://www.zeldadungeon.net/the-minish-cap-walkthrough/, 2026. [Accessed 30-04-2026]
- [48] Sebastian Raschka, Joshua Patterson, and Corey Nolet. Machine learning in Python: Main developments and technology trends in data science, machine learning, and artificial intelligence. arXiv preprint arXiv:2002.04803, 2020.
- [49] Zhenyi Wang, Enneng Yang, Li Shen, and Heng Huang. A comprehensive survey of forgetting in deep learning beyond continual learning. IEEE Trans. Pattern Anal. Mach. Intell., 47(3):1464–1483, March ...
- [50] ISSN 0162-8828. doi: 10.1109/TPAMI.2024.3498346. https://doi.org/10.1109/TPAMI.2024.3498346
- [51] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization, 2019. arXiv:1711.05101.
Appendix A: ASH Playthrough Examples. Examples of how ASH uses dynamic bootstrapping and long-term memory to overcome roadblocks. Figure 6 (dynamic bootstrapping example): to complete milestone 2, the player must rescue the Professor from a wild Zi...
Cluster validation: are all frames key moments, and do all frames correspond to the same key moment? If the answer to both questions is yes, the cluster is considered a key moment. Based on this analysis, 48% of clusters identified by HDBSCAN [40] correspond to key moments. The effectiveness of HDBSCAN in identifying these key moments helps explain the performance increase observed in Sect...
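The keep-dense-clusters-as-memory idea can be illustrated with a toy. As a hedged sketch only, the code below substitutes a simple threshold-based single-linkage pass for HDBSCAN (it is not HDBSCAN, and all names such as `cluster` and `frames` are ours): recurring frame embeddings form dense clusters that become key-moment candidates, while scattered frames are discarded as noise.

```python
import numpy as np

rng = np.random.default_rng(1)

def cluster(points, eps=0.5, min_size=5):
    """Union-find single linkage: points within eps merge; groups below min_size become noise (-1)."""
    n = len(points)
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(points[i] - points[j]) < eps:
                parent[find(i)] = find(j)
    roots = np.array([find(i) for i in range(n)])
    labels = np.full(n, -1)
    next_id = 0
    for root in np.unique(roots):
        members = roots == root
        if members.sum() >= min_size:       # dense recurring cluster: key-moment candidate
            labels[members] = next_id
            next_id += 1
    return labels

# Frame embeddings: two recurring "moments" (tight clusters) plus scattered noise frames.
frames = np.concatenate([rng.normal(0.0, 0.05, (20, 2)),
                         rng.normal(3.0, 0.05, (20, 2)),
                         rng.uniform(-10, 10, (10, 2))])
labels = cluster(frames)
n_key_moments = labels.max() + 1            # the two recurring moments survive; noise does not
```

HDBSCAN performs the same keep-the-dense-structure selection hierarchically and without a fixed eps, which is why the paper's 48% key-moment rate is a property of the data rather than of a threshold choice here.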