StepGuard: Guarding Web Navigation via Single-Step Calibration
Pith reviewed 2026-06-27 01:31 UTC · model grok-4.3
The pith
StepGuard uses dynamic policy switching and confidence-triggered reflection to improve web navigation accuracy and set new benchmarks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By integrating Dynamic Dual-Policy Optimization (DDPO), which dynamically switches between navigation-first and answer-first policies, and Confidence-Guided Adaptive Navigation Reflection (CANR), which triggers reflection based on per-step confidence and uses contrastive rewards for self-correction, StepGuard calibrates single-step inaccuracies and mitigates reward entanglement in web navigation agents.
What carries the argument
Dynamic Dual-Policy Optimization (DDPO) combined with Confidence-Guided Adaptive Navigation Reflection (CANR) to form the StepGuard framework for single-step calibration.
If this is right
- Navigation agents achieve higher accuracy by avoiding entangled rewards through policy mode switching.
- Selective reflection reduces unnecessary computations while encouraging self-correction.
- Contrastive rewards promote better decision making in uncertain steps.
- The approach establishes new state-of-the-art results on web navigation benchmarks.
Where Pith is reading between the lines
- Similar dual-policy and adaptive reflection techniques could apply to other agent tasks involving sequential actions and feedback.
- Reducing hyperparameter tuning needs might make such agents more practical for real-world deployment.
- Future work could test if the confidence estimation generalizes across different vision-language models.
Load-bearing premise
That the dynamic switching and selective reflection mechanisms will resolve reward entanglement and single-step fragility without creating new failure modes.
What would settle it
Running the StepGuard agent on the standard benchmarks and observing no significant improvement in navigation or answer accuracy compared to prior methods.
Figures
read the original abstract
Web navigation requires agents to follow natural language goals, interact with web pages, and produce accurate answers. While recent advances leverage vision-language models and reinforcement learning, existing methods still suffer from single-step fragility due to reward misalignment and error propagation. To tackle the reward entanglement, we design Dynamic Dual-Policy Optimization (DDPO), which dynamically switches between a navigation-first mode for exploration and an answer-first mode for question-answering to mitigate reward conflict. To calibrate the single-step error, we propose Confidence-Guided Adaptive Navigation Reflection (CANR), a mechanism that estimates per-step confidence, triggers reflection only when necessary, and uses contrastive rewards to encourage self-correction to calibrate the single-step inaccuracy. With the above as the main components, we finally develop our StepGuard, a new framework of Guarding Web Navigation via Single-Step Calibration. Experiments demonstrate that our approach significantly improves navigation and answer accuracy, setting new state-of-the-art performance on standard web navigation benchmarks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes StepGuard, a framework for web navigation agents based on vision-language models. It introduces Dynamic Dual-Policy Optimization (DDPO) to dynamically switch between navigation-first and answer-first modes to mitigate reward entanglement, and Confidence-Guided Adaptive Navigation Reflection (CANR) to estimate per-step confidence, selectively trigger reflection, and apply contrastive rewards for self-correction. The central claim is that these components resolve single-step fragility and reward misalignment, leading to significant gains in navigation and answer accuracy with new state-of-the-art results on standard web navigation benchmarks.
Significance. If the performance claims hold with rigorous validation, the work could offer a practical way to stabilize RL training for VLM-based web agents by decoupling conflicting objectives and enabling targeted self-correction, potentially benefiting downstream applications in automated browsing and question answering.
major comments (2)
- [Abstract] Abstract: the claim that the approach 'significantly improves navigation and answer accuracy, setting new state-of-the-art performance on standard web navigation benchmarks' is presented without any reference to specific benchmarks, baseline methods, evaluation metrics, ablation studies, or statistical tests. This is load-bearing for the central empirical claim.
- [Abstract] Abstract (and implied methods): no equations, pseudocode, or threshold definitions are supplied for the DDPO mode-switching logic or the CANR confidence estimation and contrastive reward formulation, preventing assessment of whether these mechanisms avoid the instability or extra tuning issues raised by the skeptic note.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment point by point below, indicating planned revisions where appropriate.
read point-by-point responses
-
Referee: [Abstract] Abstract: the claim that the approach 'significantly improves navigation and answer accuracy, setting new state-of-the-art performance on standard web navigation benchmarks' is presented without any reference to specific benchmarks, baseline methods, evaluation metrics, ablation studies, or statistical tests. This is load-bearing for the central empirical claim.
Authors: We agree that the abstract would be strengthened by greater specificity on the empirical claims. The body of the manuscript reports results on WebArena and Mind2Web using success rate and answer accuracy metrics, with comparisons to baselines including ReAct and WebAgent, plus ablation studies and statistical significance testing. We will revise the abstract to reference the primary benchmarks and metrics. revision: yes
-
Referee: [Abstract] Abstract (and implied methods): no equations, pseudocode, or threshold definitions are supplied for the DDPO mode-switching logic or the CANR confidence estimation and contrastive reward formulation, preventing assessment of whether these mechanisms avoid the instability or extra tuning issues raised by the skeptic note.
Authors: Abstracts are high-level summaries and do not contain equations or pseudocode; these appear in Section 3, with the DDPO switching condition defined via a dynamic threshold and CANR using per-step confidence (via normalized logits) plus the explicit contrastive reward term. This formulation is designed to limit extra hyperparameters. If the methods section presentation requires clarification for full assessment, we can expand the pseudocode or threshold definitions. revision: partial
Circularity Check
No derivation chain or equations present; no circularity detected
full rationale
The provided abstract and description introduce DDPO and CANR as new mechanisms for web navigation but contain no equations, mathematical derivations, fitted parameters presented as predictions, or self-citations invoked as load-bearing uniqueness theorems. Without any claimed derivation chain that reduces to inputs by construction, the paper's central claims rest on empirical description rather than self-referential logic. This is the expected honest non-finding when no formal steps exist to inspect.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Aho and Jeffrey D
Alfred V. Aho and Jeffrey D. Ullman , title =. 1972
1972
-
[2]
Publications Manual , year = "1983", publisher =
1983
-
[3]
Ashok K. Chandra and Dexter C. Kozen and Larry J. Stockmeyer , year = "1981", title =. doi:10.1145/322234.322243
-
[4]
Scalable training of
Andrew, Galen and Gao, Jianfeng , booktitle=. Scalable training of
-
[5]
Dan Gusfield , title =. 1997
1997
-
[6]
Tetreault , title =
Mohammad Sadegh Rasooli and Joel R. Tetreault , title =. Computing Research Repository , volume =. 2015 , url =
2015
-
[7]
A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data , Volume =
Ando, Rie Kubota and Zhang, Tong , Issn =. A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data , Volume =. Journal of Machine Learning Research , Month = dec, Numpages =
-
[8]
Shuai Bai and Keqin Chen and Xuejing Liu and Jialin Wang and Wenbin Ge and Sibo Song and Kai Dang and Peng Wang and Shijie Wang and Jun Tang and Humen Zhong and Yuanzhi Zhu and Ming. Qwen2.5-VL Technical Report , journal =. 2025 , url =. doi:10.48550/ARXIV.2502.13923 , eprinttype =. 2502.13923 , timestamp =
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2502.13923 2025
-
[9]
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen and Jiannan Wu and Wenhai Wang and Weijie Su and Guo Chen and Sen Xing and Muyan Zhong and Qinglong Zhang and Xizhou Zhu and Lewei Lu and Bin Li and Ping Luo and Tong Lu and Yu Qiao and Jifeng Dai , title =. CoRR , volume =. 2023 , url =. doi:10.48550/ARXIV.2312.14238 , eprinttype =. 2312.14238 , timestamp =
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2312.14238 2023
-
[10]
6th International Conference on Learning Representations,
Evan Zheran Liu and Kelvin Guu and Panupong Pasupat and Tianlin Shi and Percy Liang , title =. 6th International Conference on Learning Representations,. 2018 , url =
2018
-
[11]
Landay and Monica Lam , editor =
Nancy Xu and Sam Masling and Michael Du and Giovanni Campagna and Larry Heck and James A. Landay and Monica Lam , editor =. Grounding Open-Domain Instructions to Automate Web Support Tasks , booktitle =. 2021 , url =. doi:10.18653/V1/2021.NAACL-MAIN.80 , timestamp =
-
[12]
Sahisnu Mazumder and Oriana Riva , editor =. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies,. 2021 , url =. doi:10.18653/V1/2021.NAACL-MAIN.222 , timestamp =
-
[13]
Yingshan Chang and Yonatan Bisk , editor =. WebQA:. NeurIPS 2021 Competitions and Demonstrations Track, 6-14 December 2021, Online , series =. 2021 , url =
2021
-
[14]
ScreenQA: Large-Scale Question-Answer Pairs Over Mobile App Screenshots , booktitle =
Yu. ScreenQA: Large-Scale Question-Answer Pairs Over Mobile App Screenshots , booktitle =. 2025 , url =. doi:10.18653/V1/2025.NAACL-LONG.477 , timestamp =
-
[15]
Mind2Web: Towards a Generalist Agent for the Web , booktitle =
Xiang Deng and Yu Gu and Boyuan Zheng and Shijie Chen and Samual Stevens and Boshi Wang and Huan Sun and Yu Su , editor =. Mind2Web: Towards a Generalist Agent for the Web , booktitle =. 2023 , url =
2023
-
[16]
Xu and Hao Zhu and Xuhui Zhou and Robert Lo and Abishek Sridhar and Xianyi Cheng and Tianyue Ou and Yonatan Bisk and Daniel Fried and Uri Alon and Graham Neubig , title =
Shuyan Zhou and Frank F. Xu and Hao Zhu and Xuhui Zhou and Robert Lo and Abishek Sridhar and Xianyi Cheng and Tianyue Ou and Yonatan Bisk and Daniel Fried and Uri Alon and Graham Neubig , title =. The Twelfth International Conference on Learning Representations,. 2024 , url =
2024
-
[17]
WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents , booktitle =
Shunyu Yao and Howard Chen and John Yang and Karthik Narasimhan , editor =. WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents , booktitle =. 2022 , url =
2022
-
[18]
Forty-first International Conference on Machine Learning,
Boyuan Zheng and Boyu Gou and Jihyung Kil and Huan Sun and Yu Su , title =. Forty-first International Conference on Machine Learning,. 2024 , url =
2024
-
[19]
The Twelfth International Conference on Learning Representations,
Longtao Zheng and Rundong Wang and Xinrun Wang and Bo An , title =. The Twelfth International Conference on Learning Representations,. 2024 , url =
2024
-
[20]
DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning , booktitle =
Hao Bai and Yifei Zhou and Jiayi Pan and Mert Cemri and Alane Suhr and Sergey Levine and Aviral Kumar , editor =. DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning , booktitle =. 2024 , url =
2024
-
[21]
Zehui Chen and Kuikun Liu and Qiuchen Wang and Wenwei Zhang and Jiangning Liu and Dahua Lin and Kai Chen and Feng Zhao , editor =. Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models , booktitle =. 2024 , url =. doi:10.18653/V1/2024.FINDINGS-ACL.557 , timestamp =
-
[22]
The Thirteenth International Conference on Learning Representations,
Zehan Qi and Xiao Liu and Iat Long Iong and Hanyu Lai and Xueqiao Sun and Jiadai Sun and Xinyue Yang and Yu Yang and Shuntian Yao and Wei Xu and Jie Tang and Yuxiao Dong , title =. The Thirteenth International Conference on Learning Representations,. 2025 , url =
2025
-
[23]
Hanyu Lai and Xiao Liu and Iat Long Iong and Shuntian Yao and Yuxuan Chen and Pengbo Shen and Hao Yu and Hanchen Zhang and Xiaohan Zhang and Yuxiao Dong and Jie Tang , editor =. AutoWebGLM:. Proceedings of the 30th. 2024 , url =. doi:10.1145/3637528.3671620 , timestamp =
-
[24]
Reiichiro Nakano and Jacob Hilton and Suchir Balaji and Jeff Wu and Long Ouyang and Christina Kim and Christopher Hesse and Shantanu Jain and Vineet Kosaraju and William Saunders and Xu Jiang and Karl Cobbe and Tyna Eloundou and Gretchen Krueger and Kevin Button and Matthew Knight and Benjamin Chess and John Schulman , title =. CoRR , volume =. 2021 , url...
Pith/arXiv arXiv 2021
-
[25]
Inioluwa Deborah Raji and Roel Dobbe , title =. CoRR , volume =. 2024 , url =. doi:10.48550/ARXIV.2401.10899 , eprinttype =. 2401.10899 , timestamp =
-
[26]
Xu and Luyu Gao and Zhiqing Sun and Qian Liu and Jane Dwivedi
Zhengbao Jiang and Frank F. Xu and Luyu Gao and Zhiqing Sun and Qian Liu and Jane Dwivedi. Active Retrieval Augmented Generation , booktitle =. 2023 , url =. doi:10.18653/V1/2023.EMNLP-MAIN.495 , timestamp =
-
[27]
URL https: //doi.org/10.1007/s10458-022-09552-y
Conor F. Hayes and Roxana Radulescu and Eugenio Bargiacchi and Johan K. A practical guide to multi-objective reinforcement learning and planning , journal =. 2022 , url =. doi:10.1007/S10458-022-09552-Y , timestamp =
-
[28]
Reflexion: language agents with verbal reinforcement learning , booktitle =
Noah Shinn and Federico Cassano and Ashwin Gopinath and Karthik Narasimhan and Shunyu Yao , editor =. Reflexion: language agents with verbal reinforcement learning , booktitle =. 2023 , url =
2023
-
[29]
Language Agent Tree Search Unifies Reasoning, Acting, and Planning in Language Models , booktitle =
Andy Zhou and Kai Yan and Michal Shlapentokh. Language Agent Tree Search Unifies Reasoning, Acting, and Planning in Language Models , booktitle =. 2024 , url =
2024
-
[30]
PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change , booktitle =
Karthik Valmeekam and Matthew Marquez and Alberto Olmo Hernandez and Sarath Sreedharan and Subbarao Kambhampati , editor =. PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change , booktitle =. 2023 , url =
2023
-
[31]
doi: 10.18653/v1/2023.emnlp-main.330
Katherine Tian and Eric Mitchell and Allan Zhou and Archit Sharma and Rafael Rafailov and Huaxiu Yao and Chelsea Finn and Christopher D. Manning , editor =. Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback , booktitle =. 2023 , url =. doi:10.18653/V1/2023.EMNLP-MAIN.330 , t...
-
[32]
Le and Ed H
Xuezhi Wang and Jason Wei and Dale Schuurmans and Quoc V. Le and Ed H. Chi and Sharan Narang and Aakanksha Chowdhery and Denny Zhou , title =. The Eleventh International Conference on Learning Representations,. 2023 , url =
2023
-
[33]
The Twelfth International Conference on Learning Representations,
Xiao Liu and Hao Yu and Hanchen Zhang and Yifan Xu and Xuanyu Lei and Hanyu Lai and Yu Gu and Hangliang Ding and Kaiwen Men and Kejuan Yang and Shudan Zhang and Xiang Deng and Aohan Zeng and Zhengxiao Du and Chenhui Zhang and Sheng Shen and Tianjun Zhang and Yu Su and Huan Sun and Minlie Huang and Yuxiao Dong and Jie Tang , title =. The Twelfth Internatio...
2024
-
[34]
VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks
Jing Yu Koh and Robert Lo and Lawrence Jang and Vikram Duvvur and Ming Chong Lim and Po. VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks , booktitle =. 2024 , url =. doi:10.18653/V1/2024.ACL-LONG.50 , timestamp =
-
[35]
CoRR , year =
OpenAI , title =. CoRR , year =
-
[36]
Keen You and Haotian Zhang and Eldon Schoop and Floris Weers and Amanda Swearngin and Jeffrey Nichols and Yinfei Yang and Zhe Gan , editor =. Ferret-UI: Grounded Mobile. Computer Vision -. 2024 , url =. doi:10.1007/978-3-031-73039-9\_14 , timestamp =
-
[37]
The Thirteenth International Conference on Learning Representations,
Zhangheng Li and Keen You and Haotian Zhang and Di Feng and Harsh Agrawal and Xiujun Li and Mohana Prasad Sathya Moorthy and Jeffrey Nichols and Yinfei Yang and Zhe Gan , title =. The Thirteenth International Conference on Learning Representations,. 2025 , url =
2025
-
[38]
Jianqiang Wan and Sibo Song and Wenwen Yu and Yuliang Liu and Wenqing Cheng and Fei Huang and Xiang Bai and Cong Yao and Zhibo Yang , title =. 2024 , url =. doi:10.1109/CVPR52733.2024.01481 , timestamp =
-
[39]
Wenwen Yu and Zhibo Yang and Jianqiang Wan and Sibo Song and Jun Tang and Wenqing Cheng and Yuliang Liu and Xiang Bai , title =. CoRR , volume =. 2025 , url =. doi:10.48550/ARXIV.2502.16161 , eprinttype =. 2502.16161 , timestamp =
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2502.16161 2025
-
[40]
AppAgent: Multimodal Agents as Smartphone Users , booktitle =
Chi Zhang and Zhao Yang and Jiaxuan Liu and Yanda Li and Yucheng Han and Xin Chen and Zebiao Huang and Bin Fu and Gang Yu , editor =. AppAgent: Multimodal Agents as Smartphone Users , booktitle =. 2025 , url =. doi:10.1145/3706598.3713600 , timestamp =
-
[41]
Sumers and Shunyu Yao and Karthik Narasimhan and Thomas L
Theodore R. Sumers and Shunyu Yao and Karthik Narasimhan and Thomas L. Griffiths , title =. Trans. Mach. Learn. Res. , volume =. 2024 , url =
2024
-
[42]
Andrew Zhao and Daniel Huang and Quentin Xu and Matthieu Lin and Yong. ExpeL:. Thirty-Eighth. 2024 , url =. doi:10.1609/AAAI.V38I17.29936 , timestamp =
-
[43]
Webvoyager: Building an end-to-end web agent with large multimodal models
Hongliang He and Wenlin Yao and Kaixin Ma and Wenhao Yu and Yong Dai and Hongming Zhang and Zhenzhong Lan and Dong Yu , editor =. WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models , booktitle =. 2024 , url =. doi:10.18653/V1/2024.ACL-LONG.371 , timestamp =
-
[44]
The Thirteenth International Conference on Learning Representations,
Saaket Agashe and Jiuzhou Han and Shuyu Gan and Jiachen Yang and Ang Li and Xin Eric Wang , title =. The Thirteenth International Conference on Learning Representations,. 2025 , url =
2025
-
[45]
AutoGuide: Automated Generation and Selection of Context-Aware Guidelines for Large Language Model Agents , booktitle =
Yao Fu and Dong. AutoGuide: Automated Generation and Selection of Context-Aware Guidelines for Large Language Model Agents , booktitle =. 2024 , url =
2024
-
[46]
The Twelfth International Conference on Learning Representations,
Zhibin Gou and Zhihong Shao and Yeyun Gong and Yelong Shen and Yujiu Yang and Nan Duan and Weizhu Chen , title =. The Twelfth International Conference on Learning Representations,. 2024 , url =
2024
-
[47]
Self-Refine: Iterative Refinement with Self-Feedback , booktitle =
Aman Madaan and Niket Tandon and Prakhar Gupta and Skyler Hallinan and Luyu Gao and Sarah Wiegreffe and Uri Alon and Nouha Dziri and Shrimai Prabhumoye and Yiming Yang and Shashank Gupta and Bodhisattwa Prasad Majumder and Katherine Hermann and Sean Welleck and Amir Yazdanbakhsh and Peter Clark , editor =. Self-Refine: Iterative Refinement with Self-Feedb...
2023
-
[48]
Agent Lumos: Unified and Modular Training for Open-Source Language Agents , booktitle =
Da Yin and Faeze Brahman and Abhilasha Ravichander and Khyathi Raghavi Chandu and Kai. Agent Lumos: Unified and Modular Training for Open-Source Language Agents , booktitle =. 2024 , url =. doi:10.18653/V1/2024.ACL-LONG.670 , timestamp =
-
[49]
A gent T uning: Enabling Generalized Agent Abilities for LLM s
Aohan Zeng and Mingdao Liu and Rui Lu and Bowen Wang and Xiao Liu and Yuxiao Dong and Jie Tang , editor =. AgentTuning: Enabling Generalized Agent Abilities for LLMs , booktitle =. 2024 , url =. doi:10.18653/V1/2024.FINDINGS-ACL.181 , timestamp =
-
[50]
AutoAct: Automatic Agent Learning from Scratch for
Shuofei Qiao and Ningyu Zhang and Runnan Fang and Yujie Luo and Wangchunshu Zhou and Yuchen Eleanor Jiang and Chengfei Lv and Huajun Chen , editor =. AutoAct: Automatic Agent Learning from Scratch for. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers),. 2024 , url =. doi:10.18653/V1/2024.ACL-LO...
-
[51]
The Thirteenth International Conference on Learning Representations,
Yougang Lyu and Lingyong Yan and Zihan Wang and Dawei Yin and Pengjie Ren and Maarten de Rijke and Zhaochun Ren , title =. The Thirteenth International Conference on Learning Representations,. 2025 , url =
2025
-
[52]
Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo , title =. CoRR , volume =. 2024 , url =. doi:10.48550/ARXIV.2402.03300 , eprinttype =. 2402.03300 , timestamp =
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2402.03300 2024
-
[53]
Hanlin Wang and Chak Tou Leong and Jiashuo Wang and Jian Wang and Wenjie Li , title =. CoRR , volume =. 2025 , url =. doi:10.48550/ARXIV.2505.20732 , eprinttype =. 2505.20732 , timestamp =
-
[54]
WebVLN: Vision-and-Language Navigation on Websites , booktitle =
Qi Chen and Dileepa Pitawela and Chongyang Zhao and Gengze Zhou and Hsiang. WebVLN: Vision-and-Language Navigation on Websites , booktitle =. 2024 , url =. doi:10.1609/AAAI.V38I2.27878 , timestamp =
-
[55]
WebWalker: Benchmarking LLMs in Web Traversal , booktitle =
Jialong Wu and Wenbiao Yin and Yong Jiang and Zhenglin Wang and Zekun Xi and Runnan Fang and Linhai Zhang and Yulan He and Deyu Zhou and Pengjun Xie and Fei Huang , editor =. WebWalker: Benchmarking LLMs in Web Traversal , booktitle =. 2025 , url =
2025
-
[56]
Yicong Hong and Qi Wu and Yuankai Qi and Cristian Rodriguez Opazo and Stephen Gould , title =. 2021 , url =. doi:10.1109/CVPR46437.2021.00169 , timestamp =
-
[57]
Hao Tan and Mohit Bansal , editor =. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing,. 2019 , url =. doi:10.18653/V1/D19-1514 , timestamp =
-
[58]
Multimodal Web Navigation with Instruction-Finetuned Foundation Models , booktitle =
Hiroki Furuta and Kuang. Multimodal Web Navigation with Instruction-Finetuned Foundation Models , booktitle =. 2024 , url =
2024
-
[59]
The Twelfth International Conference on Learning Representations,
Tri Dao , title =. The Twelfth International Conference on Learning Representations,. 2024 , url =
2024
-
[60]
Yuze Zhao and Jintao Huang and Jinghan Hu and Xingjun Wang and Yunlin Mao and Daoze Zhang and Zeyinzi Jiang and Zhikai Wu and Baole Ai and Ang Wang and Wenmeng Zhou and Yingda Chen , editor =. AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25 - March 4, 2025, Philadelphia, PA,. 2025 , url =. doi:10.1609/AAAI...
-
[61]
Generalized Slow Roll for Tensors
Samyam Rajbhandari and Jeff Rasley and Olatunji Ruwase and Yuxiong He , editor =. ZeRO: memory optimizations toward training trillion parameter models , booktitle =. 2020 , url =. doi:10.1109/SC41405.2020.00024 , timestamp =
work page internal anchor Pith review Pith/arXiv arXiv doi:10.1109/sc41405.2020.00024 2020
-
[62]
7th International Conference on Learning Representations,
Ilya Loshchilov and Frank Hutter , title =. 7th International Conference on Learning Representations,. 2019 , url =
2019
-
[63]
Proceedings of the AAAI Conference on Artificial Intelligence , volume=
Navgpt: Explicit reasoning in vision-and-language navigation with large language models , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=. 2024 , url =
2024
-
[64]
Narasimhan and Yuan Cao , title =
Shunyu Yao and Jeffrey Zhao and Dian Yu and Nan Du and Izhak Shafran and Karthik R. Narasimhan and Yuan Cao , title =. The Eleventh International Conference on Learning Representations,. 2023 , url =
2023
-
[65]
Jinhao Duan and Hao Cheng and Shiqi Wang and Alex Zavalny and Chenan Wang and Renjing Xu and Bhavya Kailkhura and Kaidi Xu , editor =. Shifting Attention to Relevance: Towards the Predictive Uncertainty Quantification of Free-Form Large Language Models , booktitle =. 2024 , url =. doi:10.18653/V1/2024.ACL-LONG.276 , timestamp =
-
[66]
Gradient Surgery for Multi-Task Learning , booktitle =
Tianhe Yu and Saurabh Kumar and Abhishek Gupta and Sergey Levine and Karol Hausman and Chelsea Finn , editor =. Gradient Surgery for Multi-Task Learning , booktitle =. 2020 , url =
2020
-
[67]
Multi-Task Learning as Multi-Objective Optimization , booktitle =
Ozan Sener and Vladlen Koltun , editor =. Multi-Task Learning as Multi-Objective Optimization , booktitle =. 2018 , url =
2018
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.