Mentor: Mixture-of-experts network with task-oriented perturbation for visual reinforcement learning

2024 · arXiv 2410.14972

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

SARM2: Multi-Task Stage Aware Reward Modeling for Self Improving Robotic Manipulation

cs.RO · 2026-06-09 · unverdicted · novelty 6.0

SARM2 presents RM, a multi-task stage-aware reward model achieving 80% lower value-estimation MSE, which when used in SPIRAL boosts manipulation task success from ~50% to near-perfect on several benchmarks.

Breaking Lock-In: Preserving Steerability under Low-Data VLA Post-Training

cs.RO · 2026-04-25 · unverdicted · novelty 6.0

DeLock mitigates lock-in in low-data VLA post-training via visual grounding preservation and test-time contrastive prompt guidance, outperforming baselines across eight evaluations while matching data-heavy generalist policies.

DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving

cs.CV · 2025-05-22 · unverdicted · novelty 6.0

DriveMoE applies scene-specialized Vision MoE and skill-specialized Action MoE to a VLA baseline to achieve SOTA closed-loop performance on Bench2Drive.

DyGRO-VLA: Cross-Task Scaling of Vision-Language-Action Models via Dynamic Grouped Residual Optimization

cs.RO · 2026-05-17 · unverdicted · novelty 5.0

DyGRO-VLA is a two-stage optimization framework for cross-task scaling of Vision-Language-Action models via dynamic grouped residual optimization in RL.

GRaD-Nav++: Vision-Language Model Enabled Visual Drone Navigation with Gaussian Radiance Fields and Differentiable Dynamics

cs.RO · 2025-06-16 · unverdicted · novelty 4.0

GRaD-Nav++ combines 3D Gaussian Splatting simulation and differentiable RL to train an onboard VLA policy that achieves 50-83% success on language-guided drone navigation tasks in simulation and real hardware.

citing papers explorer

Showing 5 of 5 citing papers after filters.

SARM2: Multi-Task Stage Aware Reward Modeling for Self Improving Robotic Manipulation

cs.RO · 2026-06-09 · unverdicted · none · ref 25

SARM2 presents RM, a multi-task stage-aware reward model achieving 80% lower value-estimation MSE, which when used in SPIRAL boosts manipulation task success from ~50% to near-perfect on several benchmarks.

Breaking Lock-In: Preserving Steerability under Low-Data VLA Post-Training

cs.RO · 2026-04-25 · unverdicted · none · ref 47

DeLock mitigates lock-in in low-data VLA post-training via visual grounding preservation and test-time contrastive prompt guidance, outperforming baselines across eight evaluations while matching data-heavy generalist policies.

DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving

cs.CV · 2025-05-22 · unverdicted · none · ref 42

DriveMoE applies scene-specialized Vision MoE and skill-specialized Action MoE to a VLA baseline to achieve SOTA closed-loop performance on Bench2Drive.

DyGRO-VLA: Cross-Task Scaling of Vision-Language-Action Models via Dynamic Grouped Residual Optimization

cs.RO · 2026-05-17 · unverdicted · none · ref 99

DyGRO-VLA is a two-stage optimization framework for cross-task scaling of Vision-Language-Action models via dynamic grouped residual optimization in RL.

GRaD-Nav++: Vision-Language Model Enabled Visual Drone Navigation with Gaussian Radiance Fields and Differentiable Dynamics

cs.RO · 2025-06-16 · unverdicted · none · ref 26

GRaD-Nav++ combines 3D Gaussian Splatting simulation and differentiable RL to train an onboard VLA policy that achieves 50-83% success on language-guided drone navigation tasks in simulation and real hardware.
