{"total":10,"items":[{"citing_arxiv_id":"2605.16520","ref_index":120,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Global Convergence of Sampling-Based Nonconvex Optimization through Diffusion-Style Smoothing","primary_cat":"cs.LG","submitted_at":"2026-05-15T18:14:38+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Recasts sampling-based nonconvex optimization as smoothed gradient descent to obtain non-asymptotic convergence guarantees and introduces the DIDA annealed algorithm that converges to the global optimum.","context_count":1,"top_context_role":"background","top_context_polarity":"unclear","context_text":"n are constants depending only onn. Proof.We begin with the triangle inequality: for anyx,y∈Randn∈N, (x+y) n≤2n−1(xn +yn)(118) Applying this to the central moment ofxf(x): E[|xf(x)−E[xf(x)]|2n]≤22n−1(E[|xf(x)|2n] +|E[xf(x)]|2n)(119) For the first term, we use the fact thatfisL-Lipschitz, which means|f(x)−f(0)|≤L|x|. This implies: |f(x)|≤|f(0)|+L|x|(120) |xf(x)|≤|x|·|f(x)|≤|x|·(|f(0)|+L|x|) =|f(0)||x|+Lx2 (121) Therefore: (xf(x))2n≤22n−1(|f(0)|2n|x|2n +L 2nx4n)(122) E[|xf(x)|2n]≤22n−1(|f(0)|2nE[|x|2n] +L 2nE[x4n])(123) Since x∼N(0,t ), we know thatE[|x|2n] =Cntn and E[x4n] =C2nt2n for some constantsCn,C 2n depending only onn. Thus: E[|xf(x)|2n]≤22n−1(|f(0)|2nCntn +L 2nC2nt2n)(124) =C′ ntn +C′ 2nL2nt2n (125)"},{"citing_arxiv_id":"2605.09595","ref_index":16,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Neuromorphic Reinforcement Learning for Quadruped Locomotion Control on Uneven Terrain","primary_cat":"cs.NE","submitted_at":"2026-05-10T15:16:07+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"However, here the PPO algorithm requires its objective to be maximized. Therefore, we can apply the negative of the PPO objective gradient w.r.t. the network output, shown in Equation 4, directly to the EP output layer, which yields an output layer dynamics: dξt,out,i d[time] =−ξ t,out,i +ρ ′(ξt,out,i) X j wout,ijρ(ξt,j) +b out,iρ′(ξt,out,i)(9) −β· 1 |B| ( 1[0,1+ϵ)(rt,nudging(ξt,out))· (at,i−ξt,out,i) σ2 i ˆAt if ˆAt ≥0 1(1−ϵ,∞)(rt,nudging(ξt,out))· (at,i−ξt,out,i) σ2 i ˆAt if ˆAt <0 Here rt,nudging(ξt,out) is the nudging probability ratio, which is dedicated to the nudge phase per- relaxation-iteration probability ratio calculation: rt,nudging(ξt,out) = πnudging(at|st) πrollout(at|st) (10) = 1 πrollout(at|st) DactionY i \" 1p 2πσ 2 i exp \u0012 −(at,i −ξ t,out,i)2 2σ2 i \u0013# 6 Here πnudging(at|st) is the action probability in the nudge relaxation as a function ofξt,out,i. Daction is the dimensionality of action space. The reason for using the notation rt,nudging(ξt,out) instead of rt(ξt,out) is that the probability ratio keeps changing/oscillating in both nudge phase iterations because ξt,out keeps changing/oscillating, and this nudging probability ratio controls the gradient mask in each relaxation step. However, experiments show that this formulation fails to converge when using Equation 9 as the objective gradient. The per-update KL-divergence graph indicates excessively large update steps. Upon investigation of the cause of the large update steps, we conclude that, for positive-advantage samples, the positive nudge phase will drive the output neuron state toward the farther-away-from-targetdirection without bound. For a detailed discussion about the cause of the large update step using the original PPO objective gradient, see Appendix C. To constrain these large update steps, we propose the two-sided PPO ratio clip objective gradient: ∂LTwoSidedCLIP ∂ξt,out,i = 1 |B| ( 1(1−ϵrev,1+ϵ)(rt,nudging(ξt,out))· (at,i−ξt,out,i) σi ˆAt if ˆAt ≥0 1(1−ϵ,1+ϵrev)(rt,nudgin"},{"citing_arxiv_id":"2605.04607","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Right Model, Right Time: Real-Time Cascaded-Fidelity MPC for Bipedal Walking","primary_cat":"cs.RO","submitted_at":"2026-05-06T07:54:49+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.04185","ref_index":4,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Constraint-Enhanced Reinforcement Learning Based on Dynamic Decoupled Spherical Radial Squashing","primary_cat":"cs.LG","submitted_at":"2026-05-05T18:24:46+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DD-SRad is a new RL constraint technique that adapts per-actuator radii dynamically to achieve zero violations and unconstrained-level task performance on heterogeneous robotic joints.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.19344","ref_index":13,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Quadruped Parkour Learning: Sparsely Gated Mixture of Experts with Visual Input","primary_cat":"cs.RO","submitted_at":"2026-04-21T11:27:13+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Sparsely gated MoE policies double the success rate of a real Unitree Go2 quadruped on large-obstacle parkour versus matched-active-parameter MLP baselines while cutting inference time compared with a scaled-up MLP.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.10351","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Trajectory-based actuator identification via differentiable simulation","primary_cat":"cs.RO","submitted_at":"2026-04-11T21:36:34+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Differentiable simulation enables torque-sensor-free actuator model identification from trajectory data, achieving 1.88x better position tracking than a stand-trained baseline and 46% longer travel in downstream locomotion policies.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2603.02657","ref_index":11,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Watch Your Step: Learning Semantically-Guided Locomotion in Cluttered Environment","primary_cat":"cs.RO","submitted_at":"2026-03-03T06:47:46+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SemLoco is a reinforcement learning system that integrates semantic understanding with foothold planning to let legged robots navigate cluttered environments without stepping on sensitive low-lying objects.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2510.04839","ref_index":4,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"TAG-K: Tail-Averaged Greedy Kaczmarz for Computationally Efficient and Performant Online Inertial Parameter Estimation","primary_cat":"cs.RO","submitted_at":"2025-10-06T14:25:01+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"TAG-K combines greedy randomized Kaczmarz row selection with tail averaging to deliver faster convergence and noise robustness for online inertial parameter estimation in robotics.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2507.22345","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"A Reconfigured Wheel-Legged Robot for Enhanced Steering and Adaptability","primary_cat":"cs.RO","submitted_at":"2025-07-30T03:09:15+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"FLORES is a wheel-legged robot with front-leg hip-yaw DoFs replacing hip-roll, paired with a custom RL controller using adapted HIM and tailored rewards for smooth wheeled-to-legged transitions and efficient gaits.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2507.13662","ref_index":26,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Iteratively Learning Muscle Memory for Legged Robots to Master Adaptive and High Precision Locomotion","primary_cat":"cs.RO","submitted_at":"2025-07-18T05:13:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Integrates iterative learning control with a torque library to enable high-precision adaptive locomotion on bipedal and quadrupedal robots, reducing tracking errors by up to 85% and achieving over 30x faster control rates.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}