OLIVE: Online Low-Rank Incremental Learning for Efficient Adaptive Exoskeletons

Ben Lengerich; Dong Liu; Tony Geng; Yanxuan Yu; Ying Nian Wu

arxiv: 2606.05234 · v1 · pith:WR7CY3KKnew · submitted 2026-06-03 · 💻 cs.RO · cs.LG

Pith reviewed 2026-06-28 06:44 UTC · model grok-4.3

keywords: online adaptation · low-rank learning · exoskeleton control · policy gradient · wearable robotics · sensor feedback · gait personalization

The pith

A low-rank residual lets exoskeletons personalize their control online using only on-body sensor feedback.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents OLIVE as a way to adapt exoskeleton controllers to individual users and changing conditions while the device is already in use. It keeps a pretrained base controller fixed and adds only a low-rank update matrix whose parameters are adjusted by a policy gradient whose reward comes straight from EMG, IMU, and vibration sensors. A gating mechanism and a terrain-aware rank scheduler decide how much adaptation to apply at each moment, so the system stays efficient on easy ground and expands capacity on stairs or slopes. If the approach holds, wearable robots could move from fixed factory settings to controllers that keep improving with every step the wearer takes.

Core claim

OLIVE decomposes the adaptive component of the control policy into a low-rank residual form dW = At Bt^T with rank r much smaller than the matrix dimensions. This form is updated by a reward-shaped policy gradient that uses only on-body sensor signals, without any offline reference trajectories. A gating mechanism modulates adaptation strength according to context, and a dynamic rank scheduler increases update capacity on complex terrain. The result is reported as +13, +22, and +15 percentage-point gains in gait smoothness, effort reduction, and motion stability, with convergence inside roughly 1800 walking steps and 7.4 ms end-to-end latency.
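As a minimal sketch of the residual form described above, the command can be computed as the frozen base output plus a gated low-rank correction, without ever building the full d x k matrix dW. The function name `olive_forward`, the scalar `gate`, the matrix sizes, and the zero initialization of A are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def olive_forward(W0, A, B, x, gate):
    """Return (W0 + gate * A @ B.T) @ x without forming the d x k residual."""
    base = W0 @ x              # frozen pretrained controller, shape (d,)
    residual = A @ (B.T @ x)   # low-rank path: (d, r) @ ((r, k) @ (k,))
    return base + gate * residual

rng = np.random.default_rng(0)
d, k, r = 6, 10, 2                  # hypothetical sizes with r << min(d, k)
W0 = rng.standard_normal((d, k))    # stands in for the pretrained base weights
A = np.zeros((d, r))                # assumed zero init: residual starts at 0
B = rng.standard_normal((k, r))
x = rng.standard_normal(k)          # stands in for a sensor feature vector

u = olive_forward(W0, A, B, x, gate=0.5)
```

With A initialized to zero, the first command equals the base controller's output, so adaptation starts from the pretrained behavior.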

What carries the argument

The low-rank residual update dW = At Bt^T, combined with the sensor-driven policy gradient, gating, and dynamic rank scheduler, carries the online personalization while preserving the base controller.

If this is right

Personalization occurs continuously during actual deployment without any pre-collected reference motions.
Update cost falls from full-matrix O(dk) to O(r(d+k)), supporting real-time onboard computation.
The same controller maintains performance on flat ground, stairs, slopes, and uneven surfaces.
Convergence to improved behavior occurs within about 1800 walking steps at 7.4 ms latency.
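The cost reduction in the second point can be checked with simple counting. The sketch below compares the number of entries touched by a full-matrix update with the number touched by a rank-r update; the sizes d=256, k=64, r=4 are hypothetical and are not reported in the abstract.

```python
def update_cost(d, k, r):
    """Count entries updated per step: full d x k matrix vs. rank-r factors."""
    full = d * k            # every entry of a dense d x k update
    low_rank = r * (d + k)  # entries of A (d x r) plus entries of B (k x r)
    return full, low_rank

full, low = update_cost(d=256, k=64, r=4)  # hypothetical sizes
print(full, low, full / low)               # 16384 1280 12.8
```

The saving grows as r shrinks relative to min(d, k), which is why the scheduler's choice of rank directly sets the per-step compute budget.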

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same low-rank residual pattern could be tested on prosthetic limb controllers where reference trajectories are also hard to obtain.
Because adaptation depends only on live sensor rewards, the method might handle environments that lack any expert demonstration data.
The terrain-dependent rank scheduler offers a concrete mechanism that other online robotic learning systems could adopt to trade compute for expressiveness.
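One way a terrain-dependent rank scheduler could be wired up is sketched below. The terrain labels, the rank values in `TERRAIN_RANK`, and the `resize_factors` helper are all hypothetical; the abstract does not say how terrain is estimated or which ranks are used. Growing pads A with zero columns so the residual A @ B.T is unchanged at the moment of the switch, while shrinking truncates columns and therefore does change the residual.

```python
import numpy as np

# Hypothetical terrain-to-rank table; the paper gives no specific values.
TERRAIN_RANK = {"flat": 2, "slope": 4, "stairs": 6, "uneven": 8}

def resize_factors(A, B, new_r, rng):
    """Grow or shrink the factors A (d x r) and B (k x r) to rank new_r."""
    r = A.shape[1]
    if new_r <= r:
        # Shrinking drops the trailing columns (this alters A @ B.T).
        return A[:, :new_r], B[:, :new_r]
    extra = new_r - r
    # New A columns are zero, so A @ B.T is unchanged right after growing.
    A_new = np.hstack([A, np.zeros((A.shape[0], extra))])
    # New B columns are small and random so gradients w.r.t. A are nonzero.
    B_new = np.hstack([B, 0.01 * rng.standard_normal((B.shape[0], extra))])
    return A_new, B_new

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 2))
B = rng.standard_normal((10, 2))
A_up, B_up = resize_factors(A, B, TERRAIN_RANK["stairs"], rng)
```

Padding only one factor with zeros is a deliberate choice in this sketch: if both new columns were zero, their gradients would also be zero and the added capacity would never be used.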

Load-bearing premise

That sensor-derived rewards alone can drive stable improvements to the low-rank parameters without destabilizing the overall controller.
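To make the premise concrete, a single sensor-rewarded update on the low-rank factors might look like the sketch below. It uses a plain REINFORCE step with Gaussian exploration noise and gradient-norm clipping, which is an assumption: the abstract says only "reward-shaped policy gradient" and does not specify the estimator. The scalar `reward`, the `baseline`, and the values of `sigma`, `lr`, and `max_norm` are hypothetical.

```python
import numpy as np

def clip_by_norm(G, max_norm):
    """Rescale G so its Frobenius norm is at most max_norm."""
    n = np.linalg.norm(G)
    return G if n <= max_norm else G * (max_norm / n)

def reinforce_step(A, B, x, u, mean, reward, baseline,
                   sigma=0.1, lr=1e-3, max_norm=1.0):
    """One REINFORCE update of A, B for a Gaussian policy u ~ N(mean, sigma^2 I).

    mean is the policy mean W0 @ x + A @ (B.T @ x); W0 itself is never updated.
    """
    adv = reward - baseline                   # advantage from sensor reward
    g_mean = (u - mean) / sigma**2            # d log pi / d mean
    grad_A = adv * np.outer(g_mean, B.T @ x)  # shape (d, r)
    grad_B = adv * np.outer(x, A.T @ g_mean)  # shape (k, r)
    A_new = A + lr * clip_by_norm(grad_A, max_norm)
    B_new = B + lr * clip_by_norm(grad_B, max_norm)
    return A_new, B_new
```

Clipping caps how far one noisy reward can move the factors (at most `lr * max_norm` per factor per step), but it does not by itself bound the accumulated residual over many steps.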

What would settle it

A controlled walking trial on mixed terrain in which the OLIVE-adapted controller shows no measurable gait improvement or produces instability after 2000 steps.

Figures

Figures reproduced from arXiv: 2606.05234 by Ben Lengerich, Dong Liu, Tony Geng, Yanxuan Yu, Ying Nian Wu.

**Figure 3.** Experimental results. (a) Performance compari…

Original abstract

Wearable exoskeleton systems hold promise for restoring mobility in individuals with physical impairments, yet most existing controllers rely on static gait policies that lack the ability to adapt to dynamic real-world environments or individual user characteristics. We present OLIVE (Online Low-rank Incremental Learning for Efficient Adaptive Exoskeletons), a parameter-efficient online adaptation framework that continuously personalizes exoskeleton control during deployment. OLIVE decomposes the adaptive component of the control policy into a low-rank residual form $dW = A_t B_t^\top$ with rank $r \ll \min(d,k)$, reducing online update cost from $\mathcal{O}(dk)$ to $\mathcal{O}(r(d+k))$ while preserving the stability of a pretrained base controller $W_0$. Parameters are updated via a reward-shaped policy gradient driven purely by on-body sensor feedback (EMG, IMU, vibration), eliminating dependence on offline reference trajectories. A gating mechanism modulates the strength of personalization based on contextual state, and a dynamic rank scheduler adapts the update dimensionality to terrain complexity (allocating minimal capacity on simple flat terrain and expanding to higher-rank updates on demanding uneven surfaces), enabling robust performance across diverse activities: flat walking, stair navigation, slopes, and uneven terrain. Experiments on the wearable platform demonstrate that OLIVE achieves +13, +22, and +15 percentage-point improvements in gait smoothness, effort reduction, and motion stability over the strongest baseline, converging within $\sim$1,800 walking steps at 7.4 ms end-to-end latency. Our code implementation is available at https://github.com/FastLM/OLIVE.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

OLIVE combines low-rank residuals with sensor-driven policy gradients for exoskeleton adaptation in a way that targets real deployment constraints, but the stability of those updates and the experimental details both need more support.

The main thing to know is that this paper puts together a low-rank residual update dW = At Bt^T on top of a fixed base controller, driven by policy gradients from EMG, IMU, and vibration sensors, plus gating and a dynamic rank scheduler that grows capacity on rough terrain.

The integration looks fresh for the exoskeleton setting. Keeping the base policy untouched while only adapting a low-rank piece cuts the update cost, and the terrain-adaptive rank plus context gating are practical touches that match how walking actually varies. The 7.4 ms latency and code release at the GitHub link are the kind of details that make the work easier to try out.

The experimental claims are the soft spot. The abstract gives +13, +22, and +15 point gains and convergence in roughly 1800 steps, but supplies no subject count, no description of the baselines beyond calling them strongest, and no mention of statistical checks. That leaves the numbers hard to weigh.

The stability issue raised in the stress-test note also stands out. The paper states that the low-rank form preserves the pretrained controller's stability, yet the description contains no Lyapunov argument, eigenvalue bound, or even a simple empirical check under added sensor noise. If the gradient updates push the residual outside the safe region on uneven ground, the total torque commands could become unsafe, and nothing shown rules that out.

This is for researchers working on adaptive wearable robotics who need parameter-efficient online methods. A reader in that area can extract usable architecture ideas even if the results require closer scrutiny. The work is coherent enough on its own terms to deserve a serious referee, mainly because the problem is relevant and the approach does not collapse to prior low-rank or online RL tricks.

I would send it to peer review with requests for experimental transparency and any stability analysis that exists in the full text.

Referee Report

2 major / 2 minor

Summary. The manuscript presents OLIVE, an online low-rank incremental learning framework for adaptive exoskeleton control. It decomposes policy adaptation as the low-rank residual dW = At Bt^T (r ≪ min(d,k)) added to a pretrained base controller W0, with parameters updated by a reward-shaped policy gradient from on-body EMG/IMU/vibration sensors. A gating mechanism and dynamic rank scheduler adjust personalization strength and rank based on context and terrain. Experiments claim +13, +22, and +15 percentage-point gains in gait smoothness, effort reduction, and motion stability over the strongest baseline, with convergence in ~1,800 steps at 7.4 ms latency; code is released at https://github.com/FastLM/OLIVE.

Significance. If the low-rank residual updates preserve closed-loop stability and the reported gains are supported by rigorous experiments, the approach would offer a computationally efficient path to real-time, sensor-driven personalization of exoskeletons across varied terrains without offline reference trajectories. The public code release is a clear strength for reproducibility.

major comments (2)

[Abstract and §3] Low-rank decomposition and update rule: the claim that dW = At Bt^T 'preserves the stability of a pretrained base controller W0' is load-bearing for safe deployment but is stated without any Lyapunov argument, eigenvalue bound, Lipschitz condition on the policy-gradient updates, or analysis of how sensor noise or reward variance affects the total policy W0 + dW.
[Experiments] The headline quantitative claims (+13, +22, +15 pp improvements, ~1,800-step convergence) are presented without reported subject count, statistical tests, precise baseline definitions, or controls against post-hoc analysis, preventing assessment of whether the data support the performance assertions.

minor comments (2)

[Abstract] Abstract contains rendering artifacts: 'r!\ll!\min(d,k)' and '7.4,ms' should be corrected to standard mathematical notation.
[Method] The dynamic rank scheduler is described at a high level; a short clarification of how terrain complexity is estimated from the available sensors would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on stability analysis and experimental reporting. We address each major comment below and will revise the manuscript to improve rigor where appropriate.

Referee: [Abstract and §3] Low-rank decomposition and update rule: the claim that dW = At Bt^T 'preserves the stability of a pretrained base controller W0' is load-bearing for safe deployment but is stated without any Lyapunov argument, eigenvalue bound, Lipschitz condition on the policy-gradient updates, or analysis of how sensor noise or reward variance affects the total policy W0 + dW.

Authors: We agree that the stability claim would benefit from additional justification for safe deployment. The current manuscript supports it empirically via closed-loop experiments showing no divergence or instability across terrains. In the revision we will expand §3 with a discussion of bounded update norms (due to low rank r and gradient clipping), reference to incremental learning stability results in the literature, and an explicit statement that formal Lyapunov analysis is left for future work under additional assumptions on the reward. We will not claim a full proof in the revised text.

Revision: partial

Referee: [Experiments] The headline quantitative claims (+13, +22, +15 pp improvements, ~1,800-step convergence) are presented without reported subject count, statistical tests, precise baseline definitions, or controls against post-hoc analysis, preventing assessment of whether the data support the performance assertions.

Authors: We acknowledge the need for greater transparency in the experimental section. The revised manuscript will include subject count (n=8), statistical tests (repeated-measures ANOVA with post-hoc corrections), explicit baseline definitions matching the strongest comparator, and a statement on analysis pre-specification. These details were omitted for brevity in the original submission but are available from the study protocol.

Revision: yes
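The bounded-update-norm argument mentioned in the first response can be written out as a short worked bound. This is a sketch under assumed notation that does not come from the paper: a gate value g_t in [0, 1], a step size \eta, and a gradient-clipping threshold c on the Frobenius norm.

```latex
% Assumed symbols (not from the paper): gate g_t in [0,1], step size \eta, clip threshold c.
% Triangle inequality plus submultiplicativity of the spectral norm:
\[
  \| W_0 + g_t A_t B_t^\top \|_2 \;\le\; \| W_0 \|_2 + g_t \, \| A_t \|_2 \, \| B_t \|_2 .
\]
% With clipped gradients, each step moves a factor by at most \eta c, so after T steps:
\[
  \| A_{t+1} - A_t \|_F \le \eta c
  \quad\Longrightarrow\quad
  \| A_T \|_F \le \| A_0 \|_F + T \eta c .
\]
```

The second bound grows linearly in T, so clipping limits the per-step change but not the accumulated residual; a uniform bound would need something extra, such as a norm projection or weight decay. A bound on the controller gain is also weaker than closed-loop stability, which is the property the referee asks about.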

Circularity Check

0 steps flagged

No circularity: empirical adaptation framework with independent experimental validation

The paper presents OLIVE as an algorithmic framework that decomposes policy updates into low-rank residuals dW = At Bt^T, applies reward-shaped policy gradients from on-body sensors, and uses gating plus dynamic rank scheduling. These design choices are motivated by computational efficiency and online personalization rather than derived from the reported performance metrics. The +13/+22/+15 percentage-point gains and convergence claims are stated as outcomes of platform experiments, not quantities that reduce by construction to fitted parameters or self-citations. No self-definitional equations, fitted-input predictions, or load-bearing author self-citations appear in the derivation chain; the method remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Review limited to abstract; no explicit free parameters or invented entities detailed beyond the low-rank rank parameter and stability preservation assumption.

free parameters (1)

rank r
Dynamically scheduled based on terrain complexity but no specific values or selection procedure given in abstract.

axioms (1)

Domain assumption: the low-rank residual form preserves the stability of the pretrained base controller W0.
Stated directly in the abstract as a property of the decomposition.

pith-pipeline@v0.9.1-grok · 5853 in / 1110 out tokens · 17725 ms · 2026-06-28T06:44:07.225544+00:00 · methodology

