
arxiv: 2605.20223 · v1 · pith:B4O3VRGP (new) · submitted 2026-05-13 · cs.CV

Why Latent Actions Fail, and How to Prevent It

Pith reviewed 2026-05-21 08:29 UTC · model grok-4.3

classification: cs.CV
keywords: latent action models, exogenous state, video representation learning, reconstruction objective, action consistency, endogenous components

The pith

Minimizing reconstruction in latent action models makes them encode future exogenous information instead of actions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Latent action models (LAMs) aim to learn action-like representations without labels by compressing the changes between consecutive video frames. Real videos, however, also contain exogenous changes, such as background clutter, that are unrelated to the agent's actions. By extending a linear LAM framework to model this exogenous state explicitly, the paper shows that minimizing the standard reconstruction objective causes latent actions to pick up information about the future exogenous state. Two remedies mitigate this: learning in a representation space that focuses on endogenous components reduces interference from exogenous noise, and auxiliary objectives such as action supervision encourage latent actions to be consistent across exogenous states. Experiments on both linear and nonlinear LAMs support the analysis.
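To make the leakage incentive concrete, here is a minimal scalar sketch. It is a toy under our own assumptions, not the paper's model: observations are o = s + ξ and o′ = s + a + ξ′, the exogenous state is redrawn with probability p_switch (a knob borrowed from Figure 2's caption), and the forward model is fixed to ô′ = o + z instead of being learned. Under these assumptions a latent that copies the true action pays an irreducible error of about 2·p_switch, while the frame difference z = o′ − o, which carries ξ′, reconstructs perfectly.

```python
import random
import statistics


def sample_transitions(n, p_switch, seed=0):
    """Toy transitions (hypothetical model, not the paper's exact setup).

    o  = s + xi        endogenous state s plus exogenous state xi
    o' = s + a + xi'   the action a moves s; xi' is the next exogenous state
    With probability p_switch, xi' is a fresh N(0, 1) draw independent of xi.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        s, xi, a = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        xi_next = rng.gauss(0, 1) if rng.random() < p_switch else xi
        rows.append({"o": s + xi, "o_next": s + a + xi_next, "a": a})
    return rows


def recon_mse(rows, encode):
    """Reconstruction MSE under a fixed (assumed) forward model o_hat' = o + z."""
    return statistics.fmean((r["o_next"] - (r["o"] + encode(r))) ** 2 for r in rows)


def action_latent(r):
    return r["a"]  # ideal latent action: carries no information about xi'


def frame_diff_latent(r):
    return r["o_next"] - r["o"]  # equals a + (xi' - xi), so it carries xi'


rows = sample_transitions(20_000, p_switch=0.5)
print(recon_mse(rows, action_latent))      # roughly 2 * p_switch = 1.0
print(recon_mse(rows, frame_diff_latent))  # ~0: reconstruction is numerically perfect
```

A real LAM has a bottlenecked latent and a learned forward model, so this only illustrates the direction of the incentive, not the paper's closed-form minimizer.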

Core claim

Extending the linear LAM framework to explicitly model exogenous state shows that minimizing the standard reconstruction objective produces latent actions that encode exogenous information from future observations, while a representation space focused on endogenous components mitigates noise interference and auxiliary objectives such as action supervision provably encourage consistency across exogenous states.

What carries the argument

The extended linear LAM framework with explicit exogenous-state modeling, from which the paper derives how the reconstruction objective drives latent actions to encode future exogenous information.
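The incentive behind this mechanism can be worked out exactly in a toy scalar model. The model is our assumption, not the paper's setup: ξ and the fresh draw are zero-mean with variance σ_ξ², ξ′ equals ξ with probability 1 − p and is otherwise a fresh independent draw, and (s, a) are independent of (ξ, ξ′). Because o = s + ξ, any forward model fed o and a latent z = g(s, ξ, a) sees only a function of (s, ξ, a), so its error is bounded below by the conditional variance of o′:

```latex
\begin{aligned}
o &= s + \xi, \qquad o' = s + a + \xi', \qquad
\xi' = \begin{cases} \xi & \text{w.p. } 1-p, \\ \text{fresh draw} & \text{w.p. } p, \end{cases} \\
\mathbb{E}[\xi' \mid s, \xi, a] &= (1-p)\,\xi, \qquad
\operatorname{Var}(\xi' \mid s, \xi, a) = p(1-p)\,\xi^2 + p\,\sigma_\xi^2, \\
\mathbb{E}\big[(o' - \hat{o}'(o, z))^2\big]
  &\ge \mathbb{E}\big[\operatorname{Var}(\xi' \mid s, \xi, a)\big]
   = p\,(2-p)\,\sigma_\xi^2
   \quad \text{for every } z = g(s, \xi, a) \text{ and every } \hat{o}'.
\end{aligned}
```

By contrast, z = o′ − o with ô′ = o + z attains zero error, so for any p > 0 the reconstruction objective strictly prefers a latent that depends on ξ′. The paper's own derivation concerns the global minimizer of its linear LAM objective and is not reproduced here.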

If this is right

  • Latent actions encode exogenous information from future observations under standard reconstruction training.
  • Auxiliary objectives such as action supervision encourage latent actions to be consistent across different exogenous states.
  • Learning in a representation space that focuses on endogenous components reduces interference from exogenous noise.
  • The same mechanisms hold in experiments on both linear and nonlinear latent action models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The framework suggests designing new objectives that explicitly separate endogenous and exogenous dynamics in broader video self-supervised learning.
  • Controlled synthetic datasets with isolated exogenous factors could directly measure the degree of future information leakage in trained models.
  • Robotics applications might improve action learning by adding explicit noise accounting steps during pretraining on real-world videos.

Load-bearing premise

The extension of the linear LAM framework to explicitly model exogenous state provides an accurate analytical lens whose insights generalize to nonlinear LAMs and real video data.

What would settle it

Train latent action models with standard reconstruction on videos that have controlled future exogenous changes and check whether the resulting latent actions correlate with those future background variations rather than with agent actions.

Figures

Figures reproduced from arXiv: 2605.20223 by Jung Min Lee, Jungwoo Lee, Li Zhao, Taehyun Cho.

Figure 1: Overview of Linear LAM. (Left): Observation transitions contain a state-driven component q_ξ = ϕ_ξ(a) and a ξ-driven exogenous component ε. (Right): Architecture of linear LAM with a pretrained vision encoder f. Using a vision encoder f that is robust to ξ-variation, latent actions z̃ from another exogenous state ξ̃, and an exogenous-robust target y improve latent action learning. … generating H-step trajectorie…
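Figure 1 describes the linear LAM as an inverse dynamics model (IDM) and a forward dynamics model (FDM) on top of a vision encoder f, and the appendix fragment at the end of this page writes the latent as z = C o + D o′ with parameters θ = (A, B, C, D). A minimal pure-Python sketch of that structure follows. Three points are our assumptions, not confirmed by this page: the forward model is ô′ = A o + B z, f is treated as the identity (observations are already feature vectors), and the loss is squared error.

```python
from dataclasses import dataclass
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def add(u: Vector, v: Vector) -> Vector:
    return [ui + vi for ui, vi in zip(u, v)]


@dataclass
class LinearLAM:
    """Linear LAM sketch with parameters theta = (A, B, C, D).

    IDM:  z      = C o + D o'   (form taken from the paper's appendix fragment)
    FDM:  o_hat' = A o + B z    (assumed form of the forward model)
    """
    A: Matrix  # d_o x d_o
    B: Matrix  # d_o x d_z
    C: Matrix  # d_z x d_o
    D: Matrix  # d_z x d_o

    def encode(self, o: Vector, o_next: Vector) -> Vector:
        return add(matvec(self.C, o), matvec(self.D, o_next))

    def decode(self, o: Vector, z: Vector) -> Vector:
        return add(matvec(self.A, o), matvec(self.B, z))

    def recon_loss(self, o: Vector, o_next: Vector) -> float:
        pred = self.decode(o, self.encode(o, o_next))
        return sum((p - t) ** 2 for p, t in zip(pred, o_next))


# Hypothetical parameters: 2-D observation, 1-D latent that tracks only coordinate 0.
lam = LinearLAM(A=[[1.0, 0.0], [0.0, 1.0]], B=[[1.0], [0.0]],
                C=[[-1.0, 0.0]], D=[[1.0, 0.0]])
print(lam.recon_loss([1.0, 2.0], [1.5, 3.0]))  # 1.0: the change in coordinate 1 is missed
```

With a one-dimensional latent, reconstruction can only cover one direction of change; if the second coordinate were an exogenous component with larger variance, the reconstruction-optimal choice would be to spend the latent on it instead.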
Figure 2: Future exogenous state leaks into latent actions. (a) Predicted next observation under Recon. and ξ′-swap. (b) PSNR of three settings. ξ′-swap drops sharply when targeting o′ but matches Aug. Recon. when targeting õ′. (c) Normalized variance Var_ξ′(z | s, ξ, a)/‖z‖² and action NMSE as functions of p_switch. Both metrics rise together as p_switch grows, indicating that z encodes ξ′ at the cost of…
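Figure 2(c) tracks two diagnostics: the normalized within-context variance Var_ξ′(z | s, ξ, a)/‖z‖² (how much z moves when only the future exogenous state changes) and the NMSE of predicting the true action from z. A minimal sketch of both on a hypothetical scalar toy (o = s + ξ, o′ = s + a + ξ′, ξ′ redrawn with probability p_switch, unit-variance Gaussians throughout), using the frame difference z = o′ − o as the stand-in latent. Two details are our assumptions, not the paper's definitions: the normalizer is the batch mean of z², and the action probe is a least-squares scalar linear fit without intercept.

```python
import random
import statistics


def action_latent(s, xi, a, xi_next):
    return a  # ideal latent: ignores the exogenous state entirely


def frame_diff_latent(s, xi, a, xi_next):
    # Stand-in latent z = o' - o with o = s + xi and o' = s + a + xi' (toy model).
    return (s + a + xi_next) - (s + xi)


def next_exogenous(rng, xi, p_switch):
    """xi' equals xi, except with probability p_switch it is redrawn from N(0, 1)."""
    return rng.gauss(0, 1) if rng.random() < p_switch else xi


def normalized_variance(latent, p_switch, n_contexts=500, n_draws=50, seed=0):
    """Mean over contexts of Var_{xi'}(z | s, xi, a), divided by the batch mean of z**2."""
    rng = random.Random(seed)
    within_vars, squares = [], []
    for _ in range(n_contexts):
        s, xi, a = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        # Hold the context (s, xi, a) fixed and vary only the future exogenous state.
        zs = [latent(s, xi, a, next_exogenous(rng, xi, p_switch)) for _ in range(n_draws)]
        within_vars.append(statistics.pvariance(zs))
        squares.extend(z * z for z in zs)
    return statistics.fmean(within_vars) / statistics.fmean(squares)


def probe_nmse(latent, p_switch, n=20_000, seed=1):
    """NMSE of the least-squares scalar probe a_hat = w * z (no intercept)."""
    rng = random.Random(seed)
    zs, acts = [], []
    for _ in range(n):
        s, xi, a = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        zs.append(latent(s, xi, a, next_exogenous(rng, xi, p_switch)))
        acts.append(a)
    w = sum(z * y for z, y in zip(zs, acts)) / sum(z * z for z in zs)
    return sum((y - w * z) ** 2 for z, y in zip(zs, acts)) / sum(y * y for y in acts)


for p in (0.0, 0.25, 0.5):
    print(p, round(normalized_variance(frame_diff_latent, p), 3),
          round(probe_nmse(frame_diff_latent, p), 3))
```

In this toy the population values are p(2 − p)/(1 + 2p) for the normalized variance (before a small downward bias from `pvariance` over finitely many draws) and 2p/(1 + 2p) for the probe NMSE, so both rise over the range shown. The toy's normalized variance flattens and then falls for p above roughly 0.6, so it should not be read as reproducing the figure; `action_latent` gives zero for both metrics.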
Figure 3: Exogenous state sensitivity of vision features impacts action alignments. (Left) Attention maps on Bridge V2 and RT-1 for LAMs trained with DINOv2 (top, UniVLA [3]) versus raw-image observations (bottom, LAPA [2]). DINOv2 attends to manipulation-relevant regions (green boxes), while raw-image LAMs often attend to background or manipulation-irrelevant factors (red boxes). (Right) In the linear LAM, action N…
Figure 4: Effect of L_X-exo and L_ξ-robust. (a) ∆ action-validation NMSE relative to L_LAM (negative values indicate improvement) for three auxiliary training objectives: (left) X-exo, (mid) Action-pred, and (right) q-pred. (b) Normalized variance of latent action within context (lower is better), measuring how consistent the latent actions are. … on DINOv2 focuses on manipulation-relevant parts. By contrast, LAM trained…
Figure 5: Practical LAM. (a) Architecture of practical LAM. ψ_IDM is implemented with CNN and MLP layers with vector quantization (VQ), while ψ_FDM uses a UNet architecture with CNN layers. (b) Result of practical LAM. (Left): Consistency loss; lower is better. (Right): Exogenous region MSE; higher is better. Shaded regions show standard deviation across 3 random seeds. We compare a LAM trained with L_LAM against LAMs …
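The IDM in Figure 5 ends in vector quantization (VQ), the discrete bottleneck introduced in VQ-VAE [12]. Its core step maps the continuous encoder output to the nearest codebook vector, whose index then acts as a discrete latent action. The sketch below shows only that forward lookup with a made-up toy codebook; training additionally needs the straight-through gradient estimator and the codebook/commitment losses from VQ-VAE, which require an autodiff framework and are omitted here.

```python
from typing import List, Sequence, Tuple


def squared_distance(u: Sequence[float], v: Sequence[float]) -> float:
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v))


def quantize(z_e: Sequence[float], codebook: List[List[float]]) -> Tuple[int, List[float]]:
    """Nearest-neighbour VQ lookup: return (code index, code vector) closest to z_e."""
    index = min(range(len(codebook)), key=lambda k: squared_distance(z_e, codebook[k]))
    return index, list(codebook[index])


# Hypothetical 3-entry codebook over a 2-D continuous latent.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(quantize([0.9, 0.2], codebook))  # (1, [1.0, 0.0])
```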
Figure 6: Architecture of practical LAM on DCS. (a) We first collect clean demonstrations from a PPO agent and inject exogenous noise into each observation. Both ψ_IDM and ψ_FDM are implemented with several convolution layers. (b) The latent action policy π(z|o) predicts z from o using convolution layers, and π_Dec(a|z) decodes the prediction of the latent action policy into a ground-truth action. Results. Results are …
Original abstract

Latent action models (LAMs) aim to learn action-like representations from unlabeled videos by compressing frame-to-frame changes. The frames of in-the-wild videos, however, contain not only the agent's own state but also exogenous state such as background clutter. Since the exogenous state introduces changes unrelated to actions, it hinders reliable latent action learning. This paper investigates this problem analytically by extending a linear LAM framework to explicitly model exogenous state. Our analysis reveals two insights: (1) minimizing the standard reconstruction objective produces latent actions that encode exogenous information from future observations; and (2) learning in a representation space that focuses on endogenous components is key to mitigating the interference of noise. We further show that previously proposed auxiliary objectives, such as action supervision, provably encourage latent actions to be consistent across exogenous states. These findings are validated through experiments on both linear and nonlinear LAMs, providing a unified theoretical analysis of how exogenous state hinders latent action learning and why common remedies work.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that latent action models (LAMs) trained with standard reconstruction objectives on unlabeled videos encode exogenous information (e.g., background clutter) from future observations into the latent actions. By extending a linear LAM framework to explicitly model exogenous state, the authors derive that the reconstruction minimizer injects future exogenous components via cross-covariance terms under independent endogenous/exogenous evolution. They further show that representation spaces focused on endogenous components mitigate interference and that auxiliary objectives (e.g., action-supervision) provably encourage consistency across exogenous states. These insights are validated on both linear and nonlinear LAMs, offering a unified explanation for failures and remedies.

Significance. If the central claims hold, the work supplies a useful analytical lens on exogenous-state interference in video-based action representation learning, unifying why reconstruction fails and why certain auxiliary losses succeed. Pairing a closed-form linear derivation with nonlinear empirical validation is a clear strength: the paper offers an explicit analytical explanation rather than a purely empirical one.

major comments (2)
  1. [nonlinear validation / experiments] The central derivation (linear exogenous model) shows that reconstruction injects future exogenous components via cross-covariance under the closed-form solution and independence assumption. However, the nonlinear validation section does not demonstrate that the same mechanism persists once a learned nonlinear mapping replaces the linear algebra structure; the mapping could suppress or amplify the encoding, so the analytical explanation does not necessarily transfer to the nonlinear and real-video regimes that the unified-analysis claim rests on.
  2. [theory / linear analysis] The weakest assumption, namely that the linear exogenous extension is an accurate analytical lens whose insights generalize, is load-bearing for the paper's main contribution. The manuscript should include either a concrete test (e.g., a controlled violation of independence) or an explicit discussion of when the linear insight is expected to break, because the concern that nonlinear interactions could alter the encoding is not yet addressed.
minor comments (2)
  1. [notation / preliminaries] Notation for endogenous versus exogenous states should be introduced once and used consistently; occasional reuse of symbols across sections reduces readability.
  2. [experiments] The experimental section would benefit from a short table summarizing the exact controls (e.g., exogenous noise levels, independence violations) used in the linear and nonlinear validations.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and positive evaluation of the paper's analytical contribution. We address the two major comments point-by-point below, focusing on strengthening the connection between the linear derivation and nonlinear regimes as well as clarifying the scope of the linear assumptions.

  1. Referee: [nonlinear validation / experiments] The central derivation (linear exogenous model) shows that reconstruction injects future exogenous components via cross-covariance under the closed-form solution and independence assumption. However, the nonlinear validation section does not demonstrate that the same mechanism persists once a learned nonlinear mapping replaces the linear algebra structure; the mapping could suppress or amplify the encoding, so the analytical explanation does not necessarily transfer to the nonlinear and real-video regimes that the unified-analysis claim rests on.

    Authors: We agree that the current nonlinear experiments primarily validate the overall empirical behavior rather than isolating the cross-covariance injection mechanism. In the revised manuscript we will add a targeted analysis to the nonlinear section: we will measure and report the correlation between the learned latent actions and future exogenous state components (background changes) across noise levels, mirroring the linear closed-form prediction. This will provide direct empirical support that the encoding mechanism persists under learned nonlinear mappings. revision: yes

  2. Referee: [theory / linear analysis] The weakest assumption, namely that the linear exogenous extension is an accurate analytical lens whose insights generalize, is load-bearing for the paper's main contribution. The manuscript should include either a concrete test (e.g., a controlled violation of independence) or an explicit discussion of when the linear insight is expected to break, because the concern that nonlinear interactions could alter the encoding is not yet addressed.

    Authors: We accept that an explicit discussion of the linear assumption's scope is warranted. We will insert a new subsection in the Discussion that states the conditions under which the linear insights are expected to hold (local linearity of the mappings, approximate independence of endogenous/exogenous processes) and when they may break (strong nonlinear cross-interactions). We will also add a controlled synthetic experiment that deliberately violates the independence assumption and reports the resulting change in latent-action encoding, thereby providing a concrete test of robustness. revision: yes

Circularity Check

0 steps flagged

Analytical derivation from extended linear model is self-contained with no reduction to inputs by construction


The paper extends a prior linear LAM framework by explicitly introducing an exogenous state variable and then derives the form of the reconstruction minimizer in closed form under linear dynamics and independence assumptions. This produces the stated result that latent actions encode future exogenous components via cross-covariance terms. The derivation is a direct algebraic consequence of the model equations rather than a fit to target data or a renaming of an input. No self-citation is load-bearing for the central claim; the auxiliary-objective proofs are likewise algebraic consequences of the same linear setup. Empirical checks on nonlinear models are presented separately and do not retroactively alter the linear analysis. The derivation chain therefore remains independent of the conclusions it reaches.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the modeling choice to extend the linear LAM framework to separate endogenous and exogenous state; this is treated as a domain assumption that enables the subsequent derivations.

axioms (1)
  • domain assumption The linear LAM framework can be extended to explicitly model exogenous state in a way that captures interference in frame-to-frame changes.
    This modeling step is invoked to perform the analytical investigation described in the abstract.

pith-pipeline@v0.9.0 · 5699 in / 1209 out tokens · 42524 ms · 2026-05-21T08:29:12.840292+00:00 · methodology


References

Works this paper leans on (31 extracted references).

  [1] Robert McCarthy, Daniel C. H. Tan, Dominik Schmidt, Fernando Acero, Nathan Herr, Yilun Du, Thomas G. Thuruthel, and Zhibin Li. Towards generalist robot learning from internet video: A survey, 2024. URL https://arxiv.org/abs/2404.19664

  [2] Seonghyeon Ye, Joel Jang, Byeongguk Jeon, Se June Joo, Jianwei Yang, Baolin Peng, Ajay Mandlekar, Reuben Tan, Yu-Wei Chao, Bill Yuchen Lin, Lars Liden, Kimin Lee, Jianfeng Gao, Luke Zettlemoyer, Dieter Fox, and Minjoon Seo. Latent action pretraining from videos. In The Thirteenth International Conference on Learning Representations, 2025. URL https://ope...

  [3] Qingwen Bu, Yanting Yang, Jisong Cai, Shenyuan Gao, Guanghui Ren, Maoqing Yao, Ping Luo, and Hongyang Li. Univla: Learning to act anywhere with task-centric latent actions. arXiv preprint arXiv:2505.06111, 2025

  [4] Xiaoyu Chen, Hangxing Wei, Pushi Zhang, Chuheng Zhang, Kaixin Wang, Yanjiang Guo, Rushuai Yang, Yucen Wang, Xinquan Xiao, Li Zhao, Jianyu Chen, and Jiang Bian. villa-x: Enhancing latent action modeling in vision-language-action models. In The Fourteenth International Conference on Learning Representations, 2026. URL https://openreview.net/forum?id=y5CaJb17Fn

  [5] Jung Min Lee, Dohyeok Lee, Seokhun Ju, Taehyun Cho, Jin Woo Koo, Li Zhao, Sangwoo Hong, and Jungwoo Lee. Mvp-lam: Learning action-centric latent action via cross-viewpoint reconstruction, 2026. URL https://arxiv.org/abs/2602.03668

  [6] Jake Bruce, Michael Dennis, Ashley Edwards, Jack Parker-Holder, Yuge Shi, Edward Hughes, Matthew Lai, Aditi Mavalankar, Richie Steigerwald, Chris Apps, Yusuf Aytar, Sarah Bechtle, Feryal Behbahani, Stephanie Chan, Nicolas Heess, Lucy Gonzalez, Simon Osindero, Sherjil Ozair, Scott Reed, Jingwei Zhang, Konrad Zolna, Jeff Clune, Nando de Freitas, Satinder Si... Genie: Generative interactive environments, 2024.

  [7] Shenyuan Gao, Siyuan Zhou, Yilun Du, Jun Zhang, and Chuang Gan. Adaworld: Learning adaptable world models with latent actions. In International Conference on Machine Learning (ICML), 2025

  [8] Chuheng Zhang, Tim Pearce, Pushi Zhang, Kaixin Wang, Xiaoyu Chen, Wei Shen, Li Zhao, and Jiang Bian. What do latent action models actually learn? In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. URL https://openreview.net/forum?id=DQMjemrVhe

  [9] Dipendra Misra, Akanksha Saran, Tengyang Xie, Alex Lamb, and John Langford. Towards principled representation learning from videos for reinforcement learning. In The Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=3mnWvUZIXt

  [10] Alexander Nikulin, Ilya Zisman, Denis Tarasov, Lyubaykin Nikita, Andrei Polubarov, Igor Kiselev, and Vladislav Kurenkov. Latent action learning requires supervision in the presence of distractors. In Forty-second International Conference on Machine Learning, 2025. URL https://openreview.net/forum?id=2gcEQCT7QW

  [11] Xizhou Bu, Jiexi Lyu, Fulei Sun, Ruichen Yang, Zhiqiang Ma, and Wei Li. Laof: Robust latent action learning with optical flow constraints, 2025. URL https://arxiv.org/abs/2511.16407

  [12] Aaron van den Oord, Oriol Vinyals, and Koray Kavukcuoglu. Neural discrete representation learning. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17, pages 6309–6318, Red Hook, NY, USA, 2017. Curran Associates Inc. ISBN 9781510860964.

  [13] Xiaoyu Chen, Junliang Guo, Tianyu He, Chuheng Zhang, Pushi Zhang, Derek Cathera Yang, Li Zhao, and Jiang Bian. Igor: Image-goal representations are the atomic control units for foundation model in embodied ai. arXiv preprint arXiv:2411.00785, 2024. URL https://arxiv.org/abs/2411.00785

  [14] Yonathan Efroni, Dipendra Misra, Akshay Krishnamurthy, Alekh Agarwal, and John Langford. Provably filtering exogenous distractors using multistep inverse dynamics. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=RQLLzMCefQu

  [15] Alex Lamb, Riashat Islam, Yonathan Efroni, Aniket Didolkar, Dipendra Misra, Dylan Foster, Lekan Molu, Rajan Chari, Akshay Krishnamurthy, and John Langford. Guaranteed discovery of control-endogenous latent states with multi-step inverse models, 2022. URL https://arxiv.org/abs/2207.08229

  [16] Guillaume Alain and Yoshua Bengio. Understanding intermediate layers using linear classifier probes, 2017. URL https://openreview.net/forum?id=ryF7rTqgl

  [17] Jingwen Sun, Wenyao Zhang, Zekun Qi, Shaojie Ren, Zezhi Liu, Hanxin Zhu, Guangzhong Sun, Xin Jin, and Zhibo Chen. Vla-jepa: Enhancing vision-language-action model with latent world model, 2026. URL https://arxiv.org/abs/2602.10098

  [18] Jiange Yang, Yansong Shi, Haoyi Zhu, Mingyu Liu, Kaijing Ma, Yating Wang, Gangshan Wu, Tong He, and Limin Wang. Como: Learning continuous latent motion from internet videos for scalable robot learning, 2025. URL https://openreview.net/forum?id=Cu9NOcqfzN

  [19] Quentin Garrido, Tushar Nagarajan, Basile Terver, Nicolas Ballas, Yann LeCun, and Michael Rabbat. Learning latent action world models in the wild. arXiv preprint arXiv:2601.05230, 2026

  [20] Homer Walke, Kevin Black, Abraham Lee, Moo Jin Kim, Max Du, Chongyi Zheng, Tony Zhao, Philippe Hansen-Estruch, Quan Vuong, Andre He, Vivek Myers, Kuan Fang, Chelsea Finn, and Sergey Levine. Bridgedata v2: A dataset for robot learning at scale. In Conference on Robot Learning (CoRL), 2023

  [21] Guanhua Ji, Harsha Polavaram, Lawrence Yunliang Chen, Sandeep Bajamahal, Zehan Ma, Simeon Adebola, Chenfeng Xu, and Ken Goldberg. Oxe-auge: A large-scale robot augmentation of oxe for scaling cross-embodiment policy learning. arXiv preprint arXiv:2512.13100, 2025

  [22] Open X-Embodiment Collaboration, Abby O’Neill, Abdul Rehman, Abhinav Gupta, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, Ajay Mandlekar, Ajinkya Jain, Albert Tung, Alex Bewley, Alex Herzog, Alex Irpan, Alexander Khazatsky, Anant Rai, Anchit Gupta, Andrew Wang, Andrey Kolobov, Anikait Singh, Animesh Garg, ...

  [23] Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V. Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel HAZIZA, Francisco Massa, Alaaeldin El-Nouby, Mido Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Herve Jegou, Julien Mairal, Patrick L...

  [24] Taekyung Kim, Jeongeun Park, Sangdoo Yun, Dongyoon Han, and Byeongho Heo. Self-supervised visual state representation learning for robotics from dynamic scenes. In 7th Robot Learning Workshop: Towards Robots with Human-Level Abilities, 2025. URL https://openreview.net/forum?id=bEM2WGagcJ

  [25] Mingyu Liu, Jiuhe Shu, Hui Chen, Zeju Li, Canyu Zhao, Jiange Yang, Shenyuan Gao, Hao Chen, and Chunhua Shen. Stamo: Unsupervised learning of generalizable robot motion from compact state representation, 2025. URL https://arxiv.org/abs/2510.05057

  [26] Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, and Ross Girshick. Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16000–16009, June 2022

  [27] Henrique Morimitsu, Xiaobin Zhu, Roberto M. Cesar, Xiangyang Ji, and Xu-Cheng Yin. Dpflow: Adaptive optical flow estimation with a dual-pyramid framework. In 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 17810–17820, 2025. doi: 10.1109/CVPR52734.2025.01659

  [28] Dominik Schmidt and Minqi Jiang. Learning to act without actions. In The Twelfth International Conference on Learning Representations (ICLR), 2024

  [29] Austin Stone, Oscar Ramirez, Kurt Konolige, and Rico Jonschkowski. The distracting control suite – a challenging benchmark for reinforcement learning from pixels. arXiv preprint arXiv:2101.02722, 2021.

  [30] Saran Tunyasuvunakool, Alistair Muldal, Yotam Doron, Siqi Liu, Steven Bohez, Josh Merel, Tom Erez, Timothy Lillicrap, Nicolas Heess, and Yuval Tassa. dm_control: Software and tasks for continuous control. Software Impacts, 6:100022, 2020. ISSN 2665-9638. URL https://www.sciencedirect.com/science/article/pii/S2665963820300099

  [31] Zichen Jeff Cui, Hengkai Pan, Aadhithya Iyer, Siddhant Haldar, and Lerrel Pinto. Dynamo: In-domain dynamics pretraining for visuo-motor control, 2024. URL https://arxiv.org/abs/2409.12192.

A Proof

A.1 Proof of Proposition 4.2

Proof. Suppose θ⋆ = (A⋆, B⋆, C⋆, D⋆) is a global minimizer of L_LAM and that z⋆ = C⋆o + D⋆o′ is ξ′-independent. Let Π denote the ...