SeqLoRA: Bilevel Orthogonal Adaptation for Continual Multi-Concept Generation

Amir Joudaki; André M. H. Teixeira; Enis Simsar; Javad Parsa; Thomas Hofmann

arxiv: 2605.22743 · v1 · pith:Z4RCY3EPnew · submitted 2026-05-21 · 💻 cs.LG

Pith reviewed 2026-05-22 07:01 UTC · model grok-4.3

classification 💻 cs.LG

keywords: LoRA, continual learning, text-to-image diffusion, bilevel optimization, catastrophic forgetting, multi-concept generation, parameter-efficient fine-tuning

The pith

SeqLoRA jointly optimizes both LoRA factors via bilevel optimization to compose multiple custom concepts in text-to-image models while bounding catastrophic forgetting.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents SeqLoRA as a method for adding new visual concepts to diffusion models one after another without the new ones erasing or mixing with the old ones. It frames the problem as a constrained continual-learning task and solves it by bilevel optimization that tunes the low-rank adaptation matrices together rather than freezing one part. The authors derive convergence guarantees for the optimizer and model residual activations as a matrix sub-Gaussian process to obtain high-probability bounds on how much prior concepts degrade. They also prove that learning the adaptation directions directly from data reduces residual interference energy more than methods that keep the basis fixed. Experiments show the approach scales to 101 concepts while preserving identity and avoiding expensive post-hoc fusion steps.

Core claim

SeqLoRA is a constrained continual learning framework that jointly optimizes both LoRA factors via bilevel optimization, establishes strong convergence guarantees, models residual layer activations as a matrix sub-Gaussian process to derive high-probability bounds on catastrophic forgetting, and proves that learning the LoRA basis from data minimizes residual interference energy more effectively than frozen-basis methods.

What carries the argument

Bilevel optimization that jointly tunes the two LoRA factors under sequential regularization to keep adaptation subspaces from interfering.
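
To make the idea of tuning both factors concrete, here is a minimal NumPy sketch. It is not the paper's algorithm: the bilevel solver is swapped for plain alternating gradient steps on a toy least-squares objective, and "sequential regularization" is stood in for by an assumed penalty `lam * ||B_old^T B||_F^2` that discourages the new factor `B` from overlapping an earlier concept's basis `B_old`. All names (`lora_delta`, `overlap_penalty`, `train_concept`, `lam`) are hypothetical.

```python
import numpy as np

def lora_delta(B, A):
    """Low-rank weight update delta_W = B @ A, with B (d_out x r) and A (r x d_in)."""
    return B @ A

def overlap_penalty(B_new, B_old):
    """Squared Frobenius norm of B_old^T B_new; zero iff the column spaces are orthogonal."""
    return float(np.sum((B_old.T @ B_new) ** 2))

def train_concept(X, Y, B_old, rank=2, lam=50.0, lr=1e-2, steps=500, seed=0):
    """Toy stand-in: fit Y ~ (B A) X by alternating gradient steps on BOTH factors.

    Objective (an assumption for illustration, not the paper's):
        1/(2n) * ||B A X - Y||_F^2 + lam/2 * ||B_old^T B||_F^2
    """
    rng = np.random.default_rng(seed)
    d_out, d_in, n = Y.shape[0], X.shape[0], X.shape[1]
    B = 0.1 * rng.standard_normal((d_out, rank))
    A = 0.1 * rng.standard_normal((rank, d_in))
    for _ in range(steps):
        # Step on B: data-fit gradient plus gradient of the overlap penalty.
        R = B @ A @ X - Y
        B = B - lr * (R @ (A @ X).T / n + lam * B_old @ (B_old.T @ B))
        # Step on A with the freshly updated B (neither factor is frozen).
        R = B @ A @ X - Y
        A = A - lr * (B.T @ R @ X.T / n)
    return B, A
```

Calling `train_concept` once with `lam=0.0` and once with a positive `lam` on the same data, then comparing `overlap_penalty(B, B_old)`, gives a quick feel for how such a penalty trades fit against subspace separation.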

If this is right

Multi-concept image generation becomes feasible up to at least 101 concepts without post-hoc fusion or loss of identity.
Attribute interference between composed concepts is reduced because the learned basis minimizes residual energy overlap.
The bilevel procedure converges reliably, allowing stable sequential addition of concepts over long training sequences.
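
One hedged way to put a number on "residual energy overlap" is the fraction of a new update's squared Frobenius norm that falls inside the column space of an earlier concept's LoRA factor. This proxy is an assumption made for illustration; the paper's own definition of interference energy is not reproduced on this page, and the function names below are hypothetical.

```python
import numpy as np

def orthonormal_basis(B):
    """Orthonormal basis for the column space of B (assumes full column rank)."""
    Q, _ = np.linalg.qr(B)  # reduced QR: Q has the same shape as B
    return Q

def overlap_fraction(delta_new, B_old):
    """Share of ||delta_new||_F^2 lying in span(B_old); a value in [0, 1]."""
    Q = orthonormal_basis(B_old)
    projected = Q @ (Q.T @ delta_new)  # orthogonal projection onto span(B_old)
    return float(np.sum(projected ** 2) / np.sum(delta_new ** 2))
```

A value near 0 means the new update leaves the old concept's subspace almost untouched; a value near 1 means the two updates compete for the same directions.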

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same bilevel structure could be tested on other parameter-efficient methods, such as adapter layers or prefix tuning, to see whether the interference bounds generalize.
If the sub-Gaussian modeling holds across architectures, it supplies a practical way to predict how many new concepts can be added before forgetting exceeds a chosen threshold.
The proof that data-driven bases outperform frozen ones suggests trying the approach on non-image modalities where concept composition also suffers from crosstalk.

Load-bearing premise

Residual layer activations can be accurately modeled as a matrix sub-Gaussian process.
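
For orientation, the standard scalar notion behind this premise can be written as follows. This is the textbook definition of the sub-Gaussian (Orlicz $\psi_2$) norm and the tail bound it implies by Markov's inequality; the paper's matrix-valued process may use a different normalization, so treat it as background rather than the authors' exact assumption.

```latex
\|X\|_{\psi_2} \;=\; \inf\bigl\{\, t > 0 \;:\; \mathbb{E}\exp\!\bigl(X^2/t^2\bigr) \le 2 \,\bigr\},
\qquad
\mathbb{P}\bigl(|X| \ge s\bigr) \;\le\; 2\exp\!\bigl(-s^2/\|X\|_{\psi_2}^2\bigr)
\quad \text{for all } s \ge 0 .
```

A random variable is sub-Gaussian exactly when this norm is finite, which is why heavier-than-Gaussian tails in the residuals would undercut bounds built on it.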

What would settle it

A direct measurement on real diffusion layers showing that the derived high-probability forgetting bounds fail to hold when the activation statistics deviate from the sub-Gaussian assumption.
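
A measurement of this kind could start from something like the sketch below, which estimates the empirical sub-Gaussian (Orlicz ψ2) norm of a sample by bisection on the textbook definition. It is an illustrative diagnostic under stated assumptions, not a procedure taken from the paper: `empirical_psi2_norm` is a hypothetical helper, the activations are assumed to be flattened into a 1-D array, and synthetic Gaussian and Student-t draws stand in for real residuals.

```python
import numpy as np

def empirical_psi2_norm(x, iters=60):
    """Smallest t with mean(exp(x^2 / t^2)) <= 2 on the sample, found by bisection."""
    x = np.abs(np.asarray(x, dtype=float).ravel())
    x_max = x.max()
    if x_max == 0.0:
        return 0.0
    sq = x ** 2
    lo = 0.0
    hi = x_max / np.sqrt(np.log(2.0))  # every term is <= 2 here, so the mean is too
    with np.errstate(over="ignore"):   # overflow to inf simply reads as "mean > 2"
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if np.mean(np.exp(sq / mid ** 2)) <= 2.0:
                hi = mid
            else:
                lo = mid
    return hi

rng = np.random.default_rng(0)
gauss = rng.standard_normal(100_000)
heavy = rng.standard_t(3, size=100_000)
heavy = heavy / heavy.std()            # match the scale before comparing tails

psi2_gauss = empirical_psi2_norm(gauss)  # population value for N(0, 1) is sqrt(8/3)
psi2_heavy = empirical_psi2_norm(heavy)
```

On any finite sample the estimate is finite, so the telling sign of heavy tails is an estimate that sits far above the Gaussian reference and keeps growing as more samples are added.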

Figures

Figures reproduced from arXiv: 2605.22743 by Amir Joudaki, André M. H. Teixeira, Enis Simsar, Javad Parsa, Thomas Hofmann.

**Figure 2.** Evolution of generated images for selected concepts across different training steps for …

**Figure 3.** SeqLoRA’s scalability and stability.

**Figure 7.** Multi-concept regional generation: qualitative comparison. Each subfigure shows, in the top-left cell, the input concept reference images, followed by five generated outputs (one per method) arranged in a 3×2 grid. All methods use the same random seed and the same regional sketch/keypose conditioning, so visual differences reflect the underlying multi-concept fusion mechanism rather than randomness or spat…

**Figure 4.** Supplementary qualitative comparison for concepts 1-12 (ordered by training sequence).

**Figure 5.** Supplementary qualitative comparison for concepts 13-24 (ordered by training sequence).

**Figure 6.** Supplementary qualitative comparison for concepts 25-32 (ordered by training sequence).

**Figure 8.** Evolution of generated images for concepts 1–16 across different training steps for …

**Figure 9.** Evolution of generated images for concepts 17–32 across different training steps for …

Original abstract

Parameter-efficient fine-tuning enables fast personalization of text-to-image diffusion models, but composing multiple custom concepts remains challenging due to representation interference. Existing modular methods either rely on expensive post-hoc fusion or freeze adaptation subspaces, which limit expressiveness and concept fidelity. To address this trade-off, we propose Sequential regularized LoRA (SeqLoRA), a constrained continual learning framework that jointly optimizes both LoRA factors via bilevel optimization. Theoretically, we establish strong convergence guarantees for our algorithm and model the residual layer activations as a matrix sub-Gaussian process to derive high-probability bounds on catastrophic forgetting. We further prove that learning the LoRA basis from data minimizes residual interference energy more effectively than frozen-basis methods. Experiments on multi-concept image generation demonstrate that SeqLoRA improves identity preservation and scalability across up to 101 concepts, while avoiding costly fusion and reducing attribute interference in composed generations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

SeqLoRA combines bilevel optimization on both LoRA factors with sub-Gaussian residual modeling for continual diffusion personalization, but the forgetting bounds rest on an unverified distributional assumption.

The main takeaway is that this paper pushes a bilevel setup to tune both the low-rank update matrices sequentially for adding concepts to diffusion models without heavy fusion or frozen subspaces. It also tries to back the approach with convergence results and probabilistic bounds on forgetting derived from treating residual activations as a matrix sub-Gaussian process, plus a proof that data-driven bases cut interference energy better than fixed ones. Experiments reportedly scale to 101 concepts with gains in identity preservation and less attribute mixing. That combination of joint optimization and the specific modeling choice for the theory is what looks new relative to the modular and frozen-basis baselines mentioned in the abstract.

The practical framing around long user sessions for personalization is a clear strength, and the scale of the multi-concept tests gives some concrete evidence that the method holds up in generation quality.

The soft spot sits in the theory. The high-probability forgetting bounds come directly from the sub-Gaussian process assumption on residuals after LoRA insertion. Diffusion denoising trajectories often show heavier tails than sub-Gaussian conditions would allow, and nothing in the abstract indicates they checked moments or Orlicz norms on real residuals. If that modeling choice does not hold tightly, the bounds become loose and the claim that learned bases minimize interference more effectively loses force. The convergence guarantees are stated but would need the full derivation to judge how they interact with the bilevel constraint.

This work is for researchers focused on efficient continual adaptation in generative models rather than core diffusion architecture. Readers who care about practical composition of user concepts over many sessions would find the method and results useful even if they treat the bounds as preliminary.

The paper shows enough structure and experimental reach to deserve referee time, though the theory section would likely need tightening on the distributional justification. I would send it for review with a request to verify the sub-Gaussian fit on the actual residuals.

Referee Report

1 major / 1 minor

Summary. The manuscript introduces SeqLoRA, a sequential regularized LoRA framework for continual multi-concept generation in text-to-image diffusion models. It jointly optimizes LoRA factors via bilevel optimization, claims strong convergence guarantees, models residual layer activations as a matrix sub-Gaussian process to derive high-probability bounds on catastrophic forgetting, and proves that data-driven LoRA bases minimize residual interference energy more effectively than frozen-basis approaches. Experiments report improved identity preservation and scalability on up to 101 concepts without post-hoc fusion.

Significance. If the convergence guarantees and sub-Gaussian-derived forgetting bounds hold with reasonable constants, SeqLoRA would provide a theoretically grounded method for scalable continual adaptation in diffusion models, addressing a key limitation of existing modular PEFT techniques. The experimental scale to 101 concepts is a positive indicator of practical relevance, though the overall impact depends on whether the probabilistic bounds are tight enough to guide design choices beyond the specific setups tested.

major comments (1)

[Theoretical section] The high-probability bounds on catastrophic forgetting are obtained by modeling residual layer activations as a matrix sub-Gaussian process. The manuscript does not report any empirical verification of the sub-Gaussian tail condition (e.g., bounded Orlicz norm or moment-generating function control) on the actual post-LoRA residuals from the diffusion denoising process. Because heavier tails are common in stochastic diffusion trajectories, this assumption is load-bearing for the claim that learned bases minimize interference energy more effectively than frozen ones; without validation or a sensitivity analysis, the bounds risk being non-informative.

minor comments (1)

[Experiments] The experimental section would benefit from additional details on how the 101 concepts were selected and on the precise definition of the interference and identity-preservation metrics to facilitate reproducibility.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the thoughtful comment on the theoretical section. We address the concern regarding empirical validation of the sub-Gaussian assumption and commit to revisions that strengthen the presentation of our bounds.

Referee: The high-probability bounds on catastrophic forgetting are obtained by modeling residual layer activations as a matrix sub-Gaussian process. The manuscript does not report any empirical verification of the sub-Gaussian tail condition (e.g., bounded Orlicz norm or moment-generating function control) on the actual post-LoRA residuals from the diffusion denoising process. Because heavier tails are common in stochastic diffusion trajectories, this assumption is load-bearing for the claim that learned bases minimize interference energy more effectively than frozen ones; without validation or a sensitivity analysis, the bounds risk being non-informative.

Authors: We appreciate the referee pointing out this gap. The matrix sub-Gaussian process model for residual activations is used to derive the high-probability bounds on forgetting and to establish that data-driven bases reduce interference energy relative to frozen bases. While this modeling choice is standard for obtaining concentration results in high-dimensional stochastic settings, we acknowledge that the current manuscript does not include direct empirical checks on the tail behavior of post-LoRA residuals from the diffusion process. In the revised version we will add an appendix section containing empirical verification, including estimates of the Orlicz norm and moment-generating function behavior for residuals across layers and concepts. We will also include a sensitivity analysis examining how the bounds behave under controlled deviations from sub-Gaussianity. These additions will make the theoretical claims more robust and directly address the concern about practical relevance.

Revision: yes

Circularity Check

0 steps flagged

No circularity; derivations rest on explicit modeling assumptions and independent proofs

The paper's central claims—convergence guarantees for bilevel LoRA optimization, high-probability forgetting bounds obtained by modeling residual activations as a matrix sub-Gaussian process, and the proof that data-driven basis learning minimizes interference energy—do not reduce to their own inputs by construction. The sub-Gaussian modeling is introduced as an assumption to derive bounds rather than being fitted from the same data and then relabeled as a prediction. No self-definitional equations, fitted-input predictions, load-bearing self-citations, or ansatz smuggling via prior work appear in the derivation chain. The results remain self-contained once the modeling assumption is granted, with no quoted reduction showing Eq. X equivalent to a fitted parameter or self-citation by definition.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on the sub-Gaussian process modeling of residuals and the bilevel optimization setup; no explicit free parameters or invented entities are stated in the abstract.

axioms (1)

Domain assumption: Residual layer activations follow a matrix sub-Gaussian process.
Used to derive high-probability bounds on catastrophic forgetting.

pith-pipeline@v0.9.0 · 5696 in / 1226 out tokens · 48338 ms · 2026-05-22T07:01:13.284226+00:00 · methodology


Samuel Karlin , title =. 1968 , address =

work page 1968

[26] [26]

Studden , title =

Samuel Karlin and William J. Studden , title =. 1966 , address =

work page 1966

[27] [27]

F. R. Gantmacher and M. G. Krein , title =. 2002 , series =

work page 2002

[28] [28]

I. J. Schoenberg , title =. Journal d'Analyse Math. 1951 , doi =

work page 1951

[29] [29]

Bartlett, Peter L and Mendelson, Shahar , journal=

work page

[30] [30]

2018 , publisher=

High-Dimensional Probability: An Introduction with Applications in Data Science , author=. 2018 , publisher=

work page 2018

[31] [31]

1996 , publisher=

Weak Convergence and Empirical Processes , author=. 1996 , publisher=

work page 1996

[32] [32]

Advances in Neural Information Processing Systems (NeurIPS) , pages=

Spectrally-normalized margin bounds for neural networks , author=. Advances in Neural Information Processing Systems (NeurIPS) , pages=

work page

[33] [33]

Conference on Learning Theory (COLT) , pages=

Size-independent sample complexity of neural networks , author=. Conference on Learning Theory (COLT) , pages=

work page

[34] [34]

International Conference on Learning Representations (ICLR) , year=

Spectral normalization for generative adversarial networks , author=. International Conference on Learning Representations (ICLR) , year=

work page

[35] [35]

Zhao, Puning and Yu, Fei and Wan, Zhiguo , title =. Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and Fourteenth Symposium on Educational Advances in Artificial Intelligence , articleno =. 2024 , isbn =. doi:10.1609/aaai.v38i19.30181 , abstract =

work page doi:10.1609/aaai.v38i19.30181 2024

[36] [36]

2013 , publisher=

Theory of optimal experiments , author=. 2013 , publisher=

work page 2013

[37] [37]

International Conference on Machine Learning , pages=

Byzantine machine learning made easy by resilient averaging of momentums , author=. International Conference on Machine Learning , pages=. 2022 , organization=

work page 2022

[38] [38]

Problems and Theorems in Analysis I: Series, Integral Calculus, Theory of Functions , year =

P. Problems and Theorems in Analysis I: Series, Integral Calculus, Theory of Functions , year =

work page

[39] [39]

, biburl =

Lynch, Nancy A. , biburl =. Distributed Algorithms , url =

work page

[40] [40]

LeCun, Yann and Cortes, Corinna , biburl =

work page

[41] [41]

2009 , publisher=

Learning multiple layers of features from tiny images , author=. 2009 , publisher=

work page 2009

[42] [42]

International Conference on Machine Learning , pages=

Zeno: Distributed stochastic gradient descent with suspicion-based fault-tolerance , author=. International Conference on Machine Learning , pages=. 2019 , organization=

work page 2019

[43] [43]

and Harchaoui, Zaid , journal=

Pillutla, Krishna and Kakade, Sham M. and Harchaoui, Zaid , journal=. Robust Aggregation for Federated Learning , year=

work page

[44] [44]

Proceedings of the 40th International Conference on Machine Learning , articleno =

Liu, Yuchen and Chen, Chen and Lyu, Lingjuan and Wu, Fangzhao and Wu, Sai and Chen, Gang , title =. Proceedings of the 40th International Conference on Machine Learning , articleno =. 2023 , publisher =

work page 2023

[45] [45]

Proceedings of the 30th International Conference on Machine Learning - Volume 28 , pages =

Kyrillidis, Anastasios and Becker, Stephen and Cevher, Volkan and Koch, Christoph , title =. Proceedings of the 30th International Conference on Machine Learning - Volume 28 , pages =. 2013 , publisher =

work page 2013

[46] [46]

Sur les fonctions convexes et les in

Jensen, Johan Ludwig William Valdemar , journal=. Sur les fonctions convexes et les in. 1906 , publisher=

work page 1906

[47] [47]

2004 , publisher=

Convex Optimization , author=. 2004 , publisher=

work page 2004

[48] [48]

Convergence of descent methods for semi-algebraic and tame problems: Proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods , volume =

Attouch, Hedy and Bolte, Jérôme and Svaiter, Benar , year =. Convergence of descent methods for semi-algebraic and tame problems: Proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods , volume =. Mathematical Programming , doi =

work page

[49] [49]

Proceedings of Machine learning and systems , volume=

Federated optimization in heterogeneous networks , author=. Proceedings of Machine learning and systems , volume=

work page

[50] [50]

Advances in Neural Information Processing Systems , volume=

A little is enough: Circumventing defenses for distributed learning , author=. Advances in Neural Information Processing Systems , volume=

work page

[51] [51]

Proceedings of the AAAI conference on artificial intelligence , volume=

Provably secure federated learning against malicious clients , author=. Proceedings of the AAAI conference on artificial intelligence , volume=

work page

[52] [52]

International conference on machine learning , pages=

Byzantine-robust distributed learning: Towards optimal statistical rates , author=. International conference on machine learning , pages=. 2018 , organization=

work page 2018

[53] [53]

International Conference on Machine Learning , pages=

The hidden vulnerability of distributed learning in byzantium , author=. International Conference on Machine Learning , pages=. 2018 , organization=

work page 2018

[54] [54]

Advances in neural information processing systems , volume=

Machine learning with adversaries: Byzantine tolerant gradient descent , author=. Advances in neural information processing systems , volume=

work page

[55] [55]

International conference on machine learning , pages=

Scaffold: Stochastic controlled averaging for federated learning , author=. International conference on machine learning , pages=. 2020 , organization=

work page 2020

[56] [56]

Advances in Neural Information Processing Systems , volume=

Fast projection onto the capped simplex with applications to sparse regression in bioinformatics , author=. Advances in Neural Information Processing Systems , volume=

work page

[57] [57]

2002 , publisher=

Foundations of Bilevel Programming , author=. 2002 , publisher=

work page 2002

[58] [58]

Mathematical Programming , volume=

Efficiency of the prox-linear algorithm for composite optimization , author=. Mathematical Programming , volume=. 2019 , publisher=

work page 2019

[59] [59]

SIAM Journal on Optimization , volume=

A unified convergence analysis of block successive minimization methods for nonsmooth optimization , author=. SIAM Journal on Optimization , volume=. 2013 , publisher=

work page 2013

[60] [60]

2015 , eprint=

Projection onto the capped simplex , author=. 2015 , eprint=

work page 2015

[61] [61]

International Conference on Machine Learning , year=

The Hidden Vulnerability of Distributed Learning in Byzantium , author=. International Conference on Machine Learning , year=

work page

[62] [62]

Proceedings of the 38th International Conference on Machine Learning , pages =

Learning from History for Byzantine Robust Optimization , author =. Proceedings of the 38th International Conference on Machine Learning , pages =. 2021 , editor =

work page 2021

[63] [63]

International Conference on Learning Representations , year=

Byzantine-Robust Learning on Heterogeneous Datasets via Bucketing , author=. International Conference on Learning Representations , year=

work page

[64] [64]

International Conference on Machine Learning , pages=

Byzantine-robust learning on heterogeneous data via gradient splitting , author=. International Conference on Machine Learning , pages=. 2023 , organization=

work page 2023

[65] [65]

29th USENIX security symposium (USENIX Security 20) , pages=

Local model poisoning attacks to \ Byzantine-Robust \ federated learning , author=. 29th USENIX security symposium (USENIX Security 20) , pages=

work page

[66] [66]

2019 , issue_date =

Yang, Qiang and Liu, Yang and Chen, Tianjian and Tong, Yongxin , title =. 2019 , issue_date =. doi:10.1145/3298981 , journal =

work page doi:10.1145/3298981 2019

[67] [67]

NPJ digital medicine , volume=

The future of digital health with federated learning , author=. NPJ digital medicine , volume=. 2020 , publisher=

work page 2020

[68] [68]

Proceedings 2021 Network and Distributed System Security Symposium , year=

Manipulating the Byzantine: Optimizing Model Poisoning Attacks and Defenses for Federated Learning , author=. Proceedings 2021 Network and Distributed System Security Symposium , year=

work page 2021

[69] [69]

2019 , URL =

Towards Federated Learning at Scale: System Design , author =. 2019 , URL =

work page 2019

[70] [70]

Federated Learning: Challenges, Methods, and Future Directions , year=

Li, Tian and Sahu, Anit Kumar and Talwalkar, Ameet and Smith, Virginia , journal=. Federated Learning: Challenges, Methods, and Future Directions , year=

work page

[71] [71]

Foundations and trends

Advances and open problems in federated learning , author=. Foundations and trends. 2021 , publisher=

work page 2021

[72] [72]

Mathematical Programming , volume=

On the convergence of the proximal algorithm for nonsmooth functions involving analytic features , author=. Mathematical Programming , volume=. 2009 , publisher=

work page 2009

[73] [73]

2009 , publisher=

Variational Analysis , author=. 2009 , publisher=

work page 2009

[74] [74]

2014 , DOI =

Bolte, Jerome and Sabach, Shoham and Teboulle, Marc , URL =. 2014 , DOI =

work page 2014

[75] [75]

2017 , editor =

McMahan, Brendan and Moore, Eider and Ramage, Daniel and Hampson, Seth and Arcas, Blaise Aguera y , booktitle =. 2017 , editor =

work page 2017

[76] [76]

and Ferdowsi, S

Abolghasemi, V. and Ferdowsi, S. and Sanei, S. , Journal =. Fast and incoherent dictionary learning algorithms with application to. 2015 , Number =

work page 2015

[77] [77]

2004 IEEE International Conference on Robotics and Automation (IEEE Cat

YALMIP : a toolbox for modeling and optimization in MATLAB , author=. 2004 IEEE International Conference on Robotics and Automation (IEEE Cat. No.04CH37508) , year=

work page 2004

[78] [78]

Global Optimization with Polynomials and the Problem of Moments , author=. SIAM J. Optim. , year=

work page

[79] [79]

Saad, Yousef , biburl =

work page

[80] [80]

Signal Processing , Year =

A gradient-based alternating minimization approach for optimization of the measurement matrix in compressive sensing , Author =. Signal Processing , Year =

work page