pith. sign in

arxiv: 2306.14853 · v5 · pith:DNDQBF62new · submitted 2023-06-26 · 🧮 math.OC · cs.LG

Near-Optimal Nonconvex-Strongly-Convex Bilevel Optimization with Fully First-Order Oracles

classification 🧮 math.OC cs.LG
keywords epsilonmathcaloracletildeachievefirst-ordermethodstochastic
0
0 comments X
read the original abstract

In this work, we consider bilevel optimization when the lower-level problem is strongly convex. Recent works show that with a Hessian-vector product (HVP) oracle, one can provably find an $\epsilon$-stationary point within ${\mathcal{O}}(\epsilon^{-2})$ oracle calls. However, the HVP oracle may be inaccessible or expensive in practice. Kwon et al. (ICML 2023) addressed this issue by proposing a first-order method that can achieve the same goal at a slower rate of $\tilde{\mathcal{O}}(\epsilon^{-3})$. In this paper, we incorporate a two-time-scale update to improve their method to achieve the near-optimal $\tilde {\mathcal{O}}(\epsilon^{-2})$ first-order oracle complexity. Our analysis is highly extensible. In the stochastic setting, our algorithm can achieve the stochastic first-order oracle complexity of $\tilde {\mathcal{O}}(\epsilon^{-4})$ and $\tilde {\mathcal{O}}(\epsilon^{-6})$ when the stochastic noises are only in the upper-level objective and in both level objectives, respectively. When the objectives have higher-order smoothness conditions, our deterministic method can escape saddle points by injecting noise, and can be accelerated to achieve a faster rate of $\tilde {\mathcal{O}}(\epsilon^{-1.75})$ using Nesterov's momentum.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Second-Order Bilevel Optimization with Accelerated Convergence Rates

    math.OC 2026-05 unverdicted novelty 7.0

    Second-order bilevel methods achieve Õ(ε^{-1.5}) iteration complexity for second-order stationary points, faster than first-order approaches, with a lazy variant improving computational efficiency by √d.

  2. Fully First-Order Algorithms for Online Bilevel Optimization

    cs.LG 2026-02 unverdicted novelty 7.0

    A new first-order method for online bilevel optimization achieves regret O(1 + V_T + H_{2,T}) over O(T log T) iterations without Hessian-vector products.