{"total":11,"items":[{"citing_arxiv_id":"2605.22644","ref_index":21,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Why SGD is not Brownian Motion: A New Perspective on Stochastic Dynamics","primary_cat":"cs.LG","submitted_at":"2026-05-21T15:50:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SGD is reformulated via a master equation from discrete updates, producing a discrete Fokker-Planck equation that predicts non-stationary variance growth proportional to learning rate in flat Hessian directions.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Langevin-inspired surrogate dynamics and show that the two descriptions differ already at orderη2. A.1 Exact recursion from the master equation Our starting point is the master equation, which relates the parameter distribution at two successive iterations (Eq. 3): pn+1(w) = Z pn(v)E L [δ(w−v+η∇L(v))]dv.(20) We employ the representation of the delta function via its Fourier transform: δ(a) = Z dp (2π)d eip⊤a, p∈R d.(21) This yields: pn+1(w) = Z dv dp 1 (2π)d pn(v)eip⊤(w−v) EL h eiηp⊤∇L(v) i .(22) We now expand the exponential inη: pn+1(w) = ∞X m=0 (iη)m m! dX i1,...,im=1 Z dv dp 1 (2π)d pn(v)eip⊤(w−v)pi1 · · ·p imEL [∇i1L(v)· · · ∇ imL(v)].(23) Using: Z dp (2π)d eip⊤(w−v)pi1 · · ·p im = (−i)m∇i1 · · · ∇imδ(w−v),(24) we obtain the recursion: pn+1(w) = ∞X m=0 ηm m!"},{"citing_arxiv_id":"2605.19510","ref_index":54,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Return of Frustratingly Easy Unsupervised Video Domain Adaptation","primary_cat":"cs.CV","submitted_at":"2026-05-19T08:07:56+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"MetaTrans improves unsupervised video domain adaptation performance by separating and subtracting spatial and temporal divergences via a dedicated module and a minimal two-term loss objective.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.19392","ref_index":245,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Understanding Dynamics of Adam in Zero-Sum Games: An ODE Approach","primary_cat":"cs.LG","submitted_at":"2026-05-19T05:38:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Derives ODE limits of Adam-DA showing that first- and second-order momentum parameters reverse their convergence roles in zero-sum games compared to minimization, validated on GAN experiments.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.10792","ref_index":151,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Implicit Neural Optimal Transport via Fixed-Point Optimization","primary_cat":"math.OC","submitted_at":"2026-05-11T16:22:06+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.01765","ref_index":21,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Distributional Causal Mediation via Conditional Generative Modeling","primary_cat":"stat.ML","submitted_at":"2026-05-03T07:57:41+00:00","verdict":"UNVERDICTED","verdict_confidence":"UNKNOWN","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DCMA uses conditional generative models to recover and simulate interventional outcome distributions for distributional causal mediation effects, with derived error bounds.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.20711","ref_index":4,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Participatory provenance as representational auditing for AI-mediated public consultation","primary_cat":"cs.AI","submitted_at":"2026-04-22T15:54:16+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Participatory provenance auditing of Canada's AI strategy consultation shows official AI summaries exclude 15-17% of participants more than random baselines, with 33-88% exclusion for dissent clusters.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.20430","ref_index":78,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"A discrete-time overdetermined problem for the heat equation","primary_cat":"math.AP","submitted_at":"2026-04-22T10:51:46+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A discrete-time constant flux condition on the heat equation forces the domain to be a ball under suitable regularity.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.20115","ref_index":4,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"On the Stability and Generalization of First-order Bilevel Minimax Optimization","primary_cat":"cs.LG","submitted_at":"2026-04-22T02:27:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Provides the first systematic generalization analysis via algorithmic stability for single-timescale and two-timescale stochastic gradient descent-ascent in bilevel minimax problems.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"outer-level population risk w.r.tD1 and empirical risk w.r.tDm1. Denotex∈ Xas the upper variable,y∈ Yandz∈ Zare the inner variables for the minimax problem. Now we introduce the upper-level pop- ulation risk w.r.tD1 and the empirical risk w.r.txDm1 defined respectively as R(x, y, z) =E ξ∼D1[f(x, y(x), z(x);ξ)],(3) and RDm1 (x, y, z) = 1 m1 m1X i=1 [f(x, y(x), z(x);ξ i)],(4) wherefis the objective function at the upper level.y(x)andz(x)are the inner model parameters given the outer model parameterx(also see Eq.(2)). Let(x, y(x), z(x))in Eq.(2) be estimated by a stochastic algorithmAwith dataD m1,D m2, i.e.A(D m1,D m2). Similar to the previous works (Bao et al., 2021; Hoffer et al., 2017; Keskar et al., 2017) for evaluating the approximated"},{"citing_arxiv_id":"2502.07529","ref_index":126,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Training Deep Learning Models with Norm-Constrained LMOs","primary_cat":"cs.LG","submitted_at":"2025-02-11T13:10:34+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Scion is a new stochastic LMO-based optimizer family that unifies existing methods, supports unconstrained problems, and delivers hyperparameter transferability plus speedups on nanoGPT training.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2209.14687","ref_index":18,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Diffusion Posterior Sampling for General Noisy Inverse Problems","primary_cat":"stat.ML","submitted_at":"2022-09-29T11:12:27+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Diffusion models solve noisy (non)linear inverse problems via approximated posterior sampling that blends diffusion steps with manifold gradients without strict consistency projection.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"The likelihood function for the Poisson measurements under the i.i.d. assumption is given as p(y|x0) = nY j=1 [A(x0)]yj j exp [[−A(x0)]j] yj ! , (17) where j indexes the measurement bin. In most cases where the measured values are not too small, the model can be approximated by a Gaussian distribution with very high accuracy4. Namely, p(y|x0) → nY j=1 1p 2π[A(x0)]j exp \u0012 −(yj − [A(x0)]j)2 2[A(x0)]j \u0013 (18) ≃ nY j=1 1p2πyj exp \u0012 −(yj − [A(x0)]j)2 2yj \u0013 , (19) where we have used the standard approximation for the shot noise model [A(x0)]j ≃ yj to arrive at the last equation (Kingston, 2013). Then, similar to the Gaussian case, by differentiation and the use of Theorem 1, we have that ∇xt log p(y|xt) ≃ −ρ∇xt ∥y − A(x0)∥2 Λ, [Λ]ii ≜ 1/2yj, (20) where ∥a∥2"},{"citing_arxiv_id":"2209.14577","ref_index":82,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Rectified Flow: A Marginal Preserving Approach to Optimal Transport","primary_cat":"stat.ML","submitted_at":"2022-09-29T06:37:26+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A single-objective rectified flow variant uses neural ODEs trained by regression to monotonically decrease a fixed convex transport cost while preserving marginal distributions.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}