Journal of physics: Conference series , volume=

An overview of overfitting, its solutions , author= · 2019

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

browse 3 citing papers

representative citing papers

Why SGD is not Brownian Motion: A New Perspective on Stochastic Dynamics

cs.LG · 2026-05-21 · unverdicted · novelty 6.0

SGD is reformulated via a master equation from discrete updates, producing a discrete Fokker-Planck equation that predicts non-stationary variance growth proportional to learning rate in flat Hessian directions.

DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices

cs.LG · 2026-05-11 · unverdicted · novelty 6.0 · 3 refs

DECO is a sparse MoE architecture with ReLU-based routing, learnable expert scaling, and NormSiLU activation that matches dense Transformer performance at 20% expert activation and delivers 2.93x speedup on Jetson AGX Orin.

ERPPO: Entropy Regularization-based Proximal Policy Optimization

cs.LG · 2026-05-13 · unverdicted · novelty 5.0

ERPPO adds a DSA-based ambiguity estimator to MAPPO and switches between L1 and L2 entropy regularization to improve exploration and stability in non-stationary multi-dimensional observations.

citing papers explorer

Showing 3 of 3 citing papers.

Why SGD is not Brownian Motion: A New Perspective on Stochastic Dynamics cs.LG · 2026-05-21 · unverdicted · none · ref 85
SGD is reformulated via a master equation from discrete updates, producing a discrete Fokker-Planck equation that predicts non-stationary variance growth proportional to learning rate in flat Hessian directions.
DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices cs.LG · 2026-05-11 · unverdicted · none · ref 34 · 3 links
DECO is a sparse MoE architecture with ReLU-based routing, learnable expert scaling, and NormSiLU activation that matches dense Transformer performance at 20% expert activation and delivers 2.93x speedup on Jetson AGX Orin.
ERPPO: Entropy Regularization-based Proximal Policy Optimization cs.LG · 2026-05-13 · unverdicted · none · ref 110
ERPPO adds a DSA-based ambiguity estimator to MAPPO and switches between L1 and L2 entropy regularization to improve exploration and stability in non-stationary multi-dimensional observations.

Journal of physics: Conference series , volume=

fields

years

verdicts

representative citing papers

citing papers explorer