MetaAdamW uses a lightweight Transformer encoder on gradient and momentum statistics to adapt learning rates and weight decay per parameter group, trained via a meta-objective with gradient alignment, loss decrease, and generalization gap plus priority-injected homoscedastic uncertainty weighting.
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts: UNVERDICTED 2
roles: method 1
polarities: use method 1
representative citing papers
GLM-4 models rival or exceed GPT-4 on MMLU, GSM8K, MATH, BBH, GPQA, HumanEval, IFEval, long-context tasks, and Chinese alignment while adding autonomous tool use for web, code, and image generation.
citing papers explorer
- A Self-Attentive Meta-Optimizer with Group-Adaptive Learning Rates and Weight Decay
- ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools