Early stopping-but when? InNeural Networks: Tricks of the trade, pages 55–69

Lutz Prechelt · 2002

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

browse 3 citing papers

representative citing papers

ALTO: Adaptive LoRA Tuning and Orchestration for Heterogeneous LoRA Training Workloads

cs.LG · 2026-04-07 · unverdicted · novelty 7.0

ALTO accelerates LoRA tuning up to 13.8x by monitoring loss trajectories for early stopping, using fused grouped GEMM with rank-local adapter parallelism, and combining intra- and inter-task scheduling for heterogeneous workloads without quality loss.

A Ridge Too Far: Correcting Over-Shrinkage via Negative Regularization

cs.LG · 2025-08-24 · unverdicted · novelty 6.0

Negative-capable ridge regression uses controlled negative regularization as anti-shrinkage to increase effective complexity along weak eigendirections and mitigate underfitting in small-data regression.

STAR: Decode-Phase Rescheduling for LLM Inference

cs.DC · 2025-10-15 · unverdicted · novelty 5.0

STAR cuts P99 TPOT by 75.1% and raises goodput 2.63x via a lightweight hidden-state length predictor and dynamic decode rescheduling that combines current and predicted loads.

citing papers explorer

Showing 3 of 3 citing papers.

ALTO: Adaptive LoRA Tuning and Orchestration for Heterogeneous LoRA Training Workloads cs.LG · 2026-04-07 · unverdicted · none · ref 28
ALTO accelerates LoRA tuning up to 13.8x by monitoring loss trajectories for early stopping, using fused grouped GEMM with rank-local adapter parallelism, and combining intra- and inter-task scheduling for heterogeneous workloads without quality loss.
A Ridge Too Far: Correcting Over-Shrinkage via Negative Regularization cs.LG · 2025-08-24 · unverdicted · none · ref 55
Negative-capable ridge regression uses controlled negative regularization as anti-shrinkage to increase effective complexity along weak eigendirections and mitigate underfitting in small-data regression.
STAR: Decode-Phase Rescheduling for LLM Inference cs.DC · 2025-10-15 · unverdicted · none · ref 26
STAR cuts P99 TPOT by 75.1% and raises goodput 2.63x via a lightweight hidden-state length predictor and dynamic decode rescheduling that combines current and predicted loads.

Early stopping-but when? InNeural Networks: Tricks of the trade, pages 55–69

fields

years

verdicts

representative citing papers

citing papers explorer