Advances in Neural Information Processing Systems

5 Pith papers cite this work. Polarity classification is still indexing.

fields: cs.CL (3) · cs.LG (2)

years: 2026 (4) · 2025 (1)

representative citing papers

Base Models Look Human To AI Detectors

cs.CL · 2026-05-19 · unverdicted · novelty 7.0

Base model text evades AI detectors better than instruction-tuned text, and the HIP method strengthens this trade-off across model sizes.

Prescriptive Scaling Laws for Data Constrained Training

cs.LG · 2026-05-02 · unverdicted · novelty 6.0

A one-parameter scaling law models excess loss from data repetition as an additive overfitting penalty, recommending model capacity increases over excessive repetition and showing that strong weight decay reduces the penalty coefficient by ~70%.

ShinkaEvolve: Towards Open-Ended And Sample-Efficient Program Evolution

cs.CL · 2025-09-17 · unverdicted · novelty 6.0

ShinkaEvolve improves sample efficiency in LLM-driven program evolution via parent sampling, code novelty rejection-sampling, and bandit LLM ensemble selection, achieving new SOTA circle packing with 150 samples and gains on math reasoning and competitive programming tasks.

citing papers explorer

Showing 5 of 5 citing papers.

  • Base Models Look Human To AI Detectors cs.CL · 2026-05-19 · unverdicted · none · ref 30

    Base model text evades AI detectors better than instruction-tuned text, and the HIP method strengthens this trade-off across model sizes.

  • Prescriptive Scaling Laws for Data Constrained Training cs.LG · 2026-05-02 · unverdicted · none · ref 7

    A one-parameter scaling law models excess loss from data repetition as an additive overfitting penalty, recommending model capacity increases over excessive repetition and showing that strong weight decay reduces the penalty coefficient by ~70%.

  • ShinkaEvolve: Towards Open-Ended And Sample-Efficient Program Evolution cs.CL · 2025-09-17 · unverdicted · none · ref 164

    ShinkaEvolve improves sample efficiency in LLM-driven program evolution via parent sampling, code novelty rejection-sampling, and bandit LLM ensemble selection, achieving new SOTA circle packing with 150 samples and gains on math reasoning and competitive programming tasks.

  • When and Why Grouping Attention Heads Accelerates Muon Optimization cs.LG · 2026-05-09 · unverdicted · none · ref 14

    Grouping attention heads in Muon creates a trade-off between whitening gains and norm costs that, when tuned, improves training loss over full or per-head Muon on GPT-2.

  • Heterogeneity in Formal Linguistic Competence of Language Models: Is Data the Real Bottleneck? cs.CL · 2026-04-20 · conditional · none · ref 17

    Injecting 1% synthetic data targeting specific constructions during pre-training of GPT-2 Small boosts performance on 8 of 9 weakest BLiMP paradigms (e.g., only_npi_scope from 20.9% to 69.4%), while aggregate performance holds or improves, with one resistant case.