The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

Penedo, Guilherme

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

unclear 1

representative citing papers

Trajectory as the Teacher: Few-Step Discrete Flow Matching via Energy-Navigated Distillation

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

Energy-navigated trajectory shaping during training produces 8-step discrete flow matching students that achieve 32% lower perplexity than 1024-step teachers on 170M language models with unchanged inference cost.

Inference-Time Machine Unlearning via Gated Activation Redirection

cs.LG · 2026-05-12 · unverdicted · novelty 6.0 · 2 refs

GUARD-IT performs machine unlearning in LLMs via input-dependent activation steering at inference time, matching or exceeding gradient-based baselines on TOFU and MUSE while preserving utility and working under quantization.

Language models struggle with compartmentalization

cs.CL · 2026-05-19 · unverdicted · novelty 5.0

LLMs exhibit compartmentalization by learning separate internal representations for equivalent concepts presented differently, which reduces sample efficiency and resists unification even with synthetic parallel data.

citing papers explorer

Trajectory as the Teacher: Few-Step Discrete Flow Matching via Energy-Navigated Distillation cs.LG · 2026-05-08 · unverdicted · none · ref 17
Inference-Time Machine Unlearning via Gated Activation Redirection cs.LG · 2026-05-12 · unverdicted · none · ref 38 · 2 links
Language models struggle with compartmentalization cs.CL · 2026-05-19 · unverdicted · none · ref 4
