Forty-first International Conference on Machine Learning

ULTRAFEEDBACK: Boosting Language Models with Scaled AI Feedback

3 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

Validity-Calibrated Reasoning Distillation

cs.LG · 2026-04-14 · unverdicted · novelty 7.0

Validity-calibrated reasoning distillation improves transfer of reasoning skills by modulating updates based on relative local validity of next steps instead of enforcing full trajectory imitation.

CLIPer: Tailoring Diverse User Preference via Classifier-Guided Inference-Time Personalization

cs.CL · 2026-05-08 · unverdicted · novelty 5.0

CLIPer uses classifier guidance during inference to personalize LLM generations across single and multi-dimensional user preferences without extensive fine-tuning.

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

cs.CL · 2025-02-04 · unverdicted · novelty 5.0

SmolLM2 is a 1.7B-parameter language model that outperforms Qwen2.5-1.5B and Llama3.2-1B after overtraining on 11 trillion tokens using custom FineMath, Stack-Edu, and SmolTalk datasets in a multi-stage pipeline.

citing papers explorer

Showing 3 of 3 citing papers.

Validity-Calibrated Reasoning Distillation · cs.LG · 2026-04-14 · unverdicted · none · ref 70
CLIPer: Tailoring Diverse User Preference via Classifier-Guided Inference-Time Personalization · cs.CL · 2026-05-08 · unverdicted · none · ref 11
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model · cs.CL · 2025-02-04 · unverdicted · none · ref 103
