archive

Every paper Pith has read.

7661 papers in cs.CL · page 20

cs.LG 2026-05-12 reviewed

Lifelong normalization yields stable updates over many edits
More Edits, More Stable: Understanding the Lifelong Normalization in Sequential Model Editing

Xin Ma +6
cs.LG 2026-05-12 reviewed

Expert swap and logit fix cut MoE perplexity 59% on noisy analog chips
ROMER: Expert Replacement and Router Calibration for Robust MoE LLMs on Analog Compute-in-Memory Systems

Wenyong Zhou +8
cs.CL 2026-05-12 reviewed

Reliable features enhance multiword expression classifications
Choosing features for classifying multiword expressions

Eric Laporte
cs.LG 2026-05-12 reviewed

Entropy polarity predicts whether updates expand or contract LLM policy entropy
Entropy Polarity in Reinforcement Fine-Tuning: Direction, Asymmetry, and Control

Jiazheng Zhang +19
cs.LG 2026-05-12 reviewed

Token-level entropy polarity predicts update direction in LLM RL
Entropy Polarity in Reinforcement Fine-Tuning: Direction, Asymmetry, and Control

Jiazheng Zhang +19
cs.CL 2026-05-12 reviewed

Token pair method cuts clinical LLM input by 31%
From Token to Token Pair: Efficient Prompt Compression for Large Language Models in Clinical Prediction

Mingcheng Zhu +3
cs.CL 2026-05-12 reviewed

ATC language models reach only 0.69 on safety risk score
Safety-Oriented Evaluation of Language Understanding Systems for Air Traffic Control

Yujing Chang +6
cs.RO 2026-05-12 reviewed

Robots dream short futures to dodge manipulation failures
DreamAvoid: Critical-Phase Test-Time Dreaming to Avoid Failures in VLA Policies

Xianzhe Fan +6
cs.LG 2026-05-12 reviewed

Single max nonconformity score covers every pipeline stage at 1-alpha
PASC: Pipeline-Aware Conformal Prediction with Joint Coverage Guarantees for Multi-Stage NLP and LLM Pipelines

Varun Kotte
cs.CL 2026-05-12 reviewed

Consistent segments match full attention at long contexts
Training-Inference Consistent Segmented Execution for Long-Context LLMs

Xianpeng Shang +4
cs.CL 2026-05-12 reviewed

On-policy distillation triples speed via early update alignment
Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation

Yuchen Cai +11
cs.CL 2026-05-12 reviewed

On-policy distillation locks in final model path early for 3x speedup
Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation

Yuchen Cai +11
cs.CL 2026-05-12 reviewed

On-policy distillation gains 3x speedup by locking stable paths early
Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation

Yuchen Cai +11
cs.IR 2026-05-12 reviewed

Critic and generator agents iteratively refine research outlines
AgentDisCo: Towards Disentanglement and Collaboration in Open-ended Deep Research Agents

Jiarui Jin +4
cs.AI 2026-05-12 reviewed

Raw camera measurements cut vision-language errors
Allegory of the Cave: Measurement-Grounded Vision-Language Learning

Kepeng Xu +3
cs.LG 2026-05-12 reviewed

More total MoE parameters improve quality at fixed active count
Slicing and Dicing: Configuring Optimal Mixtures of Experts

Margaret Li +3
cs.CL 2026-05-12 reviewed

Minor representation components block LLM relearning attacks
Robust LLM Unlearning Against Relearning Attacks: The Minor Components in Representations Matter

Zeguan Xiao +6
cs.CL 2026-05-12 reviewed

Real Japanese middle-school exams benchmark AI with 900k student answers
Human-Grounded Multimodal Benchmark with 900K-Scale Aggregated Student Response Distributions from Japan's National Assessment of Academic Ability

Kyosuke Takami +3
cs.CV 2026-05-12 reviewed

Masked prefixes make small VLMs reason from images
Hide to See: Reasoning-prefix Masking for Visual-anchored Thinking in VLM Distillation

Seonghoon Yu +3
cs.CV 2026-05-12 reviewed

Masking reasoning prefixes anchors VLM thinking to visuals
Hide to See: Reasoning-prefix Masking for Visual-anchored Thinking in VLM Distillation

Seonghoon Yu +3
cs.CV 2026-05-12 reviewed

Masking prefixes anchors VLM thinking to images
Hide to See: Reasoning-prefix Masking for Visual-anchored Thinking in VLM Distillation

Seonghoon Yu +3
cs.CL 2026-05-12 reviewed

Macro boosts multilingual counterfactual validity by 12.55%
Macro: Enhancing Multilingual Counterfactual Explanations through Alignment-as-Preference Optimization

Yilong Wang +5
cs.CL 2026-05-12 reviewed

Distilled 4B model matches 8B baseline on multimodal reasoning
OmniThoughtVis: A Scalable Distillation Pipeline for Deployable Multimodal Reasoning Models

Yuanhao Yue +4
cs.CL 2026-05-12 reviewed

Emotional style triggers LLM backdoors at 99% success
When Emotion Becomes Trigger: Emotion-style dynamic Backdoor Attack Parasitising Large Language Models

Ziyu Liu +7
cs.LG 2026-05-12 reviewed

Reversing self-distillation cuts math reasoning training steps 2-10x
Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information

Guobin Shen +6
cs.CL 2026-05-12 reviewed

PRISM bound splits LLM drift into scale, shape, and head
PRISM: A Geometric Risk Bound that Decomposes Drift into Scale, Shape, and Head

Chieh-Yen Lin +1
cs.CL 2026-05-12 reviewed

Diffusion scoring evaluates text without left-to-right bias
DiffScore: Text Evaluation Beyond Autoregressive Likelihood

Wen Lai +6
cs.CL 2026-05-12 reviewed

Framework speeds LLM advertising with acceptable quality trade-off
Efficient LLM-based Advertising via Model Compression and Parallel Verification

Wenxin Dong +11
cs.CL 2026-05-12 reviewed

Compile-time DAG search boosts MegaKernel throughput for LLMs
Ada-MK: Adaptive MegaKernel Optimization via Automated DAG-based Search for LLM Inference

Wenxin Dong +9
cs.CL 2026-05-12 reviewed

Bitwise diffusion generates multiple tokens per block in language models
BitLM: Unlocking Multi-Token Language Generation with Bitwise Continuous Diffusion

Shaobin Zhuang +9
cs.CL 2026-05-12 reviewed

Three regimes govern LLM responses to conflicting documents and training knowledge
Three Regimes of Context-Parametric Conflict: A Predictive Framework and Empirical Validation

Pruthvinath Jeripity Venkata
cs.CL 2026-05-12 reviewed

Covariance-weighted GRPO tames extreme tokens in LLM training
Taming Extreme Tokens: Covariance-Aware GRPO with Gaussian-Kernel Advantage Reweighting

Cheng Wang +3
cs.CL 2026-05-12 reviewed

2000-report dataset tests AI on patient action cards from check-ups
Checkup2Action: A Multimodal Clinical Check-up Report Dataset for Patient-Oriented Action Card Generation

Sike Xiang +6
cs.CL 2026-05-12 reviewed

Dataset benchmarks AI on safe action cards from check-up reports
Checkup2Action: A Multimodal Clinical Check-up Report Dataset for Patient-Oriented Action Card Generation

Sike Xiang +6
cs.AI 2026-05-12 reviewed

Trajectory labels bias simulators and explode variance under policy change
Controllable User Simulation

Guy Tennenholtz +5
cs.AI 2026-05-12 reviewed

Agents learn effective LLM configs from cheap trials
AutoLLMResearch: Training Research Agents for Automating LLM Experiment Configuration - Learning from Cheap, Optimizing Expensive

Taicheng Guo +3
cs.AI 2026-05-12 reviewed

Agents learn from cheap LLM trials to guide expensive configurations
AutoLLMResearch: Training Research Agents for Automating LLM Experiment Configuration - Learning from Cheap, Optimizing Expensive

Taicheng Guo +3
cs.CL 2026-05-12 reviewed

Hidden layers yield perplexity gains over logits in LLM pre-training
A Study on Hidden Layer Distillation for Large Language Model Pre-Training

Maxime Guigon +2
cs.CL 2026-05-12 reviewed

Controlled semantic perturbations combined with selective training let biomedical…
Robust Biomedical Publication Type and Study Design Classification with Knowledge-Guided Perturbations

Shufan Ming +3
cs.CL 2026-05-12 reviewed

300 examples align small LLMs to Stoic virtues
StoicLLM: Preference Optimization for Philosophical Alignment in Small Language Models

Ishmam Khan +2
cs.AI 2026-05-12 reviewed

Adaptive teacher exposure lifts LLM reasoning self-distillation
Adaptive Teacher Exposure for Self-Distillation in LLM Reasoning

Zihao Han +3
cs.CR 2026-05-12 reviewed

One message turns LLM agents into DDoS amplifiers
Can a Single Message Paralyze the AI Infrastructure? The Rise of AbO-DDoS Attacks through Targeted Mobius Injection

Zi Liang +4
cs.CL 2026-05-12 reviewed

Verbalized belief claims raise LLM agent scores 14% in long tasks
Agent-BRACE: Decoupling Beliefs from Actions in Long-Horizon Tasks via Verbalized State Uncertainty

Joykirat Singh +7

cs.CL 2026-05-12 reviewed

Training shallow layers beats full updates by freezing deep ones
Freeze Deep, Train Shallow: Interpretable Layer Allocation for Continued Pre-Training

Yu-Hang Wu +5
cs.CL 2026-05-12 reviewed

Freeze deep layers, train shallow for better LLM pre-training
Freeze Deep, Train Shallow: Interpretable Layer Allocation for Continued Pre-Training

Yu-Hang Wu +5
cs.LG 2026-05-12 reviewed

Masked pretraining yields 5% AUC gains for industrial tabular classification
MaskTab: Scalable Masked Tabular Pretraining with Scaling Laws and Distillation for Industrial Classification

Bo Zheng +6
cs.LG 2026-05-12 reviewed

Adaptive KL and Gaussian sampling raise AIME math scores by 13 points
fg-expo: Frontier-guided exploration-prioritized policy optimization via adaptive KL and Gaussian curriculum

Mingxiong Lin +8
cs.AI 2026-05-12 reviewed

Models mismatch doctors on spread of medical urgency calls
AcuityBench: Evaluating Clinical Acuity Identification and Uncertainty Alignment

Robin Linzmayer +30
cs.CL 2026-05-12 reviewed

Meta-reasoning builds custom scaffolds at inference time
Deep Reasoning in General Purpose Agents via Structured Meta-Cognition

Dean Light +9
cs.CL 2026-05-12 reviewed

EvalAgent raises first-run success to 65% for agent evaluations
An Empirical Study of Automating Agent Evaluation

Kang Zhou +16