archive
Every paper Pith has read. Search by title, abstract, or pith.
7661 papers in cs.CL · page 20
-
Lifelong normalization yields stable updates over many edits
More Edits, More Stable: Understanding the Lifelong Normalization in Sequential Model Editing
-
Expert swap and logit fix cut MoE perplexity 59% on noisy analog chips
ROMER: Expert Replacement and Router Calibration for Robust MoE LLMs on Analog Compute-in-Memory Systems
-
Reliable features enhance multiword expression classifications
Choosing features for classifying multiword expressions
-
Entropy polarity predicts whether updates expand or contract LLM policy entropy
Entropy Polarity in Reinforcement Fine-Tuning: Direction, Asymmetry, and Control
-
Token-level entropy polarity predicts update direction in LLM RL
Entropy Polarity in Reinforcement Fine-Tuning: Direction, Asymmetry, and Control
-
Token pair method cuts clinical LLM input by 31%
From Token to Token Pair: Efficient Prompt Compression for Large Language Models in Clinical Prediction
-
ATC language models reach only 0.69 on safety risk score
Safety-Oriented Evaluation of Language Understanding Systems for Air Traffic Control
-
Robots dream short futures to dodge manipulation failures
DreamAvoid: Critical-Phase Test-Time Dreaming to Avoid Failures in VLA Policies
-
Single max nonconformity score covers every pipeline stage at 1-alpha
PASC: Pipeline-Aware Conformal Prediction with Joint Coverage Guarantees for Multi-Stage NLP and LLM Pipelines
-
Consistent segments match full attention at long contexts
Training-Inference Consistent Segmented Execution for Long-Context LLMs
-
On-policy distillation triples speed via early update alignment
Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation
-
On-policy distillation locks in final model path early for 3x speedup
Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation
-
On-policy distillation gains 3x speedup by locking stable paths early
Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation
-
Critic and generator agents iteratively refine research outlines
AgentDisCo: Towards Disentanglement and Collaboration in Open-ended Deep Research Agents
-
Raw camera measurements cut vision-language errors
Allegory of the Cave: Measurement-Grounded Vision-Language Learning
-
More total MoE parameters improve quality at fixed active count
Slicing and Dicing: Configuring Optimal Mixtures of Experts
-
Minor representation components block LLM relearning attacks
Robust LLM Unlearning Against Relearning Attacks: The Minor Components in Representations Matter
-
Real Japanese middle-school exams benchmark AI with 900k student answers
Human-Grounded Multimodal Benchmark with 900K-Scale Aggregated Student Response Distributions from Japan's National Assessment of Academic Ability
-
Masked prefixes make small VLMs reason from images
Hide to See: Reasoning-prefix Masking for Visual-anchored Thinking in VLM Distillation
-
Masking reasoning prefixes anchors VLM thinking to visuals
Hide to See: Reasoning-prefix Masking for Visual-anchored Thinking in VLM Distillation
-
Masking prefixes anchors VLM thinking to images
Hide to See: Reasoning-prefix Masking for Visual-anchored Thinking in VLM Distillation
-
Macro boosts multilingual counterfactual validity by 12.55%
Macro: Enhancing Multilingual Counterfactual Explanations through Alignment-as-Preference Optimization
-
Distilled 4B model matches 8B baseline on multimodal reasoning
OmniThoughtVis: A Scalable Distillation Pipeline for Deployable Multimodal Reasoning Models
-
Emotional style triggers LLM backdoors at 99% success
When Emotion Becomes Trigger: Emotion-style dynamic Backdoor Attack Parasitising Large Language Models
-
Reversing self-distillation cuts math reasoning training steps 2-10x
Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information
-
PRISM bound splits LLM drift into scale, shape, and head
PRISM: A Geometric Risk Bound that Decomposes Drift into Scale, Shape, and Head
-
Diffusion scoring evaluates text without left-to-right bias
DiffScore: Text Evaluation Beyond Autoregressive Likelihood
-
Framework speeds LLM advertising with acceptable quality trade-off
Efficient LLM-based Advertising via Model Compression and Parallel Verification
-
Compile-time DAG search boosts MegaKernel throughput for LLMs
Ada-MK: Adaptive MegaKernel Optimization via Automated DAG-based Search for LLM Inference
-
Bitwise diffusion generates multiple tokens per block in language models
BitLM: Unlocking Multi-Token Language Generation with Bitwise Continuous Diffusion
-
Three regimes govern LLM responses to conflicting documents and training knowledge
Three Regimes of Context-Parametric Conflict: A Predictive Framework and Empirical Validation
-
Covariance-weighted GRPO tames extreme tokens in LLM training
Taming Extreme Tokens: Covariance-Aware GRPO with Gaussian-Kernel Advantage Reweighting
-
2000-report dataset tests AI on patient action cards from check-ups
Checkup2Action: A Multimodal Clinical Check-up Report Dataset for Patient-Oriented Action Card Generation
-
Dataset benchmarks AI on safe action cards from check-up reports
Checkup2Action: A Multimodal Clinical Check-up Report Dataset for Patient-Oriented Action Card Generation
-
Trajectory labels bias simulators and explode variance under policy change
Controllable User Simulation
-
Agents learn effective LLM configs from cheap trials
AutoLLMResearch: Training Research Agents for Automating LLM Experiment Configuration - Learning from Cheap, Optimizing Expensive
-
Agents learn from cheap LLM trials to guide expensive configurations
AutoLLMResearch: Training Research Agents for Automating LLM Experiment Configuration - Learning from Cheap, Optimizing Expensive
-
Hidden layers yield perplexity gains over logits in LLM pre-training
A Study on Hidden Layer Distillation for Large Language Model Pre-Training
-
Controlled semantic perturbations combined with selective training let biomedical…
Robust Biomedical Publication Type and Study Design Classification with Knowledge-Guided Perturbations
-
300 examples align small LLMs to Stoic virtues
StoicLLM: Preference Optimization for Philosophical Alignment in Small Language Models
-
Adaptive teacher exposure lifts LLM reasoning self-distillation
Adaptive Teacher Exposure for Self-Distillation in LLM Reasoning
-
One message turns LLM agents into DDoS amplifiers
Can a Single Message Paralyze the AI Infrastructure? The Rise of AbO-DDoS Attacks through Targeted Mobius Injection
-
Verbalized belief claims raise LLM agent scores 14% in long tasks
Agent-BRACE: Decoupling Beliefs from Actions in Long-Horizon Tasks via Verbalized State Uncertainty
-
Training shallow layers beats full updates by freezing deep ones
Freeze Deep, Train Shallow: Interpretable Layer Allocation for Continued Pre-Training
-
Freeze deep layers, train shallow for better LLM pre-training
Freeze Deep, Train Shallow: Interpretable Layer Allocation for Continued Pre-Training
-
Masked pretraining yields 5% AUC gains for industrial tabular classification
MaskTab: Scalable Masked Tabular Pretraining with Scaling Laws and Distillation for Industrial Classification
-
Adaptive KL and Gaussian sampling raise AIME math scores by 13 points
fg-expo: Frontier-guided exploration-prioritized policy optimization via adaptive KL and Gaussian curriculum
-
Models mismatch doctors on spread of medical urgency calls
AcuityBench: Evaluating Clinical Acuity Identification and Uncertainty Alignment
-
Meta-reasoning builds custom scaffolds at inference time
Deep Reasoning in General Purpose Agents via Structured Meta-Cognition
-
EvalAgent raises first-run success to 65% for agent evaluations
An Empirical Study of Automating Agent Evaluation