pith.

archive

Every paper Pith has read. Search by title, abstract, or pith.

7661 papers in cs.CL · page 20

  1. cs.LG 2026-05-12 reviewed
    Lifelong normalization yields stable updates over many edits

    More Edits, More Stable: Understanding the Lifelong Normalization in Sequential Model Editing

    Xin Ma +6

  2. cs.LG 2026-05-12 reviewed
    Expert swap and logit fix cut MoE perplexity 59% on noisy analog chips

    ROMER: Expert Replacement and Router Calibration for Robust MoE LLMs on Analog Compute-in-Memory Systems

    Wenyong Zhou +8

  3. cs.CL 2026-05-12 reviewed
    Reliable features enhance multiword expression classifications

    Choosing features for classifying multiword expressions

    Eric Laporte

  4. cs.LG 2026-05-12 reviewed
    Entropy polarity predicts whether updates expand or contract LLM policy entropy

    Entropy Polarity in Reinforcement Fine-Tuning: Direction, Asymmetry, and Control

    Jiazheng Zhang +19

  5. cs.LG 2026-05-12 reviewed
    Token-level entropy polarity predicts update direction in LLM RL

    Entropy Polarity in Reinforcement Fine-Tuning: Direction, Asymmetry, and Control

    Jiazheng Zhang +19

  6. cs.CL 2026-05-12 reviewed
    Token pair method cuts clinical LLM input by 31%

    From Token to Token Pair: Efficient Prompt Compression for Large Language Models in Clinical Prediction

    Mingcheng Zhu +3

  7. cs.CL 2026-05-12 reviewed
    ATC language models reach only 0.69 on safety risk score

    Safety-Oriented Evaluation of Language Understanding Systems for Air Traffic Control

    Yujing Chang +6

  8. cs.RO 2026-05-12 reviewed
    Robots dream short futures to dodge manipulation failures

    DreamAvoid: Critical-Phase Test-Time Dreaming to Avoid Failures in VLA Policies

    Xianzhe Fan +6

  9. cs.LG 2026-05-12 reviewed
    Single max nonconformity score covers every pipeline stage at 1-alpha

    PASC: Pipeline-Aware Conformal Prediction with Joint Coverage Guarantees for Multi-Stage NLP and LLM Pipelines

    Varun Kotte

  10. cs.CL 2026-05-12 reviewed
    Consistent segments match full attention at long contexts

    Training-Inference Consistent Segmented Execution for Long-Context LLMs

    Xianpeng Shang +4

  11. cs.CL 2026-05-12 reviewed
    On-policy distillation triples speed via early update alignment

    Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation

    Yuchen Cai +11

  12. cs.CL 2026-05-12 reviewed
    On-policy distillation locks in final model path early for 3x speedup

    Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation

    Yuchen Cai +11

  13. cs.CL 2026-05-12 reviewed
    On-policy distillation gains 3x speedup by locking stable paths early

    Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation

    Yuchen Cai +11

  14. cs.IR 2026-05-12 reviewed
    Critic and generator agents iteratively refine research outlines

    AgentDisCo: Towards Disentanglement and Collaboration in Open-ended Deep Research Agents

    Jiarui Jin +4

  15. cs.AI 2026-05-12 reviewed
    Raw camera measurements cut vision-language errors

    Allegory of the Cave: Measurement-Grounded Vision-Language Learning

    Kepeng Xu +3

  16. cs.LG 2026-05-12 reviewed
    More total MoE parameters improve quality at fixed active count

    Slicing and Dicing: Configuring Optimal Mixtures of Experts

    Margaret Li +3

  17. cs.CL 2026-05-12 reviewed
    Minor representation components block LLM relearning attacks

    Robust LLM Unlearning Against Relearning Attacks: The Minor Components in Representations Matter

    Zeguan Xiao +6

  18. cs.CL 2026-05-12 reviewed
    Real Japanese middle-school exams benchmark AI with 900k student answers

    Human-Grounded Multimodal Benchmark with 900K-Scale Aggregated Student Response Distributions from Japan's National Assessment of Academic Ability

    Kyosuke Takami +3

  19. cs.CV 2026-05-12 reviewed
    Masked prefixes make small VLMs reason from images

    Hide to See: Reasoning-prefix Masking for Visual-anchored Thinking in VLM Distillation

    Seonghoon Yu +3

  20. cs.CV 2026-05-12 reviewed
    Masking reasoning prefixes anchors VLM thinking to visuals

    Hide to See: Reasoning-prefix Masking for Visual-anchored Thinking in VLM Distillation

    Seonghoon Yu +3

  21. cs.CV 2026-05-12 reviewed
    Masking prefixes anchors VLM thinking to images

    Hide to See: Reasoning-prefix Masking for Visual-anchored Thinking in VLM Distillation

    Seonghoon Yu +3

  22. cs.CL 2026-05-12 reviewed
    Macro boosts multilingual counterfactual validity by 12.55%

    Macro: Enhancing Multilingual Counterfactual Explanations through Alignment-as-Preference Optimization

    Yilong Wang +5

  23. cs.CL 2026-05-12 reviewed
    Distilled 4B model matches 8B baseline on multimodal reasoning

    OmniThoughtVis: A Scalable Distillation Pipeline for Deployable Multimodal Reasoning Models

    Yuanhao Yue +4

  24. cs.CL 2026-05-12 reviewed
    Emotional style triggers LLM backdoors at 99% success

    When Emotion Becomes Trigger: Emotion-style dynamic Backdoor Attack Parasitising Large Language Models

    Ziyu Liu +7

  25. cs.LG 2026-05-12 reviewed
    Reversing self-distillation cuts math reasoning training steps 2-10x

    Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information

    Guobin Shen +6

  26. cs.CL 2026-05-12 reviewed
    PRISM bound splits LLM drift into scale, shape, and head

    PRISM: A Geometric Risk Bound that Decomposes Drift into Scale, Shape, and Head

    Chieh-Yen Lin +1

  27. cs.CL 2026-05-12 reviewed
    Diffusion scoring evaluates text without left-to-right bias

    DiffScore: Text Evaluation Beyond Autoregressive Likelihood

    Wen Lai +6

  28. cs.CL 2026-05-12 reviewed
    Framework speeds LLM advertising with acceptable quality trade-off

    Efficient LLM-based Advertising via Model Compression and Parallel Verification

    Wenxin Dong +11

  29. cs.CL 2026-05-12 reviewed
    Compile-time DAG search boosts MegaKernel throughput for LLMs

    Ada-MK: Adaptive MegaKernel Optimization via Automated DAG-based Search for LLM Inference

    Wenxin Dong +9

  30. cs.CL 2026-05-12 reviewed
    Bitwise diffusion generates multiple tokens per block in language models

    BitLM: Unlocking Multi-Token Language Generation with Bitwise Continuous Diffusion

    Shaobin Zhuang +9

  31. cs.CL 2026-05-12 reviewed
    Three regimes govern LLM responses to conflicting documents and training knowledge

    Three Regimes of Context-Parametric Conflict: A Predictive Framework and Empirical Validation

    Pruthvinath Jeripity Venkata

  32. cs.CL 2026-05-12 reviewed
    Covariance-weighted GRPO tames extreme tokens in LLM training

    Taming Extreme Tokens: Covariance-Aware GRPO with Gaussian-Kernel Advantage Reweighting

    Cheng Wang +3

  33. cs.CL 2026-05-12 reviewed
    2000-report dataset tests AI on patient action cards from check-ups

    Checkup2Action: A Multimodal Clinical Check-up Report Dataset for Patient-Oriented Action Card Generation

    Sike Xiang +6

  34. cs.CL 2026-05-12 reviewed
    Dataset benchmarks AI on safe action cards from check-up reports

    Checkup2Action: A Multimodal Clinical Check-up Report Dataset for Patient-Oriented Action Card Generation

    Sike Xiang +6

  35. cs.AI 2026-05-12 reviewed
    Trajectory labels bias simulators and explode variance under policy change

    Controllable User Simulation

    Guy Tennenholtz +5

  36. cs.AI 2026-05-12 reviewed
    Agents learn effective LLM configs from cheap trials

    AutoLLMResearch: Training Research Agents for Automating LLM Experiment Configuration - Learning from Cheap, Optimizing Expensive

    Taicheng Guo +3

  37. cs.AI 2026-05-12 reviewed
    Agents learn from cheap LLM trials to guide expensive configurations

    AutoLLMResearch: Training Research Agents for Automating LLM Experiment Configuration - Learning from Cheap, Optimizing Expensive

    Taicheng Guo +3

  38. cs.CL 2026-05-12 reviewed
    Hidden layers yield perplexity gains over logits in LLM pre-training

    A Study on Hidden Layer Distillation for Large Language Model Pre-Training

    Maxime Guigon +2

  39. cs.CL 2026-05-12 reviewed
    Controlled semantic perturbations combined with selective training let biomedical…

    Robust Biomedical Publication Type and Study Design Classification with Knowledge-Guided Perturbations

    Shufan Ming +3

  40. cs.CL 2026-05-12 reviewed
    300 examples align small LLMs to Stoic virtues

    StoicLLM: Preference Optimization for Philosophical Alignment in Small Language Models

    Ishmam Khan +2

  41. cs.AI 2026-05-12 reviewed
    Adaptive teacher exposure lifts LLM reasoning self-distillation

    Adaptive Teacher Exposure for Self-Distillation in LLM Reasoning

    Zihao Han +3

  42. cs.CR 2026-05-12 reviewed
    One message turns LLM agents into DDoS amplifiers

    Can a Single Message Paralyze the AI Infrastructure? The Rise of AbO-DDoS Attacks through Targeted Mobius Injection

    Zi Liang +4

  43. cs.CL 2026-05-12 reviewed
    Verbalized belief claims raise LLM agent scores 14% in long tasks

    Agent-BRACE: Decoupling Beliefs from Actions in Long-Horizon Tasks via Verbalized State Uncertainty

    Joykirat Singh +7

  44. cs.CL 2026-05-12 reviewed
    Training shallow layers beats full updates by freezing deep ones

    Freeze Deep, Train Shallow: Interpretable Layer Allocation for Continued Pre-Training

    Yu-Hang Wu +5

  45. cs.CL 2026-05-12 reviewed
    Freeze deep layers, train shallow for better LLM pre-training

    Freeze Deep, Train Shallow: Interpretable Layer Allocation for Continued Pre-Training

    Yu-Hang Wu +5

  46. cs.LG 2026-05-12 reviewed
    Masked pretraining yields 5% AUC gains for industrial tabular classification

    MaskTab: Scalable Masked Tabular Pretraining with Scaling Laws and Distillation for Industrial Classification

    Bo Zheng +6

  47. cs.LG 2026-05-12 reviewed
    Adaptive KL and Gaussian sampling raise AIME math scores by 13 points

    fg-expo: Frontier-guided exploration-prioritized policy optimization via adaptive KL and Gaussian curriculum

    Mingxiong Lin +8

  48. cs.AI 2026-05-12 reviewed
    Models mismatch doctors on spread of medical urgency calls

    AcuityBench: Evaluating Clinical Acuity Identification and Uncertainty Alignment

    Robin Linzmayer +30

  49. cs.CL 2026-05-12 reviewed
    Meta-reasoning builds custom scaffolds at inference time

    Deep Reasoning in General Purpose Agents via Structured Meta-Cognition

    Dean Light +9

  50. cs.CL 2026-05-12 reviewed
    EvalAgent raises first-run success to 65% for agent evaluations

    An Empirical Study of Automating Agent Evaluation

    Kang Zhou +16