archive

Every paper Pith has read.

7661 papers in cs.CL · page 19

cs.CL 2026-05-12 reviewed

Re-testing lowers most controlled text generation scores
A Comparative Study of Controlled Text Generation Systems Using Level-Playing-Field Evaluation Principles

Michela Lorandi +1
cs.CL 2026-05-12 reviewed

Small detector beats large models at spotting LLM hallucinations
Scalable Token-Level Hallucination Detection in Large Language Models

Rui Min +4
cs.CL 2026-05-12 reviewed

Pretraining exposure predicts LLM popularity better than Wikipedia
Pretraining Exposure Explains Popularity Judgments in Large Language Models

Jamshid Mozafari +2
cs.CL 2026-05-12 reviewed

High-convergence sentences lift LLM accuracy on inferential questions
Context Convergence Improves Answering Inferential Questions

Jamshid Mozafari +2
cs.CL 2026-05-12 reviewed

Benchmark forces models to combine facts from two articles
MedHopQA: A Disease-Centered Multi-Hop Reasoning Benchmark and Evaluation Framework for LLM-Based Biomedical Question Answering

Rezarta Islamaj +15
cs.CL 2026-05-12 reviewed

Summing PEFT module outputs boosts multi-attribute text control
Output Composability of QLoRA PEFT Modules for Plug-and-Play Attribute-Controlled Text Generation

Michela Lorandi +1
cs.CL 2026-05-12 reviewed

Index ranks category pairs by confusion risk in data entry
A categorical error sensitivity index (ISEC): A preventive ordinal decision-support measure for irrecoverable errors in manual data entry systems

Ricardo Raúl Palma +2
cs.CL 2026-05-12 reviewed

Retrieval lifts two-hop medical QA to 89% conceptual accuracy
Overview of the MedHopQA track at BioCreative IX: track description, participation and evaluation of systems for multi-hop medical question answering

Rezarta Islamaj +15
cs.CL 2026-05-12 reviewed

Gender bias and facts share the same neurons in language models
GKnow: Measuring the Entanglement of Gender Bias and Factual Gender

Leonor Veloso +1
cs.CL 2026-05-12 reviewed

Token-level ratio matching generalizes DPO for precise alignment
TokenRatio: Principled Token-Level Preference Optimization via Ratio Matching

Truong Nguyen +5
cs.CL 2026-05-12 reviewed

Token-level ratio matching aligns models at each generation step
TokenRatio: Principled Token-Level Preference Optimization via Ratio Matching

Truong Nguyen +5
cs.CL 2026-05-12 reviewed

Familiarity dominates English word difficulty across three L1 groups
What makes a word hard to learn? Modeling L1 influence on English vocabulary difficulty

Jonas Mayer Martins +3
cs.CR 2026-05-12 reviewed

New decoder recovers personal data from finetuned models
Reconstruction of Personally Identifiable Information from Supervised Finetuned Models

Sae Furukawa +1
cs.CL 2026-05-12 reviewed

PRISM cuts context use by 10x while lifting accuracy on long agent tasks
PRISM: Pareto-Efficient Retrieval over Intent-Aware Structured Memory for Long-Horizon Agents

Jingyi Peng +3
cs.CL 2026-05-12 reviewed

PRISM hits higher accuracy with 10x less context in long-horizon agent memory
PRISM: Pareto-Efficient Retrieval over Intent-Aware Structured Memory for Long-Horizon Agents

Jingyi Peng +3
cs.CL 2026-05-12 reviewed

Benchmark finds LLMs miss how scams escalate turn by turn
PreScam: A Benchmark for Predicting Scam Progression from Early Conversations

Weixiang Sun +7
cs.CL 2026-05-12 reviewed

Token marks plus contrastive tuning clean disfluent speech transcripts
Mind the Pause: Disfluency-Aware Objective Tuning for Multilingual Speech Correction with LLMs

Deepak Kumar +2
cs.CL 2026-05-12 reviewed

Combined optimization and distillation boosts long-context LLM reasoning
Combining On-Policy Optimization and Distillation for Long-Context Reasoning in Large Language Models

Miguel Moura Ramos +2
cs.CL 2026-05-12 reviewed

Sparse autoencoders expose features inside Whisper ASR
Mechanistic Interpretability of ASR models using Sparse Autoencoders

Dan Pluth +3
cs.LG 2026-05-12 reviewed

LoRA accuracy depends on which parameters are trained
Not How Many, But Which: Parameter Placement in Low-Rank Adaptation

Arijit Sehanobish +1
cs.CL 2026-05-12 reviewed

LLM decoding routes around memory clashes via attention checks
Mitigating Context-Memory Conflicts in LLMs through Dynamic Cognitive Reconciliation Decoding

Yigeng Zhou +8
cs.AI 2026-05-12 reviewed

Discovery agents beat learned models under enterprise shifts
Do Enterprise Systems Need Learned World Models? The Importance of Context to Infer Dynamics

Jishnu Sethumadhavan Nair +16
cs.CL 2026-05-12 reviewed

Bayesian priors fix up to 50-point errors in LLM user feedback
Correcting Selection Bias in Sparse User Feedback for Large Language Model Quality Estimation: A Multi-Agent Hierarchical Bayesian Approach

Andrea Morandi +1
cs.CL 2026-05-12 reviewed

Reconstructing missing facts boosts misinformation detection
Latent Causal Void: Explicit Missing-Context Reconstruction for Misinformation Detection

Hui Li +3
cs.CV 2026-05-12 reviewed

One autoregressive model makes personalized ad images and text
Design Your Ad: Personalized Advertising Image and Text Generation with Unified Autoregressive Models

Yexing Xu +17
cs.CL 2026-05-12 reviewed

Poetic prompts create separate processing paths that evade LLM safety
Metaphor Is Not All Attention Needs

Olga Sorokoletova +8
cs.CL 2026-05-12 reviewed

Data focus and signer adaptation unlock low-resource sign language AI
Sign Language Recognition and Translation for Low-Resource Languages: Challenges and Pathways Forward

Nigar Alishzade +1
cs.RO 2026-05-12 reviewed

World models merge with action generation for embodied AI
World Action Models: The Next Frontier in Embodied AI

Siyin Wang +13
cs.CL 2026-05-12 reviewed

LLMs show limited evidence of grammar violation detectors
Do Language Models Encode Knowledge of Linguistic Constraint Violations?

Hardy +1
cs.CL 2026-05-12 reviewed

LLMs show limited internal grammar violation detectors
Do Language Models Encode Knowledge of Linguistic Constraint Violations?

Hardy +1
cs.CL 2026-05-12 reviewed

Spoken input aids verb learning over child-directed speech
Is Child-Directed Language Optimized for Word Learning? A Computational Study of Verb Meaning Acquisition

Francesca Padovani +3
cs.CL 2026-05-12 reviewed

Skill graphs boost agent RL on complex tasks
SkillGraph: Skill-Augmented Reinforcement Learning for Agents via Evolving Skill Graphs

Xiaoyuan Li +6
cs.CL 2026-05-12 reviewed

Three-stage retrieval pipeline ranks 8th in SemEval multi-turn task
Caraman at SemEval-2026 Task 8: Three-Stage Multi-Turn Retrieval with Query Rewriting, Hybrid Search, and Cross-Encoder Reranking

David-Maximilian Caraman +1
cs.CL 2026-05-12 reviewed

SAGE proposes a framework that trains smaller models to automatically generate and verify…
SAGE: Scalable Automated Robustness Augmentation for LLM Knowledge Evaluation

Xiaoyuan Li +8
cs.CR 2026-05-12 reviewed

Benchmark finds skills expose agents to unsafe attacks
SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces

Chang Jin +9
cs.CL 2026-05-12 reviewed

Human actions guide LLM agents past RL barriers
Learning Agentic Policy from Action Guidance

Yuxiang Ji +8
cs.CL 2026-05-12 reviewed

Selective visuals raise Indic subtitle translation scores
Towards Visually-Guided Movie Subtitle Translation for Indic Languages

Tarun Chintada +2
cs.CL 2026-05-12 reviewed

Rubric test predicts LLM post-training success at 90% accuracy
On Predicting the Post-training Potential of Pre-trained LLMs

Xiaoyuan Li +7
cs.CL 2026-05-12 reviewed

Scenario modeling plus intent bridging lifts target-guided dialogues
Enhancing Target-Guided Proactive Dialogue Systems via Conversational Scenario Modeling and Intent-Keyword Bridging

Maodong Li +2
cs.CV 2026-05-12 reviewed

Frozen CLIP features top ResNet for instructional video summaries
Multimodal Abstractive Summarization of Instructional Videos with Vision-Language Models

Maham Nazir +3
cs.SE 2026-05-12 reviewed

Print statements teach code models to reason step by step
StepCodeReasoner: Aligning Code Reasoning with Stepwise Execution Traces via Reinforcement Learning

Hao Wang +3
cs.CL 2026-05-12 reviewed

Neuron activation margins augment preference optimization for math
YFPO: A Preliminary Study of Yoked Feature Preference Optimization with Neuron-Guided Rewards for Mathematical Reasoning

Yifan Le
cs.CL 2026-05-12 reviewed

Sparse autoencoders become steering and optimization tools for LLMs
Qwen-Scope: Turning Sparse Features into Development Tools for Large Language Models

Boyi Deng +17
cs.CL 2026-05-12 reviewed

Concordance tool assembles local grammars for better name extraction
Concordance Comparison as a Means of Assembling Local Grammars

Juliana Pirovani +2
cs.CV 2026-05-12 reviewed

Unified visual latents cut reasoning tokens in multimodal models
UniVLR: Unifying Text and Vision in Visual Latent Reasoning for Multimodal LLMs

Houcheng Jiang +6
cs.CL 2026-05-12 reviewed

Boltzmann ranking on trajectories lifts diffusion language model performance
Self-Distilled Trajectory-Aware Boltzmann Modeling: Bridging the Training-Inference Discrepancy in Diffusion Language Models

Kecheng Chen +11
cs.CL 2026-05-12 reviewed

Boltzmann ranking of inference trajectories improves DLM post-training
Self-Distilled Trajectory-Aware Boltzmann Modeling: Bridging the Training-Inference Discrepancy in Diffusion Language Models

Kecheng Chen +11
cs.LG 2026-05-12 reviewed

Divergence signals adapt credit assignment for LLM agent RL
GEAR: Granularity-Adaptive Advantage Reweighting for LLM Agents via Self-Distillation

Sijia Li +9
cs.LG 2026-05-12 reviewed

Divergence spikes adapt credit assignment for LLM agents
GEAR: Granularity-Adaptive Advantage Reweighting for LLM Agents via Self-Distillation

Sijia Li +9
cs.CL 2026-05-12 reviewed

Fine-tuning teaches models to control randomness
Probabilistic Calibration Is a Trainable Capability in Language Models

Davide Baldelli +4