archive

Every paper Pith has read. Search by title, abstract, or pith.

7661 papers in cs.CL · page 2

cs.LG 2026-05-21 reviewed

Controller routes LLM requests to best mode for 2x speedup
ModeSwitch-LLM: A Lightweight Phase-Aware Controller for Cross-Mode LLM Inference on a Single GPU

Aman Sunesh +2
cs.LG 2026-05-21 reviewed

Recognition of evaluations depends on model-benchmark pairs
Decomposing and Measuring Evaluation Awareness

Changling Li +5
cs.CL 2026-05-21 reviewed

Compositionality rises then falls in LLM self-training
Model Collapse as Cultural Evolution

Dongxin Guo +2
cs.CL 2026-05-21 reviewed

RAG method leads in mental health improvement detection
DreamerNLplus: Interpretable Modeling of Mental Health Dynamics from Social Media Timelines using Hybrid Rule-Based and RAG Methods

Maryia Zhyrko +3
cs.CL 2026-05-21 reviewed

Hawkes process lifts late alignment in news text simulations
HawkesLLM: Semantic Uncertainty Propagation in Agentic Text Simulation

Zewei Deng +2
cs.CL 2026-05-21 reviewed

LLMs learn what not to say via frequency competition
Do Language Models Know What Not to Say? Causal Evidence for Statistical Preemption in LLMs

Dongxin Guo +2
cs.CL 2026-05-21 reviewed

Multilingual SAEs enable reliable language steering without layer search
Multilingual Steering by Design: Multilingual Sparse Autoencoders and Principled Layer Selection

Yusser Al Ghussin +5
cs.CL 2026-05-21 reviewed

SAE features from LLMs map onto brain semantic regions
Sparse Autoencoders Map Brain-LLM Alignment onto Cortical Semantic Topography

Dongxin Guo +2
cs.CL 2026-05-21 reviewed

Training data language, not English, drives brain-LLM alignment
Brain-LLM Alignment Tracks Training Data, Not Typology

Dongxin Guo +2
cs.LG 2026-05-21 reviewed

RADAR forecasts transfer by comparing representation trajectories
RADAR: Relative Angular Divergence Across Representations

Xavier Cadet +2
cs.AI 2026-05-21 reviewed

Transformers have fixed accuracy limits set by layers and width
The Deterministic Horizon: Impossibility Results as Design Specifications for Trustworthy AI Systems

Dongxin Guo
cs.CL 2026-05-21 reviewed

Proactive AI questions uncover 82% of autism language traits
A Proactive Multi-Agent Dialogue Framework for Assessing Social Language Disorder Traits in Autism

Chuanbo Hu +6
cs.CL 2026-05-21 reviewed

FIM pretraining yields linear verbatim memorization growth
Memorization Dynamics of Fill-in-the-Middle Pretraining

Tobias von Arx +1
cs.CL 2026-05-21 reviewed

Pipeline creates first UD treebank for Katharevousa Greek
A Reproducible Universal Dependencies-Style Pipeline for Katharevousa Greek Parliamentary Text

George Mikros +1
cs.CL 2026-05-21 reviewed

AI models favor some religions over others in conversion advice
When AI Takes Sides on Questions of Faith: Persistent Asymmetries in AI-Mediated Faith Guidance

Brett Israelsen +5
cs.CL 2026-05-21 reviewed

LLMs estimate expertise from Slack logs with 21% error
Can AI Guess What You Know? Performance Comparison of Large Language Models for Human Domain Knowledge Estimation From Communication Logs

Ko Watanabe +1
cs.CL 2026-05-21 reviewed

Graph alignment detects LLM hallucinations better than GPT-4o
Graph Alignment Topology as an Inductive Bias for Grounding Detection

Paul Landes +3
cs.CL 2026-05-21 reviewed

LIFT gives diffusion models up to 3x reasoning gains on math tests
Learnability-Informed Fine-Tuning of Diffusion Language Models

Shubham Parashar +7
cs.CL 2026-05-21 reviewed

Error feedback in prompts halves Cypher query execution errors
RAS: Reflection-Augmented Scaling with In-Context Learning for Executable Cypher Query Generation

Minseok Jung +2
cs.IR 2026-05-21 reviewed

LaTeX source yields better RAG chunks than PDF text
AI-Friendly LaTeX: Using LaTeX Code as a Knowledge Source for Retrieval-Augmented Generation

Tom Verhoeff
cs.CL 2026-05-21 reviewed

Linear program yields tokenizers within 1% of optimal
Tokenisation via Convex Relaxations

Jan Tempus +4
cs.LG 2026-05-21 reviewed

Vector rewards produce diverse LLM outputs that raise search scores
Vector Policy Optimization: Training for Diversity Improves Test-Time Search

Ryan Bahlous-Boldi +8
cs.AI 2026-05-21 reviewed

Evidence verifier scores spans by accuracy gain in self-evolving agents
EVE-Agent: Evidence-Verifiable Self-Evolving Agents

Yamato Arai +1
cs.CL 2026-05-21 reviewed

AI chatbots hit 90 percent on fresh news but drop in open answers
Evaluating Commercial AI Chatbots as News Intermediaries

Mirac Suzgun +7
cs.CV 2026-05-21 reviewed

VLMs keep high scores after most image tokens are deleted
Seeing without Looking: Do Vision-Language Benchmarks Really Test Vision?

Zixuan Lan +3
cs.LG 2026-05-21 reviewed

Transcoders trace VLM grounding and predict hallucinations at 0.68 AUC
Transcoders Trace Visual Grounding and Hallucinations in Vision-Language Models

Dimitrios Damianos +4
cs.CL 2026-05-21 reviewed

Consistency training cuts covert political bias in LLMs
Reducing Political Manipulation with Consistency Training

Long Phan +5
cs.CL 2026-05-21 reviewed

Time-ordered training keeps LLM facts fresher than shuffling
Understanding Data Temporality Impact on Large Language Models Pre-training

Hippolyte Pilchen +4
cs.CL 2026-05-21 reviewed

Temporal biomedical graph rescues up to 65% of LLM errors on disease timelines
ChronoMedKG: A Temporally-Grounded Biomedical Knowledge Graph and Benchmark for Clinical Reasoning

Md Shamim Ahmed +4
cs.AI 2026-05-21 reviewed

LLM analysis outperforms acoustics for political pathos
Beyond Acoustic Emotion Recognition: Multimodal Pathos Analysis in Political Speech Using LLM-Based and Acoustic Emotion Models

Juergen Dietrich
cs.CV 2026-05-21 reviewed

Simulated dense placements train IMU model that ignores sensor setup
AnyMo: Geometry-Aware Setup-Agnostic Modeling of Human Motion in the Wild

Baiyu Chen +7
cs.AI 2026-05-21 reviewed

Conversation history pulls LLM judgments toward its tone
AMEL: Accumulated Message Effects on LLM Judgments

Sid-ali Temkit
cs.CL 2026-05-21 reviewed

ToaST cuts tokens over 11% vs BPE at large vocabularies
Tokenization with Split Trees

Craig W. Schmidt +6
cs.CL 2026-05-21 reviewed

Gradient subspace projection boosts LLM self-distillation
Self-Policy Distillation via Capability-Selective Subspace Projection

Guangya Hao +4
cs.CL 2026-05-21 reviewed

Moral cues survive machine translation to Polish
Moral Semantics Survive Machine Translation: Cross-Lingual Evidence from Moral Foundations Corpora

Maciej Skorski
cs.CL 2026-05-21 reviewed

Images boost LLM poetry detectors past RoBERTa
Seeing the Poem: Image-Semantic Detection of AI-Generated Modern Chinese Poetry with MLLMs

Shanshan Wang +8
cs.CL 2026-05-21 reviewed

AI Action Plan echoes private sector over public life concerns
Whose Voice Counts? Mapping Stakeholder Perspectives on AI Through Public Submissions to the U.S. Government

Alina Karakanta +6
cs.CL 2026-05-21 reviewed

AI office agents fail 44% of gradual attack tests
Boiling the Frog: A Multi-Turn Benchmark for Agentic Safety

Piercosma Bisconti +13
cs.CL 2026-05-21 reviewed

Benchmark shows AI agents accept gradual risks in 44 percent of cases
Boiling the Frog: A Multi-Turn Benchmark for Agentic Safety

Piercosma Bisconti +13
cs.CL 2026-05-21 reviewed

Moral knowledge beats extra context and model scaling for value detection
More Context, Larger Models, or Moral Knowledge? A Systematic Study of Schwartz Value Detection in Political Texts

Víctor Yeste +1
cs.CL 2026-05-21 reviewed

Moral knowledge retrieval beats extra context for political value detection
More Context, Larger Models, or Moral Knowledge? A Systematic Study of Schwartz Value Detection in Political Texts

Víctor Yeste +1
cs.LG 2026-05-21 reviewed

CAME-Grad fixes gradient double dilemma in report generation
The Double Dilemma in Multi-Task Radiology Report Generation: A Gradient Dynamics Analysis and Solution

Erjian Zhang +3
cs.LG 2026-05-21 reviewed

CAME-Grad optimizer lifts radiology reports by 2 percent
The Double Dilemma in Multi-Task Radiology Report Generation: A Gradient Dynamics Analysis and Solution

Erjian Zhang +3
cs.LG 2026-05-21 reviewed

Dual rewards stabilize unsupervised LLM reasoning
Two is better than one: A Collapse-free Multi-Reward RLIF Training Framework

Shourov Joarder +4
cs.CL 2026-05-21 reviewed

Sensorimotor ratings speed Chinese word recognition
Chinese sensorimotor and embodiment norms for 3,000 lexicalized concepts

Jing Chen +4
cs.CL 2026-05-21 reviewed

Agentic CLEAR automates multi-level LLM agent evaluation
Agentic CLEAR: Automating Multi-Level Evaluation of LLM Agents

Asaf Yehudai +2
cs.LG 2026-05-21 reviewed

Noise prediction loss matches score matching up to constant
A Tutorial on Diffusion Theory: From Differential Equations to Diffusion Models

Jiayi Fu +1
cs.CL 2026-05-21 reviewed

Hyperfitting expands final LLM layer to promote rare tokens
Beyond Temperature: Hyperfitting as a Late-Stage Geometric Expansion

Meimingwei Li +3
cs.CL 2026-05-21 reviewed

Decaying hints lift non-English reasoning without drift
LANG: Reinforcement Learning for Multilingual Reasoning with Language-Adaptive Hint Guidance

Yuchun Fan +11
cs.CL 2026-05-21 reviewed

Multiple metrics required to judge synthetic data for tool-calling agents
SynAE: A Framework for Measuring the Quality of Synthetic Data for Tool-Calling Agent Evaluations

Shuaiqi Wang +3