archive

Every paper Pith has read.

7661 papers in cs.CL · page 12

cs.LG 2026-05-16 reviewed

Hard examples stay unlearnable in RLVR despite correct rollouts
The Unlearnability Phenomenon in RLVR for Language Models

Yulin Chen +2
cs.LG 2026-05-16 reviewed

LLMs forget targeted knowledge via neutral remaps and closed-form edits
ZeroUnlearn: Few-Shot Knowledge Unlearning in Large Language Models

Yujie Lin +4
cs.LG 2026-05-16 reviewed

Closed-form update unlearns sensitive LLM knowledge in few shots
ZeroUnlearn: Few-Shot Knowledge Unlearning in Large Language Models

Yujie Lin +4
cs.CL 2026-05-16 reviewed

Lightweight LLMs match DNNs on court view generation
Exploring Lightweight Large Language Models for Court View Generation

Zhitian Hou +4
cs.CL 2026-05-16 reviewed

Retrieval assigns legal labels without hallucinations or retraining
Retrieval-Based Multi-Label Legal Annotation: Extensible, Data-Efficient and Hallucination-Free

Li Zhang +2
cs.CL 2026-05-16 reviewed

500-step pre-training on structured language matches LLM efficiency and adds human-like resistance
Language Acquisition Device in Large Language Models

Masato Mita +3
cs.LG 2026-05-16 reviewed

fMRI decodes continuous affect for individualized caption rewriting
EmoMind: Decoding Affective Captions from Human Brain fMRI

Bilal A. Mohammed +2
cs.AI 2026-05-15 reviewed

Bounding commitments cuts personalization failures to zero
Recall Isn't Enough: Bounding Commitments in Personalized Language Systems

Rui Tang +6
cs.CL 2026-05-15 reviewed

AI agents complete only 28% of healthcare workflow tasks
CHI-Bench: Can AI Agents Automate End-to-End, Long-Horizon, Policy-Rich Healthcare Workflows?

Haolin Chen +32
cs.CL 2026-05-15 reviewed

RoBERTa spots manner and result verbs at 89.6% accuracy
A Scalable Tool for Measuring Manner and Result Verbs in Developmental Language Research

Divyesh Pratap Singh +6
cs.CL 2026-05-15 reviewed

Graphs track dialogue state for better consistency checks
SKG-Eval: Stateful Evaluation of Multi-Turn Dialogue via Incremental Semantic Knowledge Graphs

Avijit Shil +1
cs.CL 2026-05-15 reviewed

Generative models output continuous emotion intensity scores
Beyond Sentiment Classification: A Generative Framework for Emotion Intensity Evaluation in Text

Francesco A. Fabozzi +2
cs.LG 2026-05-15 reviewed

Standard embeddings match MRL after truncation except at 80% cuts
To MRL or not to MRL: Text Embeddings are Robust to Truncation Without Matryoshka Learning, Except In Heavy Truncation Scenarios

Sotaro Takeshita +3
cs.LG 2026-05-15 reviewed

Alignment updates concentrate in transformer read pathways
Where Pretraining writes and Alignment reads: the asymmetry of Transformer weight space

Valeria Ruscio +2
cs.CL 2026-05-15 reviewed

HTML versions advance math paper accessibility with MathML 4
Scaling Accessible Mathematics on arXiv: HTML Conversion and MathML 4

Deyan Ginev +4
cs.CL 2026-05-15 reviewed

PQR finds 23-78% more QA agent failures with realistic queries
PQR: A Framework to Generate Diverse and Realistic User Queries that Elicit QA Agent Failures

Yunan Lu +4
cs.LG 2026-05-15 reviewed

Symphony outperforms current leaders on medical speech recognition
Symphony for Speech-to-Text: Supporting Real-Time Medical Voice Interfaces

Arne Nix +8
cs.LG 2026-05-15 reviewed

Medical speech system outperforms current leaders on clinical tasks
Symphony for Speech-to-Text: Supporting Real-Time Medical Voice Interfaces

Arne Nix +8
cs.HC 2026-05-15 reviewed

LLMs require critical parameter choices for qualitative work
LLMs in Qualitative Research: Opportunities, Limitations, and Practical Considerations

Henry Salgado +3
cs.HC 2026-05-15 reviewed

LLM outputs drift toward past context in extended chats
Alignment Drift in Long-Term Human-LLM Interaction: A Mechanism-Oriented Framework

Xintong Yao
cs.CL 2026-05-15 reviewed

Decay slope couples routing loss and execution rescue in LLM agent libraries
The Scaling Laws of Skills in LLM Agent Systems

Charles Chen +14
cs.CL 2026-05-15 reviewed

One framework turns utility numbers into readable bills with carbon totals
A Generative AI Framework for Intelligent Utility Billing CO2 Analytics and Sustainable Resource Optimisation

Pavan Manjunath +1
cs.CY 2026-05-15 reviewed

AI edits can steer collective opinions across networks
AI-Mediated Communication Can Steer Collective Opinion

Stratis Tsirtsis +4
cs.LG 2026-05-15 reviewed

Swap test choice changes safe layers for transformer pruning
No Free Swap: Protocol-Dependent Layer Redundancy in Transformers

Gabriel Garcia
cs.AI 2026-05-15 reviewed

Population broadcast lifts LLM agent returns up to 7.7x without weight updates
FORGE: Self-Evolving Agent Memory With No Weight Updates via Population Broadcast

Igor Bogdanov +5
cs.CL 2026-05-15 reviewed

AI framework unifies gas distribution, utility billing, and carbon analytics
A Unified Generative-AI Framework for Smart Energy Infrastructure: Intelligent Gas Distribution, Utility Billing, Carbon Analytics, and Quantum-Inspired Optimisation

Pavan Manjunath +1
cs.CL 2026-05-15 reviewed

Lesioned models produce aphasia symptoms unlike human cases
Artificial Aphasias in Lesioned Language Models

Nathan Roll +3
cs.CL 2026-05-15 reviewed

Evidence graph dispatches parallel searchers to reach 86.2 on BrowseComp
Argus: Evidence Assembly for Scalable Deep Research Agents

Zhen Zhang +9
cs.CL 2026-05-15 reviewed

Navigator assembles research from complementary evidence pieces
Argus: Evidence Assembly for Scalable Deep Research Agents

Zhen Zhang +9
cs.AI 2026-05-15 reviewed

Open pipeline lifts clinical LLMs to new benchmark highs
Fully Open Meditron: An Auditable Pipeline for Clinical LLMs

Xavier Theimer-Lienhard +7
cs.AI 2026-05-15 reviewed

LLM tutors spot optimal steps but over-accept wrong solutions
Confirming Correct, Missing the Rest: LLM Tutoring Agents Struggle Where Feedback Matters Most

Tahreem Yasir +5
cs.AI 2026-05-15 reviewed

State abstraction yields 76% higher returns per token for LLM agents
Context, Reasoning, and Hierarchy: A Cost-Performance Study of Compound LLM Agent Design in an Adversarial POMDP

Igor Bogdanov +5
cs.CL 2026-05-15 reviewed

Value profiles from surveys cut LLM cross-country errors
Improving Cross-Cultural Survey Simulation with Calibrated Value Personas

Axel Abels +3
cs.CL 2026-05-15 reviewed

AI tree search finds better 3D solar panel shapes
Optimized Three-Dimensional Photovoltaic Structures with LLM guided Tree Search

Michael P. Brenner +2
cs.AI 2026-05-15 reviewed

Exploration-first training improves LLM agents in new environments
Look Before You Leap: Autonomous Exploration for LLM Agents

Ziang Ye +8
cs.CL 2026-05-15 reviewed

External subgraphs guide LLMs to sharper multi-step answers
SGR: A Stepwise Reasoning Framework for LLMs with External Subgraph Generation

Xin Zhang +4
cs.CL 2026-05-15 reviewed

Retrieval reverses bias to make LLMs fairer without tuning
DebiasRAG: A Tuning-Free Path to Fair Generation in Large Language Models through Retrieval-Augmented Generation

Rui Chu +8
cs.CL 2026-05-15 reviewed

Token relations fix bias in machine text detection
Multi-Level Contextual Token Relation Modeling for Machine-Generated Text Detection

Chenwang Wu +4
cs.DL 2026-05-15 reviewed

Generative AI supports literature reviews through summarization and queries
Generative Artificial Intelligence for Literature Reviews

Gerit Wagner +4
cs.CL 2026-05-15 reviewed

LLM similarity selection lowers error for low cognitive scores
Can Large Language Models Imitate Human Speech for Clinical Assessment? LLM-Driven Data Augmentation for Cognitive Score Prediction

Si-Belkacem Yamine Ketir +5
cs.AI 2026-05-15 reviewed

Hybrid AI beats LLMs on unseen tax law cases
Reasoners or Translators? Contamination-aware Evaluation and Neuro-Symbolic Robustness in Tax Law

Parisa Kordjamshidi +4
cs.CL 2026-05-15 reviewed

RecMem cuts LLM agent memory costs by up to 87%
RecMem: Recurrence-based Memory Consolidation for Efficient and Effective Long-Running LLM Agents

Zijie Dai +6
cs.CL 2026-05-15 reviewed

Typological priors replace flat labels for better multilingual S2ST
From Flat Language Labels to Typological Priors: Structured Language Conditioning for Multilingual Speech-to-Speech Translation

Yu Pan +6
cs.CL 2026-05-15 reviewed

Sparse mid-layer circuit handles LLM judgment across formats
Judge Circuits

Nils Feldhus +12
cs.CL 2026-05-15 reviewed

VLMs vary in adapting math lessons to student profiles
Can Vision Language Models Be Adaptive in Mathematics Education? A Learner Model-based Rubric Study

Jie Gao +5
cs.CL 2026-05-15 reviewed

Taxonomy separates AI cultural knowledge from framing and adaptation
Defining Cultural Capabilities for AI Evaluation: A Taxonomy Grounded in Intercultural Communication Theory

Isar Nejadgholi +3
cs.CL 2026-05-15 reviewed

Symbolic ontology extracts facts from police report narratives
Ontology for Policing: Conceptual Knowledge Learning for Semantic Understanding and Reasoning in Law Enforcement Reports

Anita Srbinovska +3
cs.CL 2026-05-15 reviewed

RL fine-tunes MT models to +5 chrF++ without any parallel data
Reference-Free Reinforcement Learning Fine-Tuning for MT: A Seq2Seq Perspective

Ernesto Garcia-Estrada +2
cs.HC 2026-05-15 reviewed

Graduated signals let AI companions flag risks without false alarms on positive states
SLIP & ETHICS: Graduated Intervention for AI Emotional Companions

Minseo Kim
cs.CL 2026-05-15 reviewed

Block attention nears full performance via semantic blocks
Towards Generalization of Block Attention via Automatic Segmentation and Block Distillation

Shuaiyi Li +7