archive

Every paper Pith has read.

7661 papers in cs.CL · page 11

cs.CL 2026-05-17 reviewed

Stigmatizing language skews LLMs toward less aggressive medical advice
Artificial Intolerance: Stigmatizing Language in Clinical Documentation Skews Large Language Model Decision-Making

Jen-tse Huang +7
cs.AI 2026-05-17 reviewed

ChemVA lifts LLMs on chemical diagrams by 20 points
ChemVA: Advancing Large Language Models on Chemical Reaction Diagrams Understanding

Mingyang Rao +6
cs.CL 2026-05-17 reviewed

LLMs annotate Mandarin narratives nearly as well as humans
LLMs for automatic annotation of Mandarin narrative transcripts

Qingwen Zhao +5
cs.CL 2026-05-16 reviewed

AI models barely beat baseline on pluralistic community moderation
PluRule: A Benchmark for Moderating Pluralistic Communities on Social Media

Zoher Kachwala +5
cs.CL 2026-05-16 reviewed

Many models show weaker safety in English than low-resource languages
Why Do Safety Guardrails Degrade Across Languages?

Max Zhang +3
cs.LG 2026-05-16 reviewed

On-device specs match cloud accuracy on 4 of 8 benchmarks
OpenJarvis: Personal AI, On Personal Devices

Jon Saad-Falcon +12
cs.AI 2026-05-16 reviewed

Explicit provenance required to compute AI responsibility
Responsible Agentic AI Requires Explicit Provenance

Jinwei Hu +5
cs.CL 2026-05-16 reviewed

Low-cost adapters enable multimodal LLMs for low-resource languages
Multilingual and Multimodal LLMs in the Wild: Building for Low-Resource Languages

Firoj Alam +2
cs.CV 2026-05-16 reviewed

Models collapse on multi-sequence brain MRI questions
UCSF-PDGM-VQA: Visual Question Answering dataset for brain tumor MRI interpretation

Shiv Ghosh +5
cs.CV 2026-05-16 reviewed

VLMs collapse on multi-sequence brain tumor MRI scans
UCSF-PDGM-VQA: Visual Question Answering dataset for brain tumor MRI interpretation

Shiv Ghosh +5
cs.CL 2026-05-16 reviewed

Small attention-head sets suppress deceptive commitment across environments
The Point of No Return: Counterfactual Localization of Deceptive Commitment in Language-Model Reasoning

Scott Merrill +1
cs.CL 2026-05-16 reviewed

Router matches top LLM quality at half the cost
HyDRA: Hybrid Dynamic Routing Architecture for Heterogeneous LLM Pools

Aashna Garg +4
cs.CL 2026-05-16 reviewed

Three agents boost medical QA accuracy by 6.46 points
SEMA-RAG: A Self-Evolving Multi-Agent Retrieval-Augmented Generation Framework for Medical Reasoning

Yongfeng Huang +2
cs.CV 2026-05-16 reviewed

Density weighting recovers 8.7 OCR points in hybrid VLM distillation
HEED: Density-Weighted Residual Alignment for Hybrid Vision-Language Model Distillation

Yihao Liang +1
cs.CL 2026-05-16 reviewed

Auto-generated reasoning chains lift ICL accuracy on multi-step tasks
ACIL: Auto Chain of Thoughts for In-Context Learning

Rui Chu
cs.LG 2026-05-16 reviewed

Scale decides if language model geometry stays organized for prediction
Scale Determines Whether Language Models Organize Representation Geometry for Prediction

Weilun Xu
cs.CL 2026-05-16 reviewed

Top LLMs cover only 47.8% of real consumer reactions
Can LLMs Think Like Consumers? Benchmarking Crowd-Level Reaction Reconstruction with ConsumerSimBench

Tianyu Wang +2
cs.AI 2026-05-16 reviewed

LLM agent builds traceable knowledge graphs autonomously
RAGA: Reading-And-Graph-building-Agent for Autonomous Knowledge Graph Construction and Retrieval-Augmented Generation

Chengrui Han +1
cs.LG 2026-05-16 reviewed

AI agents differ sharply in solo ML model training on one GPU
1GC-7RC: One Graphic Card -- Seven Research Challenges! How Good Are AI Agents at Doing Your Job?

Robin-Nico Kampa +4
cs.CL 2026-05-16 reviewed

Agentic cycle makes translation serve communication goals first
Agentic AI Translate: An Agentic Translator Prototype for Translation as Communication Design

Masaru Yamada
cs.LG 2026-05-16 reviewed

Self-evolution trains math-reasoning LLMs with under 2K samples
D²Evo: Dual Difficulty-Aware Self-Evolution for Data-Efficient Reinforcement Learning

Ru Zhang +6
cs.CL 2026-05-16 reviewed

Prompt leaks let simple text match fake hallucination detection
PARALLAX: Separating Genuine Hallucination Detection from Benchmark Construction Artifacts

Khizar Hussain +1
cs.SI 2026-05-16 reviewed

Algorithmic feeds reshape how users write
Algorithmic Cultivation: How Social Media Feeds Shape User Language

Olivia Pal +3
cs.PL 2026-05-16 reviewed

Every string over its alphabet is a valid program
The IsalProgram Programming Language

Ezequiel López-Rubio
cs.CL 2026-05-16 reviewed

HalluScore benchmarks LLM hallucination in question answering
HalluScore: Large Language Model Hallucination Question Answering Benchmark

Aisha Alansari +1
cs.CL 2026-05-16 reviewed

Fine-tuning stabilizes LLM personality scores but accuracy stays near chance
Evaluation Drift in LLM Personality Induction: Are We Moving the Goalpost?

Prateek Rajput +4
cs.CL 2026-05-16 reviewed

Transformers recover item difficulty signal from wording alone
Response-free item difficulty modelling for multiple-choice items with fine-tuned transformers: Component-wise representation and multi-task learning

Jan Netík +1
cs.CL 2026-05-16 reviewed

Test-time skill synthesis raises LLM agent success rates
Skills on the Fly: Test-Time Adaptive Skill Synthesis for LLM Agents

Jingxing Wang +6
cs.CL 2026-05-16 reviewed

Two-stage adapters put LLM first in coreference task
Closing the Gap at CRAC 2026: Two-Stage Adaptation for LLM-Based Multilingual Coreference Resolution

Antoine Bourgois +2
cs.CL 2026-05-16 reviewed

Two-stage adapters lead LLM multilingual coreference resolution
Closing the Gap at CRAC 2026: Two-Stage Adaptation for LLM-Based Multilingual Coreference Resolution

Antoine Bourgois +2
cs.AI 2026-05-16 reviewed

EEG shows why people miss some AI hallucinations
How do Humans Process AI-generated Hallucination Contents: a Neuroimaging Study

Shuqi Zhu +6
cs.CL 2026-05-16 reviewed

Diffusion LLMs learn faster decoding by rolling back mistakes
Roll Out and Roll Back: Diffusion LLMs are Their Own Efficiency Teachers

Fanqin Zeng +8
cs.CL 2026-05-16 reviewed

Reasoning effort fails to change LRM alignment with humans
Effort as Ceiling, Not Dial: Reasoning Budget Does Not Modulate Cognitive Cost Alignment Between Humans and Large Reasoning Models

Yueqing Hu +1
cs.CL 2026-05-16 reviewed

Full-attention LLMs sparsify in hundreds of steps
Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps

Yanke Zhou +8
cs.CL 2026-05-16 reviewed

Pinyin and glyph features fix homophone errors in Chinese keyword filtering
JSPG: Dynamic Dictionary Filtering via Joint Semantic-Pinyin-Glyph Retrieval for Chinese Contextual ASR

Shilin Zhou +1
cs.CE 2026-05-16 reviewed

LLM trading alpha is not deployment evidence
The Alpha Illusion: Reported Alpha from LLM Trading Agents Should Not Be Treated as Deployment Evidence

Yuxuan Ye +9
cs.CV 2026-05-16 reviewed

DriveSafe uses scene captions to improve driving risk detection
DriveSafe: A Framework for Risk Detection and Safety Suggestions in Driving Scenarios

Sainithin Artham +3
cs.CL 2026-05-16 reviewed

Expert targets raise merged-model 4-bit accuracy from 35% to 77%
E-PMQ: Expert-Guided Post-Merge Quantization with Merged-Weight Anchoring

Wenjun Wang +6
cs.CL 2026-05-16 reviewed

Multiple translations become one benchmark for Pali
PaliBench: A Multi-Reference Blueprint for Classical Language Translation Benchmarks

Máté Metzger +1
cs.CL 2026-05-16 reviewed

Mixing a model's own predictions lets it add facts without forgetting old skills
MixSD: Mixed Contextual Self-Distillation for Knowledge Injection

Jiarui Liu +7
cs.CL 2026-05-16 reviewed

MixSD retains 100% of base skills while injecting new facts
MixSD: Mixed Contextual Self-Distillation for Knowledge Injection

Jiarui Liu +7
cs.CV 2026-05-16 reviewed

Induced patterns let VLMs plan beyond single-step vision
Thinking with Patterns: Breaking the Perceptual Bottleneck in Visual Planning via Pattern Induction

Yichang Jian +4
cs.CL 2026-05-16 reviewed

First structured dataset released for Indian RTI decisions
RTI-Bench: A Structured Dataset for Indian Right-to-Information Decision Analysis

Joy Bose
cs.CL 2026-05-16 reviewed

Block-union tables cut chunked prefill attention time by 2.72x
CompactAttention: Accelerating Chunked Prefill with Block-Union KV Selection

Jiwon Song +3
cs.CL 2026-05-16 reviewed

Diffusion code generation meets constraints through local edits
Constrained Code Generation with Discrete Diffusion

Lize Shao +4
cs.LG 2026-05-16 reviewed

Decoupling KL and prefixes creates four LLM distillation objectives
Decoupling KL and Trajectories: A Unified Perspective for SFT, DAgger, Offline RL, and OPD in LLM Distillation

Anhao Zhao +5
cs.LG 2026-05-16 reviewed

LLM confidence trajectories separate correct reasoning without content
Confidence Geometry Reveals Trace-Level Correctness in Large Language Model Reasoning

Shuo Liu +2
cs.CL 2026-05-16 reviewed

AI agents reach 6.89x GPU kernel speedups but drop on unseen shapes
AgentKernelArena: Generalization-Aware Benchmarking of GPU Kernel Optimization Agents

Sharareh Younesian +13
cs.LG 2026-05-16 reviewed

Eight calibration passes set LoRA ranks by layer
FIM-LoRA: Task-Informative Rank Allocation for LoRA via Calibration-Time Gradient-Variance Estimation

Ramakrishnan Sathyavageeswaran
cs.LG 2026-05-16 reviewed

Execution rewards keep tool accuracy above 90% at depth 6
TIER: Trajectory-Invariant Execution Rewards for Multi-Step Tool Composition

Anay Kulkarni +5