archive

Every paper Pith has read.

7661 papers in cs.CL · page 6

cs.LG 2026-05-20 reviewed

Early entropy drop signals when CoT reasoning helps LLMs
When Do LLMs Reason? A Dynamical Systems View via Entropy Phase Transitions

Wei Xia +3
cs.LG 2026-05-20 reviewed

Self-distillation balances consensus across views to cut noise from privileged signals
AVSD: Adaptive-View Self-Distillation by Balancing Consensus and Teacher-Specific Privileged Signals

Duy Nguyen +9
cs.CL 2026-05-20 reviewed

Divide-prompt-refine produces more novel biomedical abstracts without training
Divide-Prompt-Refine: a Training-Free, Structure-Aware Framework for Biomedical Abstract Generation

Sylvey Lin +5
cs.CL 2026-05-20 reviewed

Pipeline triples accuracy for Indigenous image captions
Retrieval-Augmented Long-Context Translation for Cultural Image Captioning: Gators submission for AmericasNLP 2026 shared task

Aashish Dhawan +4
cs.CL 2026-05-20 reviewed

Offline consolidator cuts agent memory 12x while raising success
Auto-Dreamer: Learning Offline Memory Consolidation for Language Agents

Chongrui Ye +7
cs.CL 2026-05-20 reviewed

1B model scores 60.7% on MMLU after 40B instruction tokens
HRM-Text: Efficient Pretraining Beyond Scaling

Guan Wang +8
cs.CL 2026-05-20 reviewed

Self-training amplifies surface markers while deep syntax dies
Self-Training Doesn't Flatten Language -- It Restructures It: Surface Markers Amplify While Deep Syntax Dies

Ming Liu
cs.CL 2026-05-20 reviewed

25-30% of web medical AIs give inaccurate clinical advice
Do No Harm? Hallucination and Actor-Level Abuse in Web-Deployed Medical Large Language Models

Sunday Oyinlola Ogundoyin +2
cs.CL 2026-05-20 reviewed

Direct sign-to-sign model beats text cascade on accuracy and speed
Direct Translation between Sign Languages

Zetian Wu +5
cs.LG 2026-05-20 reviewed

Small models copy last CoT number for 89-92% of arithmetic accuracy
The Readout Shortcut: Positional Number Copying Dominates Arithmetic CoT Readout in Small Language Models

Ming Liu
cs.MA 2026-05-19 reviewed

State management beats workspace isolation in multi-agent tasks
Multi-agent Collaboration with State Management

Mengyang Liu +4
cs.CL 2026-05-19 reviewed

Gemination subclass drives errors in Japanese neural morphology
When Irregularity Helps: A Subclass Analysis of Inductive Bias in Neural Morphology

Wen Zhang
cs.CL 2026-05-19 reviewed

One rare verb subtype drives most neural morphology errors
When Irregularity Helps: A Subclass Analysis of Inductive Bias in Neural Morphology

Wen Zhang
cs.CL 2026-05-19 reviewed

Nine biomedical corpora differ in ways size and type stats miss
What Do Biomedical NER and Entity Linking Benchmarks Measure? A Corpus-Centric Diagnostic Framework

Robert Leaman +2
cs.AI 2026-05-19 reviewed

LLM agent accuracy drops to 0.54-0.62 without labels
AgentAtlas: Beyond Outcome Leaderboards for LLM Agents

Parsa Mazaheri +1
cs.CL 2026-05-19 reviewed

Co-occurrence patterns support subject-verb agreement learning
Collocational bootstrapping: A hypothesis about the learning of subject-verb agreement in humans and neural networks

Claire Hobbs +1
cs.CV 2026-05-19 reviewed

AI models lag behind text-only on 3D brain MRI benchmark
NeuroQA: A Large-Scale Image-Grounded Benchmark for 3D Brain MRI Understanding

Mohammad H. Abbasi +14

5 Piths
cs.LG 2026-05-19 reviewed

Verbal feedback in RL makes LLM simulations more human-like
Reinforcing Human Behavior Simulation via Verbal Feedback

Weiwei Sun +15
cs.CL 2026-05-19 reviewed

Audit split lifts source precision in LLM wiki tables from 36 to 51 percent
Stage-Audit: Auditable Source-Frontier Discovery for Cross-Wiki Tables

Chen Shen
cs.LG 2026-05-19 reviewed

Trained reflectors improve language agents on new tasks
Training Language Agents to Learn from Experience

Yuval Shalev +2
cs.SI 2026-05-19 reviewed

Reddit dataset tracks 12 MAHA health themes over six years
Hiding in Plain Sight: Finding MAHA on Reddit

Sabit Ahmed +2
cs.CL 2026-05-19 reviewed

CoT prompting leaves gender bias inside LLMs
Mechanics of Bias and Reasoning: Interpreting the Impact of Chain-of-Thought Prompting on Gender Bias in LLMs

Edie Pearman +5
cs.CL 2026-05-19 reviewed

Jigsaw puzzle explains ChatGPT through comic panels
Puzzled By ChatGPT? No more! A Jigsaw Puzzle to Promote AI Literacy and Awareness

Francesca Padovani +1
cs.CL 2026-05-19 reviewed

LLMs switch from instructions to patterns when history conflicts
Do as I Say, Not as I Do: Instruction-Induction Conflict in LLMs

Carolina Camassa +1
cs.CL 2026-05-19 reviewed

DEL raises LLM number prediction accuracy on math benchmarks
DEL: Digit Entropy Loss for Numerical Learning of Large Language Models

Zhaohui Zheng +5
cs.CL 2026-05-19 reviewed

Non-reasoning fine-tuning beats reasoning for TTCW literary reviews
When Reasoning Supervision Hurts: TTCW-Based Long-Form Literary Review Generation

Jinlong Liu +2
cs.CL 2026-05-19 reviewed

AI dialogue models sync states and predict turns ahead
Synchronization and Turn-Taking in Full-Duplex Speech Dialogue Models

Pablo Riera +4
cs.CL 2026-05-19 reviewed

TIDE boosts MoE diffusion LLM inference up to 1.5x
TIDE: Efficient and Lossless MoE Diffusion LLM Inference with I/O-aware Expert Offload

Zhiben Chen +4
cs.CL 2026-05-19 reviewed

Staged perception training boosts VLM accuracy with shorter reasoning
From Seeing to Thinking: Decoupling Perception and Reasoning Improves Post-Training of Vision-Language Models

Juncheng Wu +8
cs.CL 2026-05-19 reviewed

ClinSeekAgent boosts clinical AI by actively seeking raw evidence
ClinSeekAgent: Automating Multimodal Evidence Seeking for Agentic Clinical Reasoning

Juncheng Wu +7
cs.CL 2026-05-19 reviewed

Compact tokens from knowledge graphs ground LLMs with 10x fewer tokens
KoRe: Compact Knowledge Representations for Large Language Models

Davide Cavicchini +2
cs.CL 2026-05-19 reviewed

Selective FP4 on prefilling yields 3x speedup for agentic LLMs
Mix-Quant: Quantized Prefilling, Precise Decoding for Agentic LLMs

Haiquan Lu +4
cs.CV 2026-05-19 reviewed

Counterfactual tests expose failures in LVLM attribution for chest X-rays
Rethinking Visual Attribution for Chest X-ray Reasoning in Large Vision Language Models

Guangzhi Xiong +4
cs.CL 2026-05-19 reviewed

Checklist prompts score 7.5 out of 8 on LLM quality rubric
Less Back-and-Forth: A Comparative Study of Structured Prompting

Saurav Ghosh +2
cs.CL 2026-05-19 reviewed

LLMs miss implicit cues despite explicit instructions
MixRea: Benchmarking Explicit-Implicit Reasoning in Large Language Models

Yuanqing Cai +5
cs.CL 2026-05-19 reviewed

Dataset pairs LLM chats with users' reported thoughts
ThoughtTrace: Understanding User Thoughts in Real-World LLM Interactions

Chuanyang Jin +8

5 Piths
cs.CL 2026-05-19 reviewed

Thoughts collected with LLM chats improve behavior forecasts
ThoughtTrace: Understanding User Thoughts in Real-World LLM Interactions

Chuanyang Jin +8

5 Piths
cs.CL 2026-05-19 reviewed

Joint lattice testing calibrates cascaded RAG thresholds at target risk
BalanceRAG: Joint Risk Calibration for Cascaded Retrieval-Augmented Generation

Zijun Jia +8
cs.CL 2026-05-19 reviewed

Draft answer first then reflect to gain 23% accuracy with 57% fewer tokens
CopT: Contrastive On-Policy Thinking with Continuous Spaces for General and Agentic Reasoning

Dachuan Shi +6
cs.CL 2026-05-19 reviewed

The paper applies Group-Relative Policy Optimization reinforcement learning to a 1.7B…
Text-to-SPARQL Generation with Reinforcement Learning: A GRPO-based Approach on DBLP

Jann Pfeifer +2
cs.CL 2026-05-19 reviewed

Belief consistency raises LLM agent success by 20 points
Rewarding Beliefs, Not Actions: Consistency-Guided Credit Assignment for Long-Horizon Agents

Wenjie Tang +4
cs.CL 2026-05-19 reviewed

Prompt tuning labels radiology reports with 32 examples
PromptRad: Knowledge-Enhanced Multi-Label Prompt-Tuning for Low-Resource Radiology Report Labeling

Ying-Jia Lin +5
cs.CL 2026-05-19 reviewed

Prompt tuning with UMLS synonyms labels reports from 32 examples
PromptRad: Knowledge-Enhanced Multi-Label Prompt-Tuning for Low-Resource Radiology Report Labeling

Ying-Jia Lin +5
cs.CL 2026-05-19 reviewed

Language mutations extend conspiracy theory lifespans on X
Language Mutations Sustain the Persistences of Conspiracy Theories on Social Media

Calvin Yixiang Cheng +2
cs.CL 2026-05-19 reviewed

Gemination errors dominate Japanese verb model failures
Mind Your Moras: Orthography-Aware Error Analysis of Neural Japanese Morphological Generation

Wen Zhang
cs.CL 2026-05-19 reviewed

Gemination drives 75-80% of errors in Japanese past-tense models
Mind Your Moras: Orthography-Aware Error Analysis of Neural Japanese Morphological Generation

Wen Zhang
cs.CL 2026-05-19 reviewed

Speculative decoding now works across all batch sizes without quality loss
FlexDraft: Flexible Speculative Decoding via Attention Tuning and Bonus-Guided Calibration

Yaojie Zhang +7
cs.CL 2026-05-19 reviewed

Three memory layers improve long-term LLM agent recall
Rethinking How to Remember: Beyond Atomic Facts in Lifelong LLM Agent Memory

Jingwei Sun +4
cs.DC 2026-05-19 reviewed

GPU-aware expert mapping cuts MoE latency by 7.9 percent on average
GEM: GPU-Variability-Aware Expert to GPU Mapping for MoE Systems

Sourish Wawdhane +2
cs.LG 2026-05-19 reviewed

Position-dependent attention fixes constant risk on shifted reasoning
A Measure-Theoretic Analysis of Reasoning: Structural Generalization and Approximation Limits

Yuyang Zhang +3