archive
Every paper Pith has read. Search by title, abstract, or pith.
7661 papers in cs.CL · page 6
-
Early entropy drop signals when CoT reasoning helps LLMs
When Do LLMs Reason? A Dynamical Systems View via Entropy Phase Transitions
-
Self-distillation balances consensus across views to cut noise from privileged signals
AVSD: Adaptive-View Self-Distillation by Balancing Consensus and Teacher-Specific Privileged Signals
-
Divide-prompt-refine produces more novel biomedical abstracts without training
Divide-Prompt-Refine: a Training-Free, Structure-Aware Framework for Biomedical Abstract Generation
-
Pipeline triples accuracy for Indigenous image captions
Retrieval-Augmented Long-Context Translation for Cultural Image Captioning: Gators submission for AmericasNLP 2026 shared task
-
Offline consolidator cuts agent memory 12x while raising success
Auto-Dreamer: Learning Offline Memory Consolidation for Language Agents
-
1B model scores 60.7% on MMLU after 40B instruction tokens
HRM-Text: Efficient Pretraining Beyond Scaling
-
Self-training amplifies surface markers while deep syntax dies
Self-Training Doesn't Flatten Language -- It Restructures It: Surface Markers Amplify While Deep Syntax Dies
-
25-30% of web medical AIs give inaccurate clinical advice
Do No Harm? Hallucination and Actor-Level Abuse in Web-Deployed Medical Large Language Models
-
Direct sign-to-sign model beats text cascade on accuracy and speed
Direct Translation between Sign Languages
-
Small models copy last CoT number for 89-92% of arithmetic accuracy
The Readout Shortcut: Positional Number Copying Dominates Arithmetic CoT Readout in Small Language Models
-
State management beats workspace isolation in multi-agent tasks
Multi-agent Collaboration with State Management
-
Gemination subclass drives errors in Japanese neural morphology
When Irregularity Helps: A Subclass Analysis of Inductive Bias in Neural Morphology
-
One rare verb subtype drives most neural morphology errors
When Irregularity Helps: A Subclass Analysis of Inductive Bias in Neural Morphology
-
Nine biomedical corpora differ in ways size and type stats miss
What Do Biomedical NER and Entity Linking Benchmarks Measure? A Corpus-Centric Diagnostic Framework
-
LLM agent accuracy drops to 0.54-0.62 without labels
AgentAtlas: Beyond Outcome Leaderboards for LLM Agents
-
Co-occurrence patterns support subject-verb agreement learning
Collocational bootstrapping: A hypothesis about the learning of subject-verb agreement in humans and neural networks
-
AI models lag behind text-only on 3D brain MRI benchmark
NeuroQA: A Large-Scale Image-Grounded Benchmark for 3D Brain MRI Understanding
5 Piths -
Verbal feedback in RL makes LLM simulations more human-like
Reinforcing Human Behavior Simulation via Verbal Feedback
-
Audit split lifts source precision in LLM wiki tables from 36 to 51 percent
Stage-Audit: Auditable Source-Frontier Discovery for Cross-Wiki Tables
-
Trained reflectors improve language agents on new tasks
Training Language Agents to Learn from Experience
-
Reddit dataset tracks 12 MAHA health themes over six years
Hiding in Plain Sight: Finding MAHA on Reddit
-
CoT prompting leaves gender bias inside LLMs
Mechanics of Bias and Reasoning: Interpreting the Impact of Chain-of-Thought Prompting on Gender Bias in LLMs
-
Jigsaw puzzle explains ChatGPT through comic panels
Puzzled By ChatGPT? No more! A Jigsaw Puzzle to Promote AI Literacy and Awareness
-
LLMs switch from instructions to patterns when history conflicts
Do as I Say, Not as I Do: Instruction-Induction Conflict in LLMs
-
DEL raises LLM number prediction accuracy on math benchmarks
DEL: Digit Entropy Loss for Numerical Learning of Large Language Models
-
Non-reasoning fine-tuning beats reasoning for TTCW literary reviews
When Reasoning Supervision Hurts: TTCW-Based Long-Form Literary Review Generation
-
AI dialogue models sync states and predict turns ahead
Synchronization and Turn-Taking in Full-Duplex Speech Dialogue Models
-
TIDE boosts MoE diffusion LLM inference up to 1.5x
TIDE: Efficient and Lossless MoE Diffusion LLM Inference with I/O-aware Expert Offload
-
Staged perception training boosts VLM accuracy with shorter reasoning
From Seeing to Thinking: Decoupling Perception and Reasoning Improves Post-Training of Vision-Language Models
-
ClinSeekAgent boosts clinical AI by actively seeking raw evidence
ClinSeekAgent: Automating Multimodal Evidence Seeking for Agentic Clinical Reasoning
-
Compact tokens from knowledge graphs ground LLMs with 10x fewer tokens
KoRe: Compact Knowledge Representations for Large Language Models
-
Selective FP4 on prefilling yields 3x speedup for agentic LLMs
Mix-Quant: Quantized Prefilling, Precise Decoding for Agentic LLMs
-
Counterfactual tests expose failures in LVLM attribution for chest X-rays
Rethinking Visual Attribution for Chest X-ray Reasoning in Large Vision Language Models
-
Checklist prompts score 7.5 out of 8 on LLM quality rubric
Less Back-and-Forth: A Comparative Study of Structured Prompting
-
LLMs miss implicit cues despite explicit instructions
MixRea: Benchmarking Explicit-Implicit Reasoning in Large Language Models
-
Dataset pairs LLM chats with users' reported thoughts
ThoughtTrace: Understanding User Thoughts in Real-World LLM Interactions
5 Piths -
Thoughts collected with LLM chats improve behavior forecasts
ThoughtTrace: Understanding User Thoughts in Real-World LLM Interactions
5 Piths -
Joint lattice testing calibrates cascaded RAG thresholds at target risk
BalanceRAG: Joint Risk Calibration for Cascaded Retrieval-Augmented Generation
-
Draft answer first, then reflect, to gain 23% accuracy with 57% fewer tokens
CopT: Contrastive On-Policy Thinking with Continuous Spaces for General and Agentic Reasoning
-
The paper applies Group-Relative Policy Optimization reinforcement learning to a 1.7B…
Text-to-SPARQL Generation with Reinforcement Learning: A GRPO-based Approach on DBLP
-
Belief consistency raises LLM agent success by 20 points
Rewarding Beliefs, Not Actions: Consistency-Guided Credit Assignment for Long-Horizon Agents
-
Prompt tuning labels radiology reports with 32 examples
PromptRad: Knowledge-Enhanced Multi-Label Prompt-Tuning for Low-Resource Radiology Report Labeling
-
Prompt tuning with UMLS synonyms labels reports from 32 examples
PromptRad: Knowledge-Enhanced Multi-Label Prompt-Tuning for Low-Resource Radiology Report Labeling
-
Language mutations extend conspiracy theory lifespans on X
Language Mutations Sustain the Persistences of Conspiracy Theories on Social Media
-
Gemination errors dominate Japanese verb model failures
Mind Your Moras: Orthography-Aware Error Analysis of Neural Japanese Morphological Generation
-
Gemination drives 75-80% of errors in Japanese past-tense models
Mind Your Moras: Orthography-Aware Error Analysis of Neural Japanese Morphological Generation
-
Speculative decoding now works across all batch sizes without quality loss
FlexDraft: Flexible Speculative Decoding via Attention Tuning and Bonus-Guided Calibration
-
Three memory layers improve long-term LLM agent recall
Rethinking How to Remember: Beyond Atomic Facts in Lifelong LLM Agent Memory
-
GPU-aware expert mapping cuts MoE latency by 7.9 percent on average
GEM: GPU-Variability-Aware Expert to GPU Mapping for MoE Systems
-
Position-dependent attention fixes constant risk on shifted reasoning
A Measure-Theoretic Analysis of Reasoning: Structural Generalization and Approximation Limits