archive

Every paper Pith has read.

7661 papers in cs.CL · page 16

cs.AI 2026-05-13 reviewed

GraphRAG retrieval aligns LLM agents with social values
From Descriptive to Prescriptive: Uncover the Social Value Alignment of LLM-based Agents

Jinxian Qu +3
cs.LG 2026-05-13 reviewed

Spherical KV stores keys as radius and angle codes to cut cache traffic
SPHERICAL KV: Angle-Domain Attention and Rate-Distortion Retention for Efficient Long-Context Inference

Anay Chauhan +6
cs.CL 2026-05-13 reviewed

Attack collapses speculative decoding speedup by cutting token acceptance
Mistletoe: Stealthy Acceleration-Collapse Attacks on Speculative Decoding

Shuoyang Sun +8
cs.CL 2026-05-13 reviewed

Stealth attack collapses speculative decoding speedup
Mistletoe: Stealthy Acceleration-Collapse Attacks on Speculative Decoding

Shuoyang Sun +8
cs.LG 2026-05-13 reviewed

HodgeCover compresses MoE experts by covering harmonic cycles
HodgeCover: Higher-Order Topological Coverage Drives Compression of Sparse Mixture-of-Experts

Tao Zhong +2

2 Piths
cs.CL 2026-05-13 reviewed

42M Spanish cyber model reaches 0.78 conversation score
VectraYX-Nano: A 42M-Parameter Spanish Cybersecurity Language Model with Curriculum Learning and Native Tool Use

Juan S. Santillana
cs.CL 2026-05-13 reviewed

Rebalanced training gives 42M Spanish cyber model tool-use ability
VectraYX-Nano: A 42M-Parameter Spanish Cybersecurity Language Model with Curriculum Learning and Native Tool Use

Juan S. Santillana
cs.CL 2026-05-13 reviewed

Rebalanced tool-use data lifts 42M Spanish cyber model to 0.23 accuracy
VectraYX-Nano: A 42M-Parameter Spanish Cybersecurity Language Model with Curriculum Learning and Native Tool Use

Juan S. Santillana
cs.CL 2026-05-13 reviewed

Six hours of data let a two-stage model beat larger ones on Wardaman
WARDEN: Endangered Indigenous Language Transcription and Translation with 6 Hours of Training Data

Ziheng Zhang +3
cs.SD 2026-05-13 reviewed

No voice agent tops 0.5 on both accuracy and experience
EVA-Bench: A New End-to-end Framework for Evaluating Voice Agents

Tara Bogavelli +12

2 Piths
cs.CL 2026-05-13 reviewed

Agent weight updates cut token use 83% while raising accuracy
Good Agentic Friends Do Not Just Give Verbal Advice: They Can Update Your Weights

Wenrui Bao +5
cs.CL 2026-05-13 reviewed

Finetuning makes models believe claims labeled false
Negation Neglect: When models fail to learn negations in training

Harry Mayne +5

2 Piths
cs.CL 2026-05-13 reviewed

LLM pipeline turns text into argument graphs
An LLM-Based System for Argument Mining

Paulo Pirozelli +3
cs.CL 2026-05-13 reviewed

LLM pipeline builds argument graphs from plain text
An LLM-Based System for Argument Mining

Paulo Pirozelli +3
cs.CL 2026-05-13 reviewed

Hidden-state transport geometry locates first LLM reasoning error
Where Does Reasoning Break? Step-Level Hallucination Detection via Hidden-State Transport Geometry

Tyler Alvarez +1
cs.CL 2026-05-13 reviewed

MoE beats dense on active params but loses on total capacity
Dense vs Sparse Pretraining at Tiny Scale: Active-Parameter vs Total-Parameter Matching

Abdalrahman Wael
cs.LG 2026-05-13 reviewed

Trajectory balance stops diffusion models locking onto few paths
Beyond Mode-Seeking RL: Trajectory-Balance Post-Training for Diffusion Language Models

Saba Ahmadi +2
cs.AI 2026-05-13 reviewed

Models detect sensory-text mismatches inside but ignore them in answers
Senses Wide Shut: A Representation-Action Gap in Omnimodal LLMs

Trung Nguyen Quang +5
cs.CL 2026-05-13 reviewed

Fine-tuned 8B LLMs beat larger models on children's story difficulty
Children's English Reading Story Generation via Supervised Fine-Tuning of Compact LLMs with Controllable Difficulty and Safety

Qian Shen +7
cs.CL 2026-05-13 reviewed

RTLC prompting boosts LLM judge accuracy by 14 points
RTLC -- Research, Teach-to-Learn, Critique: A three-stage prompting paradigm inspired by the Feynman Learning Technique that lifts LLM-as-judge accuracy on JudgeBench with no fine-tuning

Andrea Morandi
cs.CV 2026-05-13 reviewed

Stage-wise DPO reduces hallucinations in vision-language models
Reducing Hallucination in Vision-Language Models via Stage-wise Preference Optimization under Distribution Shift

Qinwu Xu
cs.CL 2026-05-13 reviewed

Fine-tuning plus hierarchical prompts strengthen propaganda detection
Fine-tuning with Hierarchical Prompting for Robust Propaganda Classification Across Annotation Schemas

Lukas Stähelin +8
cs.LG 2026-05-13 reviewed

Low-rank training reaches distinct loss basins from full rank
Beyond Perplexity: A Geometric and Spectral Study of Low-Rank Pre-Training

Namrata Shivagunde +3
cs.LG 2026-05-13 reviewed

Low-rank pre-training lands in different loss basins than full-rank
Beyond Perplexity: A Geometric and Spectral Study of Low-Rank Pre-Training

Namrata Shivagunde +3
cs.CL 2026-05-13 reviewed

Compiler produces reusable configs for LLM workflows at 6.4x speedup
FlowCompile: An Optimizing Compiler for Structured LLM Workflows

Junyan Li +4
cs.CL 2026-05-13 reviewed

Truncating supervision at feedback collapse beats full OPD
Prefix Teach, Suffix Fade: Local Teachability Collapse in Strong-to-Weak On-Policy Distillation

Kaiyuan Liu +5
cs.LG 2026-05-13 reviewed

RDPO normalizes and whitens rewards to stabilize RL advantages
Multi-Objective and Mixed-Reward Reinforcement Learning via Reward-Decorrelated Policy Optimization

Yang Bai +7
cs.CL 2026-05-13 reviewed

Edit-level vote reduces over-correction in LLM grammar fixes
Edit-level Majority Voting Mitigates Over-Correction in LLM-based Grammatical Error Correction

Takumi Goto +2
cs.CL 2026-05-13 reviewed

LLM judges favor machine translations over creative literary ones
Creativity Bias: How Machine Evaluation Struggles with Creativity in Literary Translations

Kyo Gerrits +2
cs.CL 2026-05-13 reviewed

Artificial uncertainty on easy data improves real uncertainty probes
Inducing Artificial Uncertainty in Language Models

Sophia Hager +2
cs.CV 2026-05-13 reviewed

OCR training method improves text reading in blurry and cluttered images
Multilingual OCR-Aware Fine-Tuning and Prompt-Guided Chain-of-Thought Reasoning for Multimodal Large Language Models

Qinwu Xu +3
cs.AI 2026-05-13 reviewed

LLMs show recall-safety tradeoffs on real ICU data
RealICU: Do LLM Agents Understand Long-Context ICU Data? A Benchmark Beyond Behavior Imitation

Chengzhi Shen +9
cs.CL 2026-05-13 reviewed

Locale prompts eliminate SLM copying in on-device PII replacement
Locale-Conditioned Few-Shot Prompting Mitigates Demonstration Regurgitation in On-Device PII Substitution with Small Language Models

Anuj Sadani +1
cs.LG 2026-05-13 reviewed

Temperature adjustment turns reward models into a calibrated SLOP
Temper and Tilt Lead to SLOP: Reward Hacking Mitigation with Inference-Time Alignment

Ye Wang +2
cs.AI 2026-05-13 reviewed

Students rate AI slides equal to instructor ones
AI-Generated Slides: Are They Good? Can Students Tell?

Juho Leinonen +2
cs.CL 2026-05-13 reviewed

Ordered demos turn many-shot CoT into test-time learning
Many-Shot CoT-ICL: Making In-Context Learning Truly Learn

Tsz Ting Chung +3
cs.CL 2026-05-13 reviewed

Shared covariance summation leads multilingual editing results
Merging Methods for Multilingual Knowledge Editing for Large Language Models: An Empirical Odyssey

Kunil Lee +3
cs.CL 2026-05-13 reviewed

Reflective experiences guide LLM agents to better memory searches
R²-Mem: Reflective Experience for Memory Search

Xinyuan Wang +4
cs.LG 2026-05-13 reviewed

Fragmentation strictly raises finite-context log-loss
Effective Context in Transformers: An Analysis of Fragmentation and Tokenization

Amirmehdi Jafari Fesharaki +2
cs.CL 2026-05-13 reviewed

Planning mechanism lifts LLM graph retrieval by 18 percent
PersonalAI 2.0: Enhancing knowledge graph traversal/retrieval with planning mechanism for Personalized LLM Agents

Mikhail Menschikov +10
cs.LG 2026-05-13 reviewed

OSDN preconditioner cuts recall residual 39% at 1.3B scale
OSDN: Improving Delta Rule with Provable Online Preconditioning in Linear Attention

Chenyu Zhou +5
cs.CL 2026-05-13 reviewed

Decomposed rewards boost vision-language reasoning
PDCR: Perception-Decomposed Confidence Reward for Vision-Language Reasoning

Hee Suk Yoon +8
cs.CL 2026-05-13 reviewed

Memory of prior links improves biomedical entity consistency
LongBEL: Long-Context and Document-Consistent Biomedical Entity Linking

Adam Remaki +2
cs.AI 2026-05-13 reviewed

DRAT predicts LLMs' scientific ideation better than prior tests
Assessing the Creativity of Large Language Models: Testing, Limits, and New Frontiers

Samuel Schapiro +3
cs.AI 2026-05-13 reviewed

Cognitive folding turns event streams into proactive agent memory
CogniFold: Always-On Proactive Memory via Cognitive Folding

Suli Wang +5
cs.CL 2026-05-13 reviewed

BPE dropout during pretraining improves low-resource NLP results
Pretraining Language Models with Subword Regularization: An Empirical Study of BPE Dropout in Low-Resource NLP

Ruan Visser +2
cs.CL 2026-05-13 reviewed

Token alignments from monolingual data speed LLM vocabulary adaptation
TokAlign++: Advancing Vocabulary Adaptation via Better Token Alignment

Chong Li +4
cs.LG 2026-05-13 reviewed

Two-stage tuning fixes LLM table errors with 1,000 examples
LIFT: Last-Mile Fine-Tuning for Table Explicitation

Divij Khaitan +1
cs.LG 2026-05-13 reviewed

Multi-stage ranking improves checkpoint selection for multimodal LLMs
Robust Checkpoint Selection for Multimodal LLMs via Agentic Evaluation and Stability-Aware Ranking

Qinwu Xu +2
cs.CL 2026-05-13 reviewed

Language-specific thresholds lift slur detection F1 by 2-5%
KIT-TIP-NLP at MultiPride: Continual Learning with Multilingual Foundation Model

Barathi Ganesh HB +3