archive

Every paper Pith has read.

7661 papers in cs.CL · page 13

cs.CL 2026-05-15 reviewed

Block attention nears full performance via semantic blocks
Towards Generalization of Block Attention via Automatic Segmentation and Block Distillation

Shuaiyi Li +7
cs.CL 2026-05-15 reviewed

Block attention matches full results via segmentation and distillation
Towards Generalization of Block Attention via Automatic Segmentation and Block Distillation

Shuaiyi Li +7
cs.CL 2026-05-15 reviewed

Dataset links Russian speeches to images and translations
Linked Multi-Model Data on Russian Domestic and Foreign Policy Speeches

Daria Blinova +6
cs.CV 2026-05-15 reviewed

VLMs miss image swaps when claiming to recheck visuals
Are VLMs Seeing or Just Saying? Uncovering the Illusion of Visual Re-examination

Chufan Shi +6

1 Pith
cs.CV 2026-05-15 reviewed

Brain voxels respond to specific image features identified by interpretability tools
Mechanistically Interpretable Neural Encoding Reveals Fine-Grained Functional Selectivity in Human Visual Cortex

Idan Daniel Grosbard +2
cs.HC 2026-05-15 reviewed

Canvas turns linear LLM chats into branching trees
Conversations in Space: Structuring Non-Linear LLM Interactions on a Canvas

Rifat Mehreen Amin +4
cs.SE 2026-05-15 reviewed

BootstrapAgent distills repo setup into reusable contracts
BootstrapAgent: Distilling Repository Setup into Reusable Agent Knowledge

Sihan Fu +4
cs.CL 2026-05-15 reviewed

Dataset shows MT systems lose PDF layout during translation
ForMaT: Dataset for Visually-Grounded Multilingual PDF Translation

Michał Ciesiółka +3
cs.CL 2026-05-15 reviewed

Small open LLMs match big models in translation quality estimation
CompactQE: Interpretable Translation Quality Estimation via Small Open-Weight LLMs

Kamil Guttmann +3
cs.CL 2026-05-15 reviewed

DimMem hits 81% accuracy with 24% lower token cost
DimMem: Dimensional Structuring for Efficient Long-Term Agent Memory

Wentao Qiu +4
cs.AI 2026-05-15 reviewed

Strategy nudging lifts RLVR performance beyond larger rollouts
Nudging Beyond the Comfort Zone: Efficient Strategy-Guided Exploration for RLVR

Chanuk Lee +3
cs.CL 2026-05-15 reviewed

Collaborative filtering assigns optimal contexts per LLM input
Contexting as Recommendation: Evolutionary Collaborative Filtering for Context Engineering

Jiachen Zhu +11
cs.CL 2026-05-15 reviewed

Benchmark shows agents fail at composing scattered multimodal evidence
SMMBench: A Benchmark for Source-Distributed Multimodal Agent Memory

Huacan Chai +9
cs.CL 2026-05-15 reviewed

Hybrid tree-graph evolves agent memory into summaries
H-Mem: A Novel Memory Mechanism for Evolving and Retrieving Agent Memory via a Hybrid Structure

Jiawei Yu +3
cs.LG 2026-05-15 reviewed

Reshaping anchors lets LLMs sample more reasoning modes
SAGE: Shaping Anchors for Guided Exploration in RLVR of LLMs

Chanuk Lee +2
cs.CL 2026-05-15 reviewed

Activation steering plus rewards improves unlearning and quality in MLLMs
ASRU: Activation Steering Meets Reinforcement Unlearning for Multimodal Large Language Models

Jiahui Guang +6
cs.CL 2026-05-15 reviewed

Few-shot LLMs beat BioBERT on patient inquiry triage
Few-Shot Large Language Models for Actionable Triage Categorization of Online Patient Inquiries

Liqi Zhou +1
cs.CL 2026-05-15 reviewed

Benchmark shows VLMs lag on code-based diagram tasks
VCG-Bench: Towards A Unified Visual-Centric Benchmark for Structured Generation and Editing

Xiaoyan Su +10
cs.CL 2026-05-15 reviewed

Dynamic chunking lifts diffusion LMs over positional blocks
Dynamic Chunking for Diffusion Language Models

Yichen Zhu +5
cs.CL 2026-05-15 reviewed

LLMs miss ambiguity in Chinese sentences
Evaluating Chinese Ambiguity Understanding in Large Language Models

Junwen Mo +4
cs.CL 2026-05-15 reviewed

LLMs heavily favor English with no cost savings from continual pre-training
Toward LLMs Beyond English-Centric Development

Sho Takase +1
cs.CL 2026-05-15 reviewed

Diffusion LLMs reach 5.5x tokens per forward pass
PSD: Pushing the Pareto Frontier of Diffusion LLMs via Parallel Speculative Decoding

Shengyin Sun +9
cs.CL 2026-05-15 reviewed

LLMs master new code syntax but cannot apply it to solve problems
Syntax Without Semantics: Teaching Large Language Models to Code in an Unseen Language

Vinayshekhar Bannihatti Kumar +3
cs.LG 2026-05-15 reviewed

Steering vectors accelerate optimization for rare behaviors
VSPO: Vector-Steered Policy Optimization for Behavioral Control

Xuechen Zhang +5
cs.CL 2026-05-15 reviewed

LLMs spot mental health entities but miss relations and reasoning
MHGraphBench: Knowledge Graph-Grounded Benchmarking of Mental Health Knowledge in Large Language Models

Weixin Liu +6
cs.CL 2026-05-15 reviewed

Semantic rewards improve LLM uncertainty calibration
Calibrating LLMs with Semantic-level Reward

Fengfei Yu +4
cs.CL 2026-05-15 reviewed

Semantic reward cuts LLM calibration error by up to 40%
Calibrating LLMs with Semantic-level Reward

Fengfei Yu +4
cs.CL 2026-05-15 reviewed

Learned policy decides when to add one sequential step after parallel agents
Response-Conditioned Parallel-to-Sequential Orchestration for Multi-Agent Systems

Nurbek Tastan +6
cs.CL 2026-05-15 reviewed

LLM activation peaks vary by 10,000x across model families
Measuring Maximum Activations in Open Large Language Models

Luxuan Chen +11
cs.CL 2026-05-15 reviewed

Dependency graphs lift Transformer syntactic generalization
GiLT: Augmenting Transformer Language Models with Dependency Graphs

Tianyu Huang +3
cs.CL 2026-05-15 reviewed

Latent geometry fails to ensure good token recovery
When Latent Geometry Is Not Enough: Draft-Conditioned Latent Refinement for Non-Autoregressive Text Generation

De Shuai Zhang
cs.LG 2026-05-15 reviewed

High-divergence prompts improve distillation by up to 15%
DeltaPrompts: Escaping the Zero-Delta Trap in Multimodal Distillation

Jaehun Jung +6
cs.LG 2026-05-15 reviewed

Divergence-guided prompts deliver 15% gains in VLM distillation
DeltaPrompts: Escaping the Zero-Delta Trap in Multimodal Distillation

Jaehun Jung +6
cs.CL 2026-05-15 reviewed

Reliability signal cuts token use by a third in reasoning
Process Rewards with Learned Reliability

Jinyuan Li +7
cs.CL 2026-05-15 reviewed

Benchmark tests LLM detectors across 8 languages and real edits
DetectRL-X: Towards Reliable Multilingual and Real-World LLM-Generated Text Detection

Junchao Wu +10
cs.CL 2026-05-15 reviewed

DetectRL-X benchmark tests detectors across 8 languages and real AI writing
DetectRL-X: Towards Reliable Multilingual and Real-World LLM-Generated Text Detection

Junchao Wu +10
cs.CL 2026-05-15 reviewed

RoPE loses position and token distinction in long contexts
RoPE Distinguishes Neither Positions Nor Tokens in Long Contexts, Provably

Yufeng Du +7
cs.LG 2026-05-15 reviewed

Draft model prunes 90% of attention in large LLMs
STS: Efficient Sparse Attention with Speculative Token Sparsity

Ceyu Xu +3
cs.CL 2026-05-14 reviewed

New benchmark suite tests LLMs on finance difficulty levels
FINESSE-Bench: A Hierarchical Benchmark Suite for Financial Domain Knowledge and Technical Analysis in Large Language Models

Dmitry Stanishevskii +6
cs.CL 2026-05-14 reviewed

Benchmark suite tests LLMs across eight financial expertise levels
FINESSE-Bench: A Hierarchical Benchmark Suite for Financial Domain Knowledge and Technical Analysis in Large Language Models

Dmitry Stanishevskii +6
cs.CL 2026-05-14 reviewed

RAG pipeline reaches 80% F1 on clinical transcript extraction
Retrieval-Augmented Large Language Models for Schema-Constrained Clinical Information Extraction

A H M Rezaul Karim +1
cs.LG 2026-05-14 reviewed

Open-ended RL boosts LLM reasoning with 46x less data
GRLO: Towards Generalizable Reinforcement Learning in Open-Ended Environments from Zero

Shangjian Yin +3
cs.CL 2026-05-14 reviewed

Reasoning models take different paths
Reasoning Models Don't Just Think Longer, They Move Differently

Anders Gjølbye +2
cs.CL 2026-05-14 reviewed

Fewer parses increase model surprise in garden paths but not enough
Why are language models less surprised than humans? Testing the Parse Multiplicity Mismatch Hypothesis

William Timkey +2
cs.CL 2026-05-14 reviewed

Math tasks produce highest attention entropy in LLMs
Neural Activation Patterns Across Language Model Architectures: A Comprehensive Analysis of Cognitive Task Performance

Mahdi Naser-Moghadasi +1
cs.CE 2026-05-14 reviewed

Reinforcement updates replace feedback loops in LLM alpha discovery
From Feedback Loops to Policy Updates: Reinforcement Fine-Tuning for LLM-Based Alpha Factor Discovery

Lingzhe Zhang +7
cs.CL 2026-05-14 reviewed

LLM help adapts to user expertise domains to limit over-reliance
Capability Conditioned Scaffolding for Professional Human LLM Collaboration

Sen Yang +1
cs.CL 2026-05-14 reviewed

Ghana AI legal tool handles 32,000 student queries in 30 months
Eskwai for Students: Generative AI Assistant for Legal Education in Ghana

George Boateng +8
cs.CL 2026-05-14 reviewed

WhatsApp AI bot offers science help to West African students
Adesua: Development and Feasibility Study of an AI WhatsApp Bot for Science Learning in West Africa

George Boateng +6
cs.CL 2026-05-14 reviewed

Humans choose words step by step under tight vocabulary limits
Greedy or not, here I come: Language production under vocabulary constraints in humans and resource-rational models

Thomas Hikaru Clark +2