archive

Every paper Pith has read.

7661 papers in cs.CL · page 9

cs.CL 2026-05-18 reviewed

Decoupling tool use from execution boosts LLM math reasoning
Implicit Hierarchical GRPO: Decoupling Tool Invocation from Execution for Tool-Integrated Mathematical Reasoning

Li Wang +7
cs.CL 2026-05-18 reviewed

Wiki beats RAG on cross-paper links but costs more tokens
Vector RAG vs LLM-Compiled Wiki: A Preregistered Comparison on a Small Multi-Domain Research

Theodore O. Cochran
cs.CR 2026-05-18 reviewed

Generator turns text prompts into LLM fingerprints in one pass
Prompt2Fingerprint: Plug-and-Play LLM Fingerprinting via Text-to-Weight Generation

Sixu Chen +7
cs.CL 2026-05-18 reviewed

BERT and T5 differ in NER performance by tag scheme
From BERT to T5: A Study of Named Entity Recognition

Mei Jia
cs.CV 2026-05-18 reviewed

Accuracy unchanged when latent visual tokens replaced by dummies
What's Holding Back Latent Visual Reasoning?

André G. Viveiros +3
cs.CL 2026-05-18 reviewed

No memory method works consistently for LLM agents
EvoMemBench: Benchmarking Agent Memory from a Self-Evolving Perspective

Yuyao Wang +9
cs.CL 2026-05-18 reviewed

Governed skill libraries boost frozen agents on benchmarks
SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution

Hongyi Liu +5
cs.CL 2026-05-18 reviewed

LLMs match human conditional ratings without pragmatic reasoning
Presupposition and Reasoning in Conditionals: A Theory-Based Study of Humans and LLMs

Tara Azin +3
cs.CL 2026-05-18 reviewed

Index lets researchers search 1.35 billion news articles in under a second
Infini-News: Efficiently Queryable Access to 1.3 Billion Processed Common Crawl News Articles

Ruggero Marino Lazzaroni +2
cs.AI 2026-05-18 reviewed

Self-distillation supplies step-level search signals from own rollouts
SD-Search: On-Policy Hindsight Self-Distillation for Search-Augmented Reasoning

Yufei Ma +8
cs.CL 2026-05-18 reviewed

Preference focus cuts on-device RAG memory 2400-fold
From Volume to Value: Preference-Aligned Memory Construction for On-Device RAG

Changmin Lee +2
cs.CL 2026-05-18 reviewed

K2V extends RLVR to knowledge domains via process verification
Knowledge-to-Verification: Exploring RLVR for LLMs in Knowledge-Intensive Domains

Zhonghang Yuan +9
cs.CV 2026-05-18 reviewed

Shared codebook bridges modalities without full data pairs
CodeBind: Decoupled Representation Learning for Multimodal Alignment with Unified Compositional Codebook

Zeyu Chen +2
cs.CL 2026-05-18 reviewed

MDU unlearns data in masked diffusion models by KL reversal
Machine Unlearning for Masked Diffusion Language Models

Georu Lee +4
cs.CL 2026-05-18 reviewed

Multi-turn chats in low-resource languages jailbreak LLMs
Multilingual jailbreaking of LLMs using low-resource languages

Dylan Marx +1
cs.CL 2026-05-18 reviewed

SomaliWeb v1 delivers 303M tokens of cleaned Somali text
SomaliWeb v1: A Quality-Filtered Somali Web Corpus with a Matched Tokenizer and a Public Language-Identification Benchmark

Khalid Yusuf Dahir
cs.CL 2026-05-18 reviewed

Memory of precomputed states cuts LLM prefix attention costs
Context Memorization for Efficient Long Context Generation

Yasuyuki Okoshi +5
cs.SD 2026-05-18 reviewed

Speech audio accelerates MRI reconstruction of vocal tracts
SIREM: Speech-Informed MRI Reconstruction with Learned Sampling

Md Hasan +8
cs.CL 2026-05-18 reviewed

GA-S2S adds k-hop graph structure to raise link prediction by 19%
Leveraging Graph Structure in Seq2Seq Models for Knowledge Graph Link Prediction

Luu Huu Phuc +5
cs.AI 2026-05-18 reviewed

Varying environment rules builds agents that generalize
Scalable Environments Drive Generalizable Agents

Jiayi Zhang +9
cs.AI 2026-05-18 reviewed

One universal fix reduces hallucinations in 15 models
TRACE: Trajectory Correction from Cross-layer Evidence for Hallucination Reduction

Tej Sanibh Ranade
cs.CL 2026-05-18 reviewed

Hybrid system generates natural sentences from nested logic
FOL2NS: Generating Natural Sentences from First-Order Logic

Mei Jia
cs.CL 2026-05-18 reviewed

Explanation guidelines lift LLM prompt accuracy by 35 percent
iPOE: Interpretable Prompt Optimization via Explanations

Jiahui Li +3
cs.CL 2026-05-18 reviewed

Bangla medical questions trip up top AI models
How Good LLMs Are at Answering Bangla Medical Visual Questions? Dataset and Benchmarking

Rafid Ahmed +5
cs.CL 2026-05-18 reviewed

German news overreports European landslides vs risk data
How Loud Rumbles Hit Newsstands: A Data Analysis of Coverage and Spatial Bias in German News about Landslides Around the World

Brielen Madureira +3
cs.CL 2026-05-18 reviewed

Grafting MoE-expanded deltas adds languages to LLMs efficiently
A Data-Efficient Path to Multilingual LLMs: Language Expansion via Post-training PARAMΔ Integration into Upcycled MoE

Hao Zhou +8
cs.LG 2026-05-18 reviewed

Low-precision softmax transformers simulate Turing machines via CoT
The Expressive Power of Low Precision Softmax Transformers with (Summarized) Chain-of-Thought

Moritz Brösamle +1
cs.CL 2026-05-18 reviewed

KVDrive lifts long-context LLM speed 1.74x with SSD tier
KVDrive: A Holistic Multi-Tier KV Cache Management System for Long-Context LLM Inference

Jian Lin +7
cs.CL 2026-05-18 reviewed

P2P edge agents boost LLM task accuracy by 8% and reduce latency by 16%
PPAI: Enabling Personalized LLM Agent Interoperability for Collaborative Edge Intelligence

Zile Wang +6
cs.LG 2026-05-18 reviewed

Boundary protection recovers 69-90% quality at 13% KV retention
Protection Is (Nearly) All You Need: Structural Protection Dominates Scoring in Globally Capped KV Eviction

Gabriel Garcia
cs.CL 2026-05-18 reviewed

Tool localizes node errors in multi-agent LLM workflows
PROTEA: Offline Evaluation and Iterative Refinement for Multi-Agent LLM Workflows

Kazuki Kawamura +2
cs.CL 2026-05-18 reviewed

Reranking by label semantics lifts hard-case F1 by over 9 points
Semantic Reranking at Inference Time for Hard Examples in Rhetorical Role Labeling

Anas Belfathi +4
cs.CL 2026-05-18 reviewed

Neural tweaks make read speech sound like real conversation
Bridging the Gap: Converting Read Text to Conversational Dialogue

Parshav Singla +7
cs.CL 2026-05-18 reviewed

Predictive prefetching cuts RAG latency up to 43.5%
Predictive Prefetching for Retrieval-Augmented Generation

Wuyang Zhang +1
cs.CL 2026-05-18 reviewed

LLM generates explicitly vectorized code that beats compiler -O3
AutoVecCoder: Teaching LLMs to Generate Explicitly Vectorized Code

Shangzhan Li +10
cs.CL 2026-05-18 reviewed

BacktestBench tests LLMs on 18k backtesting QA pairs from real markets
BacktestBench: Benchmarking Large Language Models for Automated Quantitative Strategy Backtesting

Zhensheng Wang +5
cs.CL 2026-05-18 reviewed

Natural triggers drop sentiment accuracy to 0.04
Universal Adversarial Triggers

Benedict Florance Arockiaraj +3
cs.CL 2026-05-18 reviewed

Prompt compression fails to transfer to diffusion LLMs
Prompt Compression in Diffusion Large Language Models: Evaluating LLMLingua-2 on LLaDA

Sterling Huang +6
cs.LG 2026-05-18 reviewed

Transient expert steers MoE updates to cut forgetting
CP-MoE: Consistency-Preserving Mixture-of-Experts for Continual Learning

Yang Liu +2
cs.CL 2026-05-18 reviewed

Benchmark turns NASA mission text into logic formulas
A Pilot Benchmark for NL-to-FOL Translation in Planetary Exploration

Hayden Moore +2
cs.AI 2026-05-18 reviewed

AI chunking builds maps predicting war in Thucydides model
Agentic Chunking and Bayesian De-chunking of AI Generated Fuzzy Cognitive Maps: A Model of the Thucydides Trap

Akash Kumar Panda +2
cs.CL 2026-05-18 reviewed

AI agent teams beat human teams at generating creative ideas
Multi-agent AI systems outperform human teams in creativity

Tiancheng Hu +7
cs.LG 2026-05-18 reviewed

Hindsight targets fix actions to cut agent training time 2.26x
HINT-SD: Targeted Hindsight Self-Distillation for Long-Horizon Agents

Woongyeng Yeo +3
cs.CL 2026-05-18 reviewed

New multi-accent dataset lowers ASR errors on technical talks
PAREDA: A Multi-Accent Speech Dataset of Natural Language Processing Research Discussions

Sicheng Jin +2
cs.CL 2026-05-18 reviewed

SynPro yields 3.7-5.2x more effective tokens from organic data
Generating Pretraining Tokens from Organic Data for Data-Bound Scaling

Zichun Yu +1
cs.LO 2026-05-18 reviewed

Retrieval system compresses Lean proofs over 70 percent
Lean Refactor: Multi-Objective Controllable Proof Optimization via Agentic Strategy Search

Jialin Lu +6
cs.AI 2026-05-18 reviewed

Memory-equipped agents show rising safety risks over time
Remembering More, Risking More: Longitudinal Safety Risks in Memory-Equipped LLM Agents

Ahmad Al-Tawaha +4
cs.CL 2026-05-18 reviewed

Memory systems score 0.12-0.18 on social group benchmark
SocialMemBench: Are AI Memory Systems Ready for Social Group Settings?

Olukunle Owolabi
cs.CL 2026-05-18 reviewed

LLM-rephrased notes keep broad utility but lose ICD details
Systematic Evaluation of the Quality of Synthetic Clinical Notes Rephrased by LLMs at Million-Note Scale

Jinghui Liu +2
cs.CL 2026-05-18 reviewed

Fine-tuned small models plan with tools without a tool catalog in the prompt
Internalizing Tool Knowledge in Small Language Models via QLoRA Fine-Tuning

Yuval Shemla +4