archive

Every paper Pith has read.

7661 papers in cs.CL · page 7

cs.CL 2026-05-19 reviewed

LLM use adds complex words and syntax to NLP papers
What Are LLMs Doing to Scientific Communication? Measuring Changes in Writing Practices and Reading Experience

Filip Miletić +1
cs.AI 2026-05-19 reviewed

Context map cache raises LLM agent accuracy 6-34% on recurring tasks
PEEK: Context Map as an Orientation Cache for Long-Context LLM Agents

Zhuohan Gu +3
cs.CL 2026-05-19 reviewed

Scorer choice sets the layer where authorship signals consolidate
Where Does Authorship Signal Emerge in Encoder-Based Language Models?

Francis Kulumba +3
cs.CL 2026-05-19 reviewed

Model learns when to skip tools for better multimodal answers
Are Tools Always Beneficial? Learning to Invoke Tools Adaptively for Dual-Mode Multimodal LLM Reasoning

Qinghe Ma +5
cs.CL 2026-05-19 reviewed

Influence functions fix model errors via key sample and concept tweaks
CLIF: Concept-Level Influence Functions for Transparent Bottleneck Models

Yike Sun +7
cs.CV 2026-05-19 reviewed

Dense benchmark exposes open VLMs' gaps on subtle human actions
FineBench: Benchmarking and Enhancing Vision-Language Models for Fine-grained Human Activity Understanding

Gueter Josmy Faure +4
cs.CV 2026-05-19 reviewed

Open VLMs struggle with fine details in human video actions
FineBench: Benchmarking and Enhancing Vision-Language Models for Fine-grained Human Activity Understanding

Gueter Josmy Faure +4
cs.CV 2026-05-19 reviewed

Dual-stream network lifts weather detection at full speed
CADENet: Condition-Adaptive Asynchronous Dual-Stream Enhancement Network for Adverse Weather Perception in Autonomous Driving

Sherif Khairy +1
cs.SD 2026-05-19 reviewed

Scaled simulations cut speech recognition errors by over 30 percent
Mega-ASR: Towards In-the-wild^2 Speech Recognition via Scaling up Real-world Acoustic Simulation

Zhifei Xie +6
cs.AI 2026-05-19 reviewed

Temporal conditioning changes AV planner style but not scores
From Prompts to Pavement Through Time: Temporal Grounding in Agentic Scene-to-Plan Reasoning

Ahmed Y. Gado +4
cs.CL 2026-05-19 reviewed

Rubric shows LLMs generate mostly high-quality legal propositions
LP-Eval: Rubric and Dataset for Measuring the Quality of Legal Proposition Generation

Shanshan Xu +4
cs.CL 2026-05-19 reviewed

Section-based chunking tops recall in German legal retrieval
Chunking German Legal Code

Max Prior +2
cs.CL 2026-05-19 reviewed

LLMs generate coherent multimodal behaviors for ability and benevolence
Towards Trust Calibration in Socially Interactive Agents: Investigating Gendered Multimodal Behaviors Generation with LLMs

Lucie Galland +2
cs.CL 2026-05-19 reviewed

Long-term medical dialogue benchmark reveals LLM limitations
Synthesis and Evaluation of Long-term History-aware Medical Dialogue

Hebin Hu +3
cs.AI 2026-05-19 reviewed

Pure code boosts programming but hurts complex math reasoning
What Really Improves Mathematical Reasoning: Structured Reasoning Signals Beyond Pure Code

Yuze Zhao +8
cs.CL 2026-05-19 reviewed

Node topology turned into text improves graph anomaly detection
TERGAD: Structure-Aware Text-Enhanced Representations for Graph Anomaly Detection

Wen Shi +8
cs.CL 2026-05-19 reviewed

Fuzzy concept graph cuts RAG indexing to 30 LLM calls
ContextRAG: Extraction-Free Hierarchical Graph Construction for Retrieval-Augmented Generation

Roman Prosvirnin +2
cs.CL 2026-05-19 reviewed

Review of 120 studies maps LLM math reasoning gaps
Mathematical Reasoning in Large Language Models: Benchmarks, Architectures, Evaluation, and Open Challenges

Husnain Amjad +3
cs.CL 2026-05-19 reviewed

Parser trained on CHILDES beats general tools on child speech
CAIT: A Syntactic Parsing Toolkit for Child-Adult InTeractions

Francesca Padovani +6
cs.CL 2026-05-19 reviewed

84K Arabic samples built for Saudi financial sentiment analysis
LLM-Based Financial Sentiment Analysis in Arabic: Evidence from Saudi Markets

Mona H. Albaqawi +3
cs.CL 2026-05-19 reviewed

LLMs fix West Frisian ASR errors on unseen texts
Can Large Language Models Reliably Correct Errors in Low-Resource ASR? A Contamination-Aware Case Study on West Frisian

Yun Hao +4

3 Piths
cs.LG 2026-05-19 reviewed

OScaR reaches near-lossless INT2 KV cache quantization
OScaR: The Occam's Razor for Extreme KV Cache Quantization in LLMs and Beyond

Zunhai Su +13
cs.CL 2026-05-19 reviewed

2-bit LLMs retain most accuracy on reasoning tasks
K-Quantization and its Impact on Output Performance

Robin Baki Davidsson +1
cs.CL 2026-05-19 reviewed

One LLM system optimizes text to beat specialists on six tasks
optimize_anything: A Universal API for Optimizing any Text Parameter

Lakshya A Agrawal +13
cs.CL 2026-05-19 reviewed

New Chinese benchmark caps LLM logical accuracy at 37.5 percent
LLMEval-Logic: A Solver-Verified Chinese Benchmark for Logical Reasoning of LLMs with Adversarial Hardening

Ming Zhang +15
cs.CL 2026-05-19 reviewed

Open dataset and reweighting match big models in long-context RL
GoLongRL: Capability-Oriented Long Context Reinforcement Learning with Multitask Alignment

Minxuan Lv +11
cs.AI 2026-05-19 reviewed

Governance recipe lifts LLM skill-library performance from 0.26 to 0.58
Library Drift: Diagnosing and Fixing a Silent Failure Mode in Self-Evolving LLM Skill Libraries

Xing Zhang +6
cs.CL 2026-05-19 reviewed

No multi-word expression is absolutely idiomatic
A Data-Driven Approach to Idiomaticity Based on Experts' Criteria in Theoretical Linguistics

Elena Mikhalkova +5
cs.CL 2026-05-19 reviewed

One model serves many embedding sizes in retrieval
m3BERT: A Modern, Multi-lingual, Matryoshka Bidirectional Encoder

Yaoxiang Wang +6
cs.CL 2026-05-19 reviewed

Merging LLMs into VLMs boosts instruction following but not math
Investigating Cross-Modal Skill Injection: Scenarios, Methods, and Hyperparameters

Zhiyu Xu +7
cs.CL 2026-05-19 reviewed

Base models fool AI detectors into rating text as human
Base Models Look Human To AI Detectors

Yixuan Even Xu +4
cs.AI 2026-05-19 reviewed

Context management determines real-world Transformer Turing-completeness
Position: The Turing-Completeness of Autoregressive Transformers Relies Heavily on Context Management

Guanyu Cui +2
cs.CL 2026-05-19 reviewed

TokenDrift cuts Gen-PPL by 89% at 4 steps in DDLMs
Drifting Objectives for Refining Discrete Diffusion Language Models

Daisuke Oba +2
cs.LG 2026-05-19 reviewed

CEPO boosts math reasoning to 43.43% at 2B and 60.56% at 4B
CEPO: RLVR Self-Distillation using Contrastive Evidence Policy Optimization

Ahmed Heakl +6
cs.CL 2026-05-19 reviewed

Backtracking fixes dual biases in LLM reasoning distillation
Backtracking When It Strays: Mitigating Dual Exposure Biases in LLM Reasoning Distillation

Bing Wang +9
cs.CL 2026-05-19 reviewed

Pairwise confidence weights sharpen LLM policy optimization
LambdaPO: A Lambda Style Policy Optimization for Reasoning Language Models

Redacted by arXiv +6
cs.CL 2026-05-19 reviewed

Pairwise sums replace group means in LLM policy optimization
LambdaPO: A Lambda Style Policy Optimization for Reasoning Language Models

Redacted by arXiv +6
cs.CL 2026-05-19 reviewed

Reassembling entity pairs boosts synthetic QA accuracy by 88.9%
EmbGen: Teaching with Reassembled Corpora

Arun K Lenin +3
cs.CL 2026-05-19 reviewed

Entropy shaping makes LLMs concise on easy math and deeper on hard ones
Taming the Thinker: Conditional Entropy Shaping for Adaptive LLM Reasoning

Shuyu Wei +8
cs.CL 2026-05-19 reviewed

Framework creates custom science benchmarks for LLMs from existing data
SciCustom: A Framework for Custom Evaluation of Scientific Capabilities in Large Language Models

Yiyang Gu +17
cs.MA 2026-05-19 reviewed

Architecture lets AI agents break rules legitimately when justified
PAVE: A Cognitive Architecture for Legitimate Violation in Generative Agent Societies

Ahmad Yehia +6
cs.CL 2026-05-19 reviewed

Supreme Court quashes matrimonial petitions 18 points more often than Karnataka HC
IMLJD: A Computational Dataset for Indian Matrimonial Litigation Analysis

Joy Bose
cs.CL 2026-05-19 reviewed

Retrieval rewriting lifts LLM calibration up to 58%
Retrieval-Augmented Linguistic Calibration

Yi-Fan Yeh +6
cs.CL 2026-05-19 reviewed

Benchmark labels hallucinations via explicit reference worlds
HalluWorld: A Controlled Benchmark for Hallucination via Reference World Models

Emmy Liu +6

5 Piths
cs.MA 2026-05-19 reviewed

STAR-PólyaMath hits perfect scores on Putnam and IMO
STAR-PólyaMath: Multi-Agent Reasoning under Persistent Meta-Strategic Supervision

Jiaao Wu +5
cs.GT 2026-05-19 reviewed

LLMs close 99% of deals but earn low profits in hidden pricing
PrefBench: Evaluating Zero-Shot LLM Agents in Hidden-Preference Personalized Pricing Negotiations

Yingjie Lei
cs.CL 2026-05-19 reviewed

Multi-agent evaluators lock reading items to target difficulty
A Multi-Agent Framework for Feature-Constrained Difficulty Control in Reading Comprehension Item Generation

Seonjeong Hwang +3
cs.CL 2026-05-19 reviewed

Small targeted probes break document parsers as much as large ones
How Do Document Parsers Break? Auditing Structural Vulnerability in Document Intelligence

Yue Chen +4
cs.CL 2026-05-19 reviewed

Metric selects only necessary rationales for LLM misinformation checks
Are Rationales Necessary and Sufficient? Tuning LLMs for Explainable Misinformation Detection

Bing Wang +8
cs.CL 2026-05-19 reviewed

LLMs learn redundant copies of concepts across languages
Language models struggle with compartmentalization

Thomas Vincent Howe +1