archive

Every paper Pith has read.

7661 papers in cs.CL · page 14

cs.CL 2026-05-14 reviewed

Ukrainian court citations form unsupervised legal ontology
Automatic Construction of a Legal Citation Graph from 100 Million Ukrainian Court Decisions: Large-Scale Extraction, Topological Analysis, and Ontology-Driven Clustering

Volodymyr Ovcharov
cs.LG 2026-05-14 reviewed

Agent turns I/O examples into code via guided evolutionary search
From I/O to Code with Discovery Agent

Yihong Dong +9
cs.AI 2026-05-14 reviewed

LaMR prunes code context to save 31% tokens while matching full performance
Context Pruning for Coding Agents via Multi-Rubric Latent Reasoning

Jingjing Wang +8
cs.CL 2026-05-14 reviewed

New tool opens discourse data across 16 languages for local use
DiscoExplorer: An Open Interface for the Study of Multilingual Discourse Relations

Amir Zeldes
cs.RO 2026-05-14 reviewed

Human video builds physical smarts for top robot policies
PhysBrain 1.0 Technical Report

Shijie Lian +12
cs.CL 2026-05-14 reviewed

Natural literary translations often drift from the original meaning
Fluency and Faithfulness in Human and Machine Literary Translation

Sarah Griebel +1
cs.CV 2026-05-14 reviewed

One token unifies agentic and latent visual reasoning
ATLAS: Agentic or Latent Visual Reasoning? One Word is Enough for Both

Ziyu Guo +3
cs.LG 2026-05-14 reviewed

FutureSim shows top AI agents predict events at 25% accuracy
FutureSim: Replaying World Events to Evaluate Adaptive Agents

Shashwat Goel +7
cs.CL 2026-05-14 reviewed

Grep beats vector search in most agentic tasks
Is Grep All You Need? How Agent Harnesses Reshape Agentic Search

Sahil Sen +4
cs.CR 2026-05-14 reviewed

Length alone triggers LLM backdoors to leak secrets
MetaBackdoor: Exploiting Positional Encoding as a Backdoor Attack Surface in LLMs

Rui Wen +4
cs.CL 2026-05-14 reviewed

EHR tables sharpen timing in text-based clinical timelines
Text Knows What, Tables Know When: Clinical Timeline Reconstruction via Retrieval-Augmented Multimodal Alignment

Sayantan Kumar +3
cs.CL 2026-05-14 reviewed

Memory model lets LLMs add new knowledge without retraining
MeMo: Memory as a Model

Ryan Wei Heng Quek +8
cs.CR 2026-05-14 reviewed

The paper builds a 507-leaf taxonomy of LLM inference attacks from 932 recent security…
Talk is (Not) Cheap: A Taxonomy and Benchmark Coverage Audit for LLM Attacks

Karthik Raghu Iyer +3
cs.CL 2026-05-14 reviewed

Framework converts text tool benchmarks to audio for voice agents
From Text to Voice: A Reproducible and Verifiable Framework for Evaluating Tool Calling LLM Agents

Md Tahmid Rahman Laskar +5
cs.AI 2026-05-14 reviewed

Open framework lifts coding agent to 67.5% on SWE-bench
Orchard: An Open-Source Agentic Modeling Framework

Baolin Peng +13
cs.LG 2026-05-14 reviewed

128 random demos suffice for strong RLVR results
Boosting Reinforcement Learning with Verifiable Rewards via Randomly Selected Few-Shot Guidance

Kai Yan +2
cs.CL 2026-05-14 reviewed

Window-level RL raises speculative decoding acceptance to 6.5
Performance-Driven Policy Optimization for Speculative Decoding with Adaptive Windowing

Jie Jiang +4
cs.CL 2026-05-14 reviewed

Token counts for Ukrainian legal text differ 1.6 times by model
Tokenizer Fertility and Zero-Shot Performance of Foundation Models on Ukrainian Legal Text: A Comparative Study

Volodymyr Ovcharov
cs.AI 2026-05-14 reviewed

Decomposing traces boosts AI agent diagnosis accuracy up to 12x
Holistic Evaluation and Failure Diagnosis of AI Agents

Netta Madvil +14
cs.CV 2026-05-14 reviewed

CIR benchmarks let models solve most queries with one modality
Do Composed Image Retrieval Benchmarks Require Multimodal Composition?

Matteo Attimonelli +10
cs.AI 2026-05-14 reviewed

Graph paths verify legal reasoning in Indian court AI
Falkor-IRAC: Graph-Constrained Generation for Verified Legal Reasoning in Indian Judicial AI

Joy Bose
cs.CV 2026-05-14 reviewed

Internal masking cuts hallucinations in vision-language models
Do We Really Need External Tools to Mitigate Hallucinations? SIRA: Shared-Prefix Internal Reconstruction of Attribution

Tian Qin +5
cs.CL 2026-05-14 reviewed

Terminal anchors extend LLM context to 64K from short sequences
EndPrompt: Efficient Long-Context Extension via Terminal Anchoring

Han Tian +12
cs.CL 2026-05-14 reviewed

Denoising paths supply low-cost uncertainty scores for language diffusion models
Uncertainty Quantification for Large Language Diffusion Models

Artem Vazhentsev +5
cs.SE 2026-05-14 reviewed

ML classifier beats rules at spotting BDD refactoring chances
Mining Subscenario Refactoring Opportunities in Behaviour-Driven Software Test Suites: ML Classifiers and LLM-Judge Baselines

Ali Hassaan Mughal +2
cs.SE 2026-05-14 reviewed

Memory agent keeps repo documentation consistent
Remember Your Trace: Memory-Guided Long-Horizon Agentic Framework for Consistent and Hierarchical Repository-Level Code Documentation

Suyoung Bae +4
cs.LG 2026-05-14 reviewed

Action tokens carry the training signal in agentic RL
Resolving Action Bottleneck: Agentic Reinforcement Learning Informed by Token-Level Energy

Langzhou He +9
cs.CL 2026-05-14 reviewed

CIPO turns LLM failures into better reasoning
Learning from Failures: Correction-Oriented Policy Optimization with Verifiable Rewards

Mengjie Ren +8
cs.CL 2026-05-14 reviewed

Optimal control view yields language models with both fidelity and parallel speed
Language Generation as Optimal Control: Closed-Loop Diffusion in Latent Control Space

ZiYi Dong +5
cs.CL 2026-05-14 reviewed

Many perfect LLM scores hide dimensional intent failures
Dimension-Level Intent Fidelity Evaluation for Large Language Models: Evidence from Structured Prompt Ablation

Gang Peng
cs.CL 2026-05-14 reviewed

LLM memory systems hit only 46% on group conversations
GroupMemBench: Benchmarking LLM Agent Memory in Multi-Party Conversations

Jingbo Yang +5
cs.CL 2026-05-14 reviewed

Ming glossaries used flexible Chinese characters to approximate foreign sounds
Cross-Linguistic Transcription and Phonological Representation in the Huìtóngguǎnxì Huáyíyìyǔ

Ji-eun Kim
cs.SE 2026-05-14 reviewed

Stale code snippets make models output outdated helpers
When Retrieval Hurts Code Completion: A Diagnostic Study of Stale Repository Context

Haojun Weng +4
cs.CL 2026-05-14 reviewed

Probe shows RAG follows wrong context in 85 percent of conflict cases
Does RAG Know When Retrieval Is Wrong? Diagnosing Context Compliance under Knowledge Conflict

Yihang Chen +6
cs.LG 2026-05-14 reviewed

Guardrails adapt from sparse noisy failures via conservative induction
LiSA: Lifelong Safety Adaptation via Conservative Policy Induction

Minbeom Kim +8
cs.LG 2026-05-14 reviewed

Orthogonal projection isolates hallucination signals in LLM answers
When Answers Stray from Questions: Hallucination Detection via Question-Answer Orthogonal Decomposition

Siyang Yao +2
cs.CV 2026-05-14 reviewed

Adaptive gate skips reasoning for simple multimodal inputs
Think When Needed: Adaptive Reasoning-Driven Multimodal Embeddings with a Dual-LoRA Architecture

Longxiang Zhang +4
cs.CL 2026-05-14 reviewed

Calculus finds optimal vocabulary size for ASR
A Calculus-Based Framework for Determining Vocabulary Size in End-to-End ASR

Sunil Kumar Kopparapu
cs.SE 2026-05-14 reviewed

Agents resolve 45 percent of chained package upgrades
SWE-Chain: Benchmarking Coding Agents on Chained Release-Level Package Upgrades

Man Ho Lam +7
cs.CL 2026-05-14 reviewed

New scores track whether unlearning works across languages
Knowledge Beyond Language: Bridging the Gap in Multilingual Machine Unlearning Evaluation

Kyomin Hwang +3
cs.CL 2026-05-14 reviewed

Three-tier memory lifts recommender hit rate by 26 percent
Agentic Recommender System with Hierarchical Belief-State Memory

Xiang Shen +10
cs.LG 2026-05-14 reviewed

Synthetic queries trigger up to 5x higher LLM failure rates
NodeSynth: Socially Aligned Synthetic Data for AI Evaluation

Qazi Mamunur Rashid +7
cs.CL 2026-05-14 reviewed

Synthetic augmentation lifts defense classification to 58% accuracy
Mitigating Data Scarcity in Psychological Defense Classification with Context-Aware Synthetic Augmentation

Hoang-Thuy-Duong Vu +2