archive

Every paper Pith has read.

7661 papers in cs.CL · page 10

cs.AI 2026-05-18 reviewed

Entropy-gradient inversion marks stronger reasoning in LRMs
Entropy-Gradient Inversion: Moving Toward Internal Mechanism of Large Reasoning Models

Junyao Yang +6
cs.CL 2026-05-18 reviewed

Mixing ICD-9 and ICD-10 data lifts rare code F1 by 27 percent
Bridging the Version Gap: Multi-version Training Improves ICD Code Prediction, Especially for Rare Codes

Jinghui Liu +1
cs.CL 2026-05-18 reviewed

Topic models assign themes to segments
From Documents to Segments: A Contextual Reformulation for Topic Assignment

Hoonsang Yoon +5
cs.CL 2026-05-18 reviewed

Distillation cuts error rates for Nigerian speech recognition by 29%
Sometin Beta Pass Notin (SBPN): Improving Multilingual ASR for Nigerian Languages via Knowledge Distillation

Sewade Ogun
cs.LG 2026-05-18 reviewed

Persistent margins, not drifts, carry safety signals across LLM layers
Geometry-Lite: Interpretable Safety Probing via Layer-Wise Margin Geometry

Woo Seob Sim +1
cs.CL 2026-05-17 reviewed

LLMs mirror human power biases in simulated talks
Do LLM Agents Mirror Socio-Cognitive Effects in Power-Asymmetric Conversations?

Anvesh Rao Vijjini +2
cs.CL 2026-05-17 reviewed

Gemini leads LLM benchmark on legal precedent classification
Validate Your Authority: Benchmarking LLMs on Multi-Label Precedent Treatment Classification

M. Mikail Demir +1
cs.CL 2026-05-17 reviewed

Reasoning models cut 26% tokens by exiting at semantic convergence
Stop When Reasoning Converges: Semantic-Preserving Early Exit for Reasoning Models

Dehai Min +5
cs.CL 2026-05-17 reviewed

Peer editing with audio matches speech summary quality to transcripts
Beyond Transcripts: Iterative Peer-Editing with Audio Unlocks High-Quality Human Summaries of Conversational Speech

Kaavya Chaparala +5
cs.AI 2026-05-17 reviewed

Causal tests select better memories for long-running AI agents
Causal Intervention-Based Memory Selection for Long-Horizon LLM Agents

Saksham Sahai Srivastava
cs.CL 2026-05-17 reviewed

Co-citation predictability drops over 20 years
Temporal Decay of Co-Citation Predictability: A 20-Year Statute Retrieval Benchmark from 396M Ukrainian Court Citations

Volodymyr Ovcharov
cs.CR 2026-05-17 reviewed

An adversary can always reframe prompt injections as legitimate
AI Agents May Always Fall for Prompt Injections

Sahar Abdelnabi +1
cs.CV 2026-05-17 reviewed

Fast-slow video guardrail tops larger models at lower cost
SafeLens: Deliberate and Efficient Video Guardrails with Fast-and-Slow Screening

Shahriar Kabir Nahin +3
cs.CL 2026-05-17 reviewed

MoE models show deep-layer routing collapse for low-resource languages
Mixture of Experts for Low-Resource LLMs

Ori Bar Joseph +4
cs.LG 2026-05-17 reviewed

Mu-GRPO halves LLM RL wall-clock time with stale rollouts
How Off-Policy Can GRPO Be? Mu-GRPO for Efficient LLM Reinforcement Learning

Minghao Tian +2
cs.AI 2026-05-17 reviewed

Small chess model tops larger ones via pattern matching
Generalization or Memorization? Brittleness Testing for Chess-Trained Language Models

Ethan Tang
cs.SE 2026-05-17 reviewed

Inverted API exploration yields verified tool-call data
Firefly: Illuminating Large-Scale Verified Tool-Call Data Generation from Real APIs

Yuxuan Lu +14
cs.LG 2026-05-17 reviewed

CausalSynth makes LLM synthetic data obey causal graphs
CausalSynth: Generating Structurally Sound Synthetic Data

Zehua Cheng +3
cs.AI 2026-05-17 reviewed

EEG-to-text pipeline beats random baseline by 30 percent
RAG-based EEG-to-Text Translation Using Deep Learning and LLMs

Enrico Collautti +4
cs.CL 2026-05-17 reviewed

Decomposition separates context anchors in ambiguous word embeddings
RSD: A Local Triangulation Audit Primitive for Learned Vector Blocks

Seungmin Jin
cs.CL 2026-05-17 reviewed

Hybrid features raise CNN recall for Bangla fake news
Hybrid Feature Combinations with CNN for Bangla Fake News Classification

Md Gulzar Hussain +2
cs.CL 2026-05-17 reviewed

Hypothesis verification improves failure attribution in multi-agent LLMs
VerifyMAS: Hypothesis Verification for Failure Attribution in LLM Multi-Agent Systems

Hezhe Qiao +4
cs.CR 2026-05-17 reviewed

Tool-using AI agents can be poisoned after trust is built
Trust No Tool: Evaluating and Defending LLM Agents under Untrusted Tool Feedback

Lecheng Yan +7
cs.SE 2026-05-17 reviewed

ContraFix fixes 84% of C/C++ vulnerabilities at low cost
ContraFix: Agentic Vulnerability Repair via Differential Runtime Evidence and Skill Reuse

Simiao Liu +4
cs.GR 2026-05-17 reviewed

FEA feedback lifts CAD agents past 20 percent requirement compliance
Self-Improving CAD Generation Agents with Finite Element Analysis as Feedback

Guijin Son +4
cs.CV 2026-05-17 reviewed

Dynamic fixation keeps 98% OCR accuracy with 5% visual tokens
FastOCR: Dynamic Visual Fixation via KV Cache Pruning for Efficient Document Parsing

Zihan Tang +9
cs.LG 2026-05-17 reviewed

DiDi-Merging matches baselines at 1.24x single-model size
Dynamic Model Merging Made Slim

Guodong Du +1
cs.SE 2026-05-17 reviewed

Memory layers raise repo vulnerability repair to 58%
MemRepair: Hierarchical Memory for Agentic Repository-Level Vulnerability Repair

Simiao Liu +5
cs.CL 2026-05-17 reviewed

ASR errors degrade Korean QA by the same relative amount across LLMs
Analyzing Error Propagation in Korean Spoken QA with ASR-LLM Cascades

Donghyuk Jung +1
cs.CL 2026-05-17 reviewed

Catalogues miss 609 datasets across 53 languages
Beyond Catalogue Counts: the Dataset Visibility Asymmetry in Low-Resource Multilingual NLP

Zhiyin Tan +1
cs.CV 2026-05-17 reviewed

Text overrides images in clinical vision models
Medical Context Distorts Decisions in Clinical Vision Language Models

David Restrepo +4
cs.CL 2026-05-17 reviewed

Structured evidence fusion improves biomedical QA across LLMs
BELIEF: Structured Evidence Modeling and Uncertainty-Aware Fusion for Biomedical Question Answering

Chang Zong +4
cs.CL 2026-05-17 reviewed

MiniGPT reaches 1.478 loss and generates Shakespeare-style dialogue
MiniGPT: Rebuilding GPT from First Principles

Jibin Joseph
cs.AI 2026-05-17 reviewed

Small sets of expert annotations calibrate LLMs to match human judgments of generative AI
QQJ: Quantifying Qualitative Judgment for Scalable and Human-Aligned Evaluation of Generative AI

Marjan Veysi +3
cs.CL 2026-05-17 reviewed

Domain token swaps reduce training time 35-55% for LLM summarization
Learning Faster with Better Tokens: Parameter-Efficient Vocabulary Adaptation for Specialized Text Summarization

Gunjan Balde +3
cs.CL 2026-05-17 reviewed

Five agents map news bias by exposing omissions and manipulations
NewsLens: A Multi-Agent Framework for Adversarial News Bias Navigation

Joy Bose
cs.CL 2026-05-17 reviewed

Offline priors initialize better multi-agent LLM graphs
Learning Transferable Topology Priors for Multi-Agent LLM Collaboration Across Domains

Taolin Zhang +6
cs.AI 2026-05-17 reviewed

Hypergraph links text levels for stronger personality prediction
HyperPersona: A Multi-Level Hypergraph Framework for Text-Based Automatic Personality Prediction

Sina Heydari +1
cs.CL 2026-05-17 reviewed

Multi-agent alignment lifts factual accuracy on knowledge QA
AMATA: Adaptive Multi-Agent Trajectory Alignment for Knowledge-Intensive Question Answering

Taolin Zhang +7
cs.CL 2026-05-17 reviewed

State transitions keep recovering agents alive in LLM teams
Taming "Zombie" Agents: A Markov State-Aware Framework for Resilient Multi-Agent Evolution

Taolin Zhang +7
cs.CL 2026-05-17 reviewed

Decomposition separates cyclic preferences for better LLM alignment
Transitivity Meets Cyclicity: Explicit Preference Decomposition for Dynamic Large Language Model Alignment

Yucong Huang +3
cs.CL 2026-05-17 reviewed

Mismatched wrong drafts boost GRPO math performance
Weak-to-Strong Elicitation via Mismatched Wrong Drafts

Wei Deng
cs.AI 2026-05-17 reviewed

Control loop raises LLM self-correction accuracy by 6.2 points
CyberCorrect: A Cybernetic Framework for Closed-Loop Self-Correction in Large Language Models

Yuning Wu +2
cs.LG 2026-05-17 reviewed

Context Codec verifies which commitments survive LLM context compression
Compress the Context, Keep the Commitments: A Formal Framework for Verifiable LLM Context Compression

Natalia Trukhina +1
cs.CL 2026-05-17 reviewed

ConflictRAG resolves document conflicts to raise RAG accuracy
ConflictRAG: Detecting and Resolving Knowledge Conflicts in Retrieval Augmented Generation

Chenyu Wang +3
cs.LG 2026-05-17 reviewed

Offline sampling freezes partition function before LLM-RL policy updates
DISA: Offline Importance Sampling for Distribution-Matching LLM-RL

Shaobo Wang +11
cs.CL 2026-05-17 reviewed

Agentic training loop lifts Lean prover to record Pass@32 scores
OProver: A Unified Framework for Agentic Formal Theorem Proving

David Ma +9
cs.LG 2026-05-17 reviewed

Pullback Fisher metric gives closed-form optimal activation steering
FishBack: Pullback Fisher Geometry for Optimal Activation Steering in Transformers

Sihan Wang +1