archive

Every paper Pith has read. Search by title, abstract, or pith.

7661 papers in cs.CL · page 17

cs.CL 2026-05-13 reviewed

Language-specific thresholds lift slur detection F1 by 2-5%
KIT-TIP-NLP at MultiPride: Continual Learning with Multilingual Foundation Model

Barathi Ganesh HB +3
cs.CL 2026-05-13 reviewed

LLMs annotate asylum credibility with inconsistent errors
LLMs as annotators of credibility assessment in Danish asylum decisions: evaluating classification performance and errors beyond aggregated metrics

Galadrielle Humblot-Renaux +9
cs.CR 2026-05-13 reviewed

External skill library keeps LLM attacks evolving after saturation
Model-Agnostic Lifelong LLM Safety via Externalized Attack-Defense Co-Evolution

Xiaozhe Zhang +6
cs.CL 2026-05-13 reviewed

Puzzles reveal all-or-nothing success for humans and LLMs
From Rosetta to Match-Up: A Paired Corpus of Linguistic Puzzles with Human and LLM Benchmarks

Neh Majmudar +3
cs.LO 2026-05-13 reviewed

Certificates verify LLM pipelines by auditing only deterministic parts
Proof-Carrying Certificates for LLM Pipelines: A Trust-Boundary Architecture

George Koomullil
cs.CL 2026-05-13 reviewed

Fine-tuned BART and T5 parsers beat prior seq2seq models on constituent parsing
Exploiting Pre-trained Encoder-Decoder Transformers for Sequence-to-Sequence Constituent Parsing

Daniel Fernández-González +1
cs.LG 2026-05-13 reviewed

Phase rotations on unit circle stabilize explicit memory
Phasor Memory Networks: Stable Backpropagation Through Time for Scalable Explicit Memory

Sungwoo Goo +2
cs.CL 2026-05-13 reviewed

LLMs self-train on examples generated from the query alone
Query-Conditioned Test-Time Self-Training for Large Language Models

Chaehee Song +4
cs.CL 2026-05-13 reviewed

Query self-training adapts LLMs using only input-derived pairs
Query-Conditioned Test-Time Self-Training for Large Language Models

Chaehee Song +4
cs.CL 2026-05-13 reviewed

Document MT then segment refinement beats full-document fixes
What Does LLM Refinement Actually Improve? A Systematic Study on Document-Level Literary Translation

Shaomu Tan +7
cs.CL 2026-05-13 reviewed

Shared preference vector controls LLM choices across personas
Probing Persona-Dependent Preferences in Language Models

Oscar Gilg +3
cs.CL 2026-05-13 reviewed

One vector steers LLM preferences across opposing personas
Probing Persona-Dependent Preferences in Language Models

Oscar Gilg +3
cs.CL 2026-05-13 reviewed

One LLM persuades another to ignore its own safety rules
LLM-Based Persuasion Enables Guardrail Override in Frontier LLMs

Rodrigo Nogueira +9
cs.CL 2026-05-13 reviewed

18,900 questions test financial reasoning in six Indic languages
FIND: Toward Multimodal Financial Reasoning and Question Answering for Indic Languages

Sarmistha Das +4
cs.CL 2026-05-13 reviewed

Persona vectors form in first 0.22% of LLM pretraining
Tracing Persona Vectors Through LLM Pretraining

Viktor Moskvoretskii +4
cs.RO 2026-05-13 reviewed

Stereo vision and location priors boost real-world robot navigation
What Limits Vision-and-Language Navigation?

Yunheng Wang +11
cs.CL 2026-05-13 reviewed

Pooled preferences nearly match individual fine-tuning for personalization
PRISM-X: Experiments on Personalised Fine-Tuning with Human and Simulated Users

Hannah Rose Kirk +6
cs.AI 2026-05-13 reviewed

Simple recipe scales reasoning model to olympiad gold
Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling

Yafu Li +27
cs.CL 2026-05-13 reviewed

Contrastive rollouts assign credit to individual agents in LLM teams
CANTANTE: Optimizing Agentic Systems via Contrastive Credit Attribution

Tom Zehle
cs.CL 2026-05-13 reviewed

Parallel dataset gives medical dialogues in nine Indic languages
IndicMedDialog: A Parallel Multi-Turn Medical Dialogue Dataset for Accessible Healthcare in Indic Languages

Shubham Kumar Nigam +2
cs.CL 2026-05-13 reviewed

Latent info gain ranks visual evidence for better multimodal RAG
Utility-Oriented Visual Evidence Selection for Multimodal Retrieval-Augmented Generation

Weiqing Luo +5
cs.CL 2026-05-13 reviewed

Hybrid conversion lets LLMs query building models in plain English
A Hybrid Framework for Natural Language Querying of IFC Models with Relational and Graph Representations

Rabindra Lamsal +4
cs.CL 2026-05-13 reviewed

GAGPO computes temporal advantages from grouped rollouts without a critic
GAGPO: Generalized Advantage Grouped Policy Optimization

Siyuan Zhu +6
stat.ML 2026-05-13 reviewed

Entropy rises with missing context in LLMs
LLMs as Implicit Imputers: Uncertainty Should Scale with Missing Information

Stef van Buuren
cs.CL 2026-05-13 reviewed

Models frequently fail to build valid geometry diagrams from text
GeoBuildBench: A Benchmark for Interactive and Executable Geometry Construction from Natural Language

Jinwoong Kim +2
cs.CL 2026-05-13 reviewed

Pruning trims long reasoning by 19-42% with little accuracy loss
STOP: Structured On-Policy Pruning of Long-Form Reasoning in Low-Data Regimes

Chenjun Xu +5
cs.CL 2026-05-13 reviewed

Acquisition rewards yield 2-7% gains in student model training
AcquisitionSynthesis: Targeted Data Generation using Acquisition Functions

Ishika Agarwal +6
cs.SE 2026-05-13 reviewed

LLMs lag experts on system-level performance code
PerfCodeBench: Benchmarking LLMs for System-Level High-Performance Code Optimization

Huihao Jing +7
cs.CL 2026-05-13 reviewed

Teacher confidence gates improve reasoning in small models
GateKD: Confidence-Gated Closed-Loop Distillation for Robust Reasoning

Kasidit Sermsri +1
cs.CL 2026-05-13 reviewed

Knowledge base lifts Text-to-SQL accuracy when data is scarce
Knowledge Distillation for Low-Resource Open-source Text-to-SQL Model

Tianhao Qiu +1
cs.CL 2026-05-13 reviewed

Small 244M Whisper matches large models on Indic speech
Vividh-ASR: A Complexity-Tiered Benchmark and Optimization Dynamics for Robust Indic Speech Recognition

Kush Juvekar +4
cs.CL 2026-05-13 reviewed

This paper applies a generative meta-learning algorithm to spoken word classification…
Does language matter for spoken word classification? A multilingual generative meta-learning approach

Batsirayi Mupamhi Ziki +2
cs.CL 2026-05-13 reviewed

Multilingual edge in word classification is smaller than expected
Does language matter for spoken word classification? A multilingual generative meta-learning approach

Batsirayi Mupamhi Ziki +2
cs.CL 2026-05-13 reviewed

LLM JSON stays valid inside tight token budgets
TruncProof: A Guardrail for LLM-based JSON Generation under Token-Length Constraints

Yoshio Kato +1
cs.CL 2026-05-13 reviewed

GeMCL classifies 1000 words from five shots each with stable accuracy
Scaling few-shot spoken word classification with generative meta-continual learning

Louise Beyers +2
cs.CL 2026-05-13 reviewed

GeMCL scales spoken word classification to 1000 classes with five shots each
Scaling few-shot spoken word classification with generative meta-continual learning

Louise Beyers +2
cs.SE 2026-05-13 reviewed

Deeper thought per algorithm beats more candidates under fixed tokens
Effective Harness Engineering for Algorithm Discovery with Coding Agents

Yoichi Ishibashi +2
cs.CL 2026-05-13 reviewed

GenAI flattens L2 writers' voices into uniform English
The Cost of Perfect English: Pragmatic Flattening and the Erasure of Authorial Voice in L2 Writing Supported by GenAI

Ao Liu +1
cs.IR 2026-05-13 reviewed

LLMs predict query-specific validity horizons for web content
RAG-Enhanced Large Language Models for Dynamic Content Expiration Prediction in Web Search

Tingyu Chen +6
cs.CL 2026-05-13 reviewed

Pruning candidate contexts with search tools improves LLM performance
Context Training with Active Information Seeking

Zeyu Huang +6
cs.CL 2026-05-13 reviewed

Pruning multiple search contexts lifts LLM adaptation gains
Context Training with Active Information Seeking

Zeyu Huang +6
cs.LG 2026-05-13 reviewed

LLMs miss when medical guidelines expire
Large Language Models Lack Temporal Awareness of Medical Knowledge

Zihan Guan +8
cs.CL 2026-05-13 reviewed

Jailbreak success in diffusion LMs drops to 0.64% via step-wise remasking
Adaptive Steering and Remasking for Safe Generation in Diffusion Language Models

Yejin Lee +1
cs.LG 2026-05-13 reviewed

Bell-shaped sampling trains masked diffusion models 4x faster
Understanding and Accelerating the Training of Masked Diffusion Language Models

Chunsan Hong +7
cs.CL 2026-05-13 reviewed

Multimodal reasoning lifts MI coding accuracy to 52 percent
Leveraging Multimodal Self-Consistency Reasoning in Coding Motivational Interviewing for Alcohol Use Reduction

Guangzeng Han +4
cs.CL 2026-05-13 reviewed

Multimodal voting lifts MI coding accuracy to 52.56%
Leveraging Multimodal Self-Consistency Reasoning in Coding Motivational Interviewing for Alcohol Use Reduction

Guangzeng Han +4
cs.CL 2026-05-13 reviewed

Speech marks when insights transfer across similar problems
Leveraging Speech to Identify Signatures of Insight and Transfer in Problem Solving

Linas Nasvytis +1
cs.CL 2026-05-13 reviewed

Speech reveals when insights transfer across problems
Leveraging Speech to Identify Signatures of Insight and Transfer in Problem Solving

Linas Nasvytis +1
cs.CL 2026-05-13 reviewed

Repeated insight type speeds solving and boosts problem categorization in speech
Leveraging Speech to Identify Signatures of Insight and Transfer in Problem Solving

Linas Nasvytis +1
cs.LG 2026-05-13 reviewed

LLM states project to F2 for 93% zero-shot ontology accuracy
Controlling Logical Collapse in LLMs via Algebraic Ontology Projection over F2

Hisashi Miyashita +1