archive

Every paper Pith has read.

7661 papers in cs.CL · page 5

cs.CL 2026-05-20 reviewed

Post-editors change one in three literary MT metaphors
Metaphors in Literary Post-Editing: Opening Pandora's Box?

Aletta G. Dorst +2
cs.LG 2026-05-20 reviewed

ChunkFT fits full fine-tuning of 8B models in 14GB GPU memory
ChunkFT: Byte-Streamed Optimization for Memory-Efficient Full Fine-Tuning

Yongkang Liu +9
cs.CL 2026-05-20 reviewed

Fine-tuned LLM reaches 0.866 F1 on Spanish psychiatric ICD coding
Automated ICD Classification of Psychiatric Diagnoses: From Classical NLP to Large Language Models

Fernando Ortega +5
cs.LG 2026-05-20 reviewed

SMoA outperforms LoRA in low-budget fine-tuning
SMoA: Spectrum Modulation Adapter for Parameter-Efficient Fine-Tuning

Yongkang Liu +9
cs.CL 2026-05-20 reviewed

Error highlights and suggestions bring no post-editing speed or quality gains
Smarter edits? Post-editing with error highlights and translation suggestions

Fleur V.J. van Tellingen +6
cs.CL 2026-05-20 reviewed

Small classifier beats LLMs at pulling exact text from papers
ACL-Verbatim: hallucination-free question answering for research

Gábor Recski +4
cs.CL 2026-05-20 reviewed

Extractors score 0.93 on articles but only 0.41-0.84 on other pages
WCXB: A Multi-Type Web Content Extraction Benchmark

Murrough Foley
cs.CL 2026-05-20 reviewed

LLMs unstable on Korean honorifics for car assistants
LoCar: Localization-Aware Evaluation of In-Vehicle Assistants through Fine-Grained Sociolinguistic Control

Seogyeong Jeong +6
cs.CL 2026-05-20 reviewed

LLMs reach 0.91 agreement grading public law exams
GradeLegal: Automated Grading for German Legal Cases

Abdullah Al Zubaer +4
cs.CL 2026-05-20 reviewed

ClaimRAG-LAW benchmark separates retrieval and generation errors in legal RAG
Fine-grained Claim-level RAG Benchmark for Law

Souvick Das +2
cs.CL 2026-05-20 reviewed

Routing leads in adapting LLMs to hidden style preferences
APM: Evaluating Style Personalization in LLMs with Arbitrary Preference Mappings

Philipp Spohn +2
cs.CL 2026-05-20 reviewed

LLM-brain alignment stable across languages but not from surprise or compression
Cross-lingual robustness of LLM-brain alignment and its computational roots

Ni Yang +5
cs.CL 2026-05-20 reviewed

Less data yields clearer AI skills taxonomies
Building a Custom Taxonomy of AI Skills and Tasks from the Ground Up with Job Postings

Stephen Meisenbacher +1
cs.CL 2026-05-20 reviewed

Agent turns natural language into governed enterprise API calls
Beyond Text-to-SQL: An Agentic LLM System for Governed Enterprise Analytics APIs

Gundeep Singh +7
cs.AI 2026-05-20 reviewed

Off-the-shelf persona vectors rival targeted sycophancy steering
Playing Devil's Advocate: Off-the-Shelf Persona Vectors Rival Targeted Steering for Sycophancy

Ishaan Kelkar +5
cs.CL 2026-05-20 reviewed

DABS cuts multi-aspect sentiment computation by up to 60%
Single-Pass, Depth-Selective Reading for Multi-Aspect Sentiment Analysis

Yan Xia +3
cs.CL 2026-05-20 reviewed

Anchor regularization makes LLM safety consistent across prompt variations
Towards Context-Invariant Safety Alignment for Large Language Models

Yixu Wang +6
cs.CL 2026-05-20 reviewed

Arabic memes dataset finds Islamist and satirical ones most hostile
ArPoMeme: An Annotated Arabic Multimodal Dataset for Political Ideology and Polarization

Wajdi Zaghouani +3
cs.CL 2026-05-20 reviewed

Arabic job ads corpus shows gendered hiring patterns persist
JobArabi: An Arabic Corpus and Analysis of Job Announcements from Social Media

Wajdi Zaghouani +3
cs.CL 2026-05-20 reviewed

Grafted hidden states raise language model scores over MoE and Engram
Memory Grafting: Scaling Language Model Pre-training via Offline Conditional Memory

Runxi Cheng +9
cs.CL 2026-05-20 reviewed

Interleaved reasoning boosts speech AI math scores by 13%
Thinking-while-speaking: A Controlled, Interleaved Reasoning Method for Real-Time Speech Generation

Xuan Du +6
cs.LG 2026-05-20 reviewed

DASH discovers strong hybrid attention for LLMs in 20 minutes on one GPU
DASH: Fast Differentiable Architecture Search for Hybrid Attention in Minutes on a Single GPU

Weizhe Chen +5
cs.CL 2026-05-20 reviewed

Strategy induction from questions alone improves LLM task instructions
Strategy-Induct: Task-Level Strategy Induction for Instruction Generation

Po-Chun Chen +2
cs.CL 2026-05-20 reviewed

Phoneme recognition proxies articulatory synthesis quality
Evaluating Speech Articulation Synthesis with Articulatory Phoneme Recognition

Vinicius Ribeiro +1
cs.CL 2026-05-20 reviewed

Task-routed experts lift implicit sentiment scores
Task-Routed Mixture-of-Experts with Cognitive Appraisal for Implicit Sentiment Analysis

Yaping Chai +2
cs.CL 2026-05-20 reviewed

Unlearned models keep low calibration but lean on shortcuts
Calibration vs Decision Making: Revisiting the Reliability Paradox in Unlearned Language Models

Divyaksh Shukla +1
cs.CL 2026-05-20 reviewed

Corpora fine-tune machine translation for science
Enhancing Scientific Discourse: Machine Translation for the Scientific Domain

Dimitris Roussis +2
cs.CL 2026-05-20 reviewed

Skill synthesis scales terminal-agent data to beat baselines with 1% of it
Terminal-World: Scaling Terminal-Agent Environments via Agent Skills

Zihao Cheng +8
cs.SI 2026-05-20 reviewed

Multi-metric score spots synthetic narratives more reliably
Detecting Synthetic Political Narratives in Cross-Platform Social Media Discourse

Despoina Antonakaki +1
cs.CL 2026-05-20 reviewed

MemGym isolates memory from reasoning in agent benchmarks
MemGym: a Long-Horizon Memory Environment for LLM Agents

Wujiang Xu +10
cs.CL 2026-05-20 reviewed

7B open LLMs run GraphRAG locally for EHR schema queries
GraphRAG on Consumer Hardware: Benchmarking Local LLMs for Healthcare EHR Schema Retrieval

Peter Fernandes +1
cs.CL 2026-05-20 reviewed

Column-sparse attention nearly doubles diffusion LLM speed
PulseCol: Periodically Refreshed Column-Sparse Attention for Accelerating Diffusion Language Models

Yanyi Lyu +5
cs.CL 2026-05-20 reviewed

Refined guidelines help LLMs match biomedical expert annotations
Refining and Reusing Annotation Guidelines for LLM Annotation

Kon Woo Kim +2
cs.LG 2026-05-20 reviewed

Only two of 20 Transformer modifications transfer at 1-3B
Most Transformer Modifications Still Do Not Transfer at 1-3B: A 2020-2026 Update to Narang et al. (2021) with Downstream Evaluation and a Noise Floor

Yang Zhao +4
cs.CL 2026-05-20 reviewed

Guidelines for text data raise consistency in climate impact datasets
Assessing socio-economic climate impacts from text data

Mariana Madruga de Brito +17
cs.CL 2026-05-20 reviewed

Social barriers outweigh linguistic ones in Arabic NLP
Building Arabic NLP from the Ground Up: Twenty Years of Lessons, Failures, and Open Problems

Wajdi Zaghouani
cs.CL 2026-05-20 reviewed

LLM interventions create user drift that biases simulated experiments
The Illusion of Intervention: Your LLM-Simulated Experiment is an Observational Study

Victoria Lin +5
cs.CL 2026-05-20 reviewed

Detectors separate human and AI text well but lag on naming the model
Findings of the Counter Turing Test: AI-Generated Text Detection

Rajarshi Roy +18
cs.LG 2026-05-20 reviewed

Hidden states at paragraph boundaries tune verifier strictness
The Hidden Signal of Verifier Strictness: Controlling and Improving Step-Wise Verification via Selective Latent Steering

Yefan Zhou +5
cs.CV 2026-05-20 reviewed

Constraint engine turns AI drawings into verifiable geometry reasoning
Draw2Think: Harnessing Geometry Reasoning through Constraint Engine Interaction

Juncheng Hu +3
cs.LG 2026-05-20 reviewed

RL scores full distributions to fix LLM regression
Distribution-Aware Reward: Reinforcement Learning over Predictive Distributions for LLM Regression

Jungsoo Park +6
cs.CL 2026-05-20 reviewed

Aligning task vectors to in-context next-token distributions lifts accuracy 9.2%
Distributional Alignment as a Criterion for Designing Task Vectors in In-Context Learning

Jihoon Kwon +2
cs.CL 2026-05-20 reviewed

Framework synthesizes realistic conversational retrieval benchmarks
MTR-Suite: A Framework for Evaluating and Synthesizing Conversational Retrieval Benchmarks

Junhao Ruan +10
cs.CL 2026-05-20 reviewed

Categorical error rates beat WER for Indic speech recognition
SCRIBE: Diagnostic Evaluation and Rich Transcription Models for Indic ASR

Kavya Manohar +3
cs.CL 2026-05-20 reviewed

Agreement screening yields clearer text features at full accuracy
Interpretable Discriminative Text Representations via Agreement and Label Disentanglement

Tong Wang +2
cs.CL 2026-05-20 reviewed

Self-limiting losses compress embeddings without overfitting
DIVE: Embedding Compression via Self-Limiting Gradient Updates

Dongfang Zhao
cs.CL 2026-05-20 reviewed

Utility ranking trims credit document review to three minutes
Beyond Semantic Similarity: A Two-Phase Non-Parametric Retrieval Workflow for Corporate Credit Underwriting

Linus Ng Junjia +4
cs.CL 2026-05-20 reviewed

AI reviewer beats top human on Nature papers
On the limits and opportunities of AI reviewers: Reviewing the reviews of Nature-family papers with 45 expert scientists

Seungone Kim +57