archive
Every paper Pith has read. Search by title, abstract, or pith.
7661 papers in cs.CL · page 5
-
Post-editors change one in three literary MT metaphors
Metaphors in Literary Post-Editing: Opening Pandora's Box?
-
ChunkFT fits full fine-tuning of 8B models in 14GB GPU memory
ChunkFT: Byte-Streamed Optimization for Memory-Efficient Full Fine-Tuning
-
Fine-tuned LLM reaches 0.866 F1 on Spanish psychiatric ICD coding
Automated ICD Classification of Psychiatric Diagnoses: From Classical NLP to Large Language Models
-
SMoA outperforms LoRA in low-budget fine-tuning
SMoA: Spectrum Modulation Adapter for Parameter-Efficient Fine-Tuning
-
Error highlights and suggestions bring no post-editing speed or quality gains
Smarter edits? Post-editing with error highlights and translation suggestions
-
Small classifier beats LLMs at pulling exact text from papers
ACL-Verbatim: hallucination-free question answering for research
-
Extractors score 0.93 on articles but only 0.41-0.84 on other pages
WCXB: A Multi-Type Web Content Extraction Benchmark
-
LLMs unstable on Korean honorifics for car assistants
LoCar: Localization-Aware Evaluation of In-Vehicle Assistants through Fine-Grained Sociolinguistic Control
-
LLMs reach 0.91 agreement grading public law exams
GradeLegal: Automated Grading for German Legal Cases
-
ClaimRAG-LAW benchmark separates retrieval and generation errors in legal RAG
Fine-grained Claim-level RAG Benchmark for Law
-
Routing leads in adapting LLMs to hidden style preferences
APM: Evaluating Style Personalization in LLMs with Arbitrary Preference Mappings
-
LLM-brain alignment stable across languages but not from surprise or compression
Cross-lingual robustness of LLM-brain alignment and its computational roots
-
Less data yields clearer AI skills taxonomies
Building a Custom Taxonomy of AI Skills and Tasks from the Ground Up with Job Postings
-
Agent turns natural language into governed enterprise API calls
Beyond Text-to-SQL: An Agentic LLM System for Governed Enterprise Analytics APIs
-
Off-the-shelf persona vectors rival targeted sycophancy steering
Playing Devil's Advocate: Off-the-Shelf Persona Vectors Rival Targeted Steering for Sycophancy
-
DABS cuts multi-aspect sentiment computation by up to 60%
Single-Pass, Depth-Selective Reading for Multi-Aspect Sentiment Analysis
-
Anchor regularization makes LLM safety consistent across prompt variations
Towards Context-Invariant Safety Alignment for Large Language Models
-
Arabic memes dataset finds Islamist and satirical ones most hostile
ArPoMeme: An Annotated Arabic Multimodal Dataset for Political Ideology and Polarization
-
Arabic job ads corpus shows gendered hiring patterns persist
JobArabi: An Arabic Corpus and Analysis of Job Announcements from Social Media
-
Grafted hidden states raise language model scores over MoE and Engram
Memory Grafting: Scaling Language Model Pre-training via Offline Conditional Memory
-
Interleaved reasoning boosts speech AI math scores by 13%
Thinking-while-speaking: A Controlled, Interleaved Reasoning Method for Real-Time Speech Generation
-
DASH discovers strong hybrid attention for LLMs in 20 minutes on one GPU
DASH: Fast Differentiable Architecture Search for Hybrid Attention in Minutes on a Single GPU
-
Strategy induction from questions alone improves LLM task instructions
Strategy-Induct: Task-Level Strategy Induction for Instruction Generation
-
Phoneme recognition proxies articulatory synthesis quality
Evaluating Speech Articulation Synthesis with Articulatory Phoneme Recognition
-
Task-routed experts lift implicit sentiment scores
Task-Routed Mixture-of-Experts with Cognitive Appraisal for Implicit Sentiment Analysis
-
Unlearned models keep low calibration but lean on shortcuts
Calibration vs Decision Making: Revisiting the Reliability Paradox in Unlearned Language Models
-
Corpora fine-tune machine translation for science
Enhancing Scientific Discourse: Machine Translation for the Scientific Domain
-
Skill synthesis scales terminal-agent data to beat baselines with 1% of it
Terminal-World: Scaling Terminal-Agent Environments via Agent Skills
-
Multi-metric score spots synthetic narratives more reliably
Detecting Synthetic Political Narratives in Cross-Platform Social Media Discourse
-
MemGym isolates memory from reasoning in agent benchmarks
MemGym: a Long-Horizon Memory Environment for LLM Agents
-
7B open LLMs run GraphRAG locally for EHR schema queries
GraphRAG on Consumer Hardware: Benchmarking Local LLMs for Healthcare EHR Schema Retrieval
-
Column-sparse attention nearly doubles diffusion LLM speed
PulseCol: Periodically Refreshed Column-Sparse Attention for Accelerating Diffusion Language Models
-
Refined guidelines help LLMs match biomedical expert annotations
Refining and Reusing Annotation Guidelines for LLM Annotation
-
Only two of 20 Transformer modifications transfer at 1-3B
Most Transformer Modifications Still Do Not Transfer at 1-3B: A 2020-2026 Update to Narang et al. (2021) with Downstream Evaluation and a Noise Floor
-
Guidelines for text data raise consistency in climate impact datasets
Assessing socio-economic climate impacts from text data
-
Social barriers outweigh linguistic ones in Arabic NLP
Building Arabic NLP from the Ground Up: Twenty Years of Lessons, Failures, and Open Problems
-
LLM interventions create user drift that biases simulated experiments
The Illusion of Intervention: Your LLM-Simulated Experiment is an Observational Study
-
Detectors separate human and AI text well but lag on naming the model
Findings of the Counter Turing Test: AI-Generated Text Detection
-
Hidden states at paragraph boundaries tune verifier strictness
The Hidden Signal of Verifier Strictness: Controlling and Improving Step-Wise Verification via Selective Latent Steering
-
Constraint engine turns AI drawings into verifiable geometry reasoning
Draw2Think: Harnessing Geometry Reasoning through Constraint Engine Interaction
-
RL scores full distributions to fix LLM regression
Distribution-Aware Reward: Reinforcement Learning over Predictive Distributions for LLM Regression
-
Aligning task vectors to in-context next-token distributions lifts accuracy 9.2%
Distributional Alignment as a Criterion for Designing Task Vectors in In-Context Learning
-
Framework synthesizes realistic conversational retrieval benchmarks
MTR-Suite: A Framework for Evaluating and Synthesizing Conversational Retrieval Benchmarks
-
Categorical error rates beat WER for Indic speech recognition
SCRIBE: Diagnostic Evaluation and Rich Transcription Models for Indic ASR
-
Agreement screening yields clearer text features at full accuracy
Interpretable Discriminative Text Representations via Agreement and Label Disentanglement
-
Self-limiting losses compress embeddings without overfitting
DIVE: Embedding Compression via Self-Limiting Gradient Updates
-
Utility ranking trims credit document review to three minutes
Beyond Semantic Similarity: A Two-Phase Non-Parametric Retrieval Workflow for Corporate Credit Underwriting
-
AI reviewer beats top human on Nature papers
On the limits and opportunities of AI reviewers: Reviewing the reviews of Nature-family papers with 45 expert scientists