archive
Every paper Pith has read. Search by title, abstract, or pith.
7661 papers in cs.CL · page 19
-
Re-testing lowers most controlled text generation scores
A Comparative Study of Controlled Text Generation Systems Using Level-Playing-Field Evaluation Principles
-
Small detector beats large models at spotting LLM hallucinations
Scalable Token-Level Hallucination Detection in Large Language Models
-
Pretraining exposure predicts LLM popularity better than Wikipedia
Pretraining Exposure Explains Popularity Judgments in Large Language Models
-
High-convergence sentences lift LLM accuracy on inferential questions
Context Convergence Improves Answering Inferential Questions
-
Benchmark forces models to combine facts from two articles
MedHopQA: A Disease-Centered Multi-Hop Reasoning Benchmark and Evaluation Framework for LLM-Based Biomedical Question Answering
-
Summing PEFT module outputs boosts multi-attribute text control
Output Composability of QLoRA PEFT Modules for Plug-and-Play Attribute-Controlled Text Generation
-
Index ranks category pairs by confusion risk in data entry
A categorical error sensitivity index (ISEC): A preventive ordinal decision-support measure for irrecoverable errors in manual data entry systems
-
Retrieval lifts two-hop medical QA to 89% conceptual accuracy
Overview of the MedHopQA track at BioCreative IX: track description, participation and evaluation of systems for multi-hop medical question answering
-
Gender bias and facts share the same neurons in language models
GKnow: Measuring the Entanglement of Gender Bias and Factual Gender
-
Token-level ratio matching generalizes DPO for precise alignment
TokenRatio: Principled Token-Level Preference Optimization via Ratio Matching
-
Token-level ratio matching aligns models at each generation step
TokenRatio: Principled Token-Level Preference Optimization via Ratio Matching
-
Familiarity dominates English word difficulty across three L1 groups
What makes a word hard to learn? Modeling L1 influence on English vocabulary difficulty
-
New decoder recovers personal data from finetuned models
Reconstruction of Personally Identifiable Information from Supervised Finetuned Models
-
PRISM cuts context use by 10x while lifting accuracy on long agent tasks
PRISM: Pareto-Efficient Retrieval over Intent-Aware Structured Memory for Long-Horizon Agents
-
PRISM hits higher accuracy with 10x less context in long-horizon agent memory
PRISM: Pareto-Efficient Retrieval over Intent-Aware Structured Memory for Long-Horizon Agents
-
Benchmark finds LLMs miss how scams escalate turn by turn
PreScam: A Benchmark for Predicting Scam Progression from Early Conversations
-
Token marks plus contrastive tuning clean disfluent speech transcripts
Mind the Pause: Disfluency-Aware Objective Tuning for Multilingual Speech Correction with LLMs
-
Combined optimization and distillation boosts long-context LLM reasoning
Combining On-Policy Optimization and Distillation for Long-Context Reasoning in Large Language Models
-
Sparse autoencoders expose features inside Whisper ASR
Mechanistic Interpretability of ASR models using Sparse Autoencoders
-
LoRA accuracy depends on which parameters are trained
Not How Many, But Which: Parameter Placement in Low-Rank Adaptation
-
LLM decoding routes around memory clashes via attention checks
Mitigating Context-Memory Conflicts in LLMs through Dynamic Cognitive Reconciliation Decoding
-
Discovery agents beat learned models under enterprise shifts
Do Enterprise Systems Need Learned World Models? The Importance of Context to Infer Dynamics
-
Bayesian priors fix up to 50-point errors in LLM user feedback
Correcting Selection Bias in Sparse User Feedback for Large Language Model Quality Estimation: A Multi-Agent Hierarchical Bayesian Approach
-
Reconstructing missing facts boosts misinformation detection
Latent Causal Void: Explicit Missing-Context Reconstruction for Misinformation Detection
-
One autoregressive model makes personalized ad images and text
Design Your Ad: Personalized Advertising Image and Text Generation with Unified Autoregressive Models
-
Poetic prompts create separate processing paths that evade LLM safety
Metaphor Is Not All Attention Needs
-
Data focus and signer adaptation unlock low-resource sign language AI
Sign Language Recognition and Translation for Low-Resource Languages: Challenges and Pathways Forward
-
World models merge with action generation for embodied AI
World Action Models: The Next Frontier in Embodied AI
-
LLMs show limited evidence of grammar violation detectors
Do Language Models Encode Knowledge of Linguistic Constraint Violations?
-
LLMs show limited internal grammar violation detectors
Do Language Models Encode Knowledge of Linguistic Constraint Violations?
-
Spoken input aids verb learning over child-directed speech
Is Child-Directed Language Optimized for Word Learning? A Computational Study of Verb Meaning Acquisition
-
Skill graphs boost agent RL on complex tasks
SkillGraph: Skill-Augmented Reinforcement Learning for Agents via Evolving Skill Graphs
-
Three-stage retrieval pipeline ranks 8th in SemEval multi-turn task
Caraman at SemEval-2026 Task 8: Three-Stage Multi-Turn Retrieval with Query Rewriting, Hybrid Search, and Cross-Encoder Reranking
-
SAGE proposes a framework that trains smaller models to automatically generate and verify…
SAGE: Scalable Automated Robustness Augmentation for LLM Knowledge Evaluation
-
Benchmark finds skills expose agents to unsafe attacks
SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces
-
Human actions guide LLM agents past RL barriers
Learning Agentic Policy from Action Guidance
-
Selective visuals raise Indic subtitle translation scores
Towards Visually-Guided Movie Subtitle Translation for Indic Languages
-
Rubric test predicts LLM post-training success at 90% accuracy
On Predicting the Post-training Potential of Pre-trained LLMs
-
Scenario modeling plus intent bridging lifts target-guided dialogues
Enhancing Target-Guided Proactive Dialogue Systems via Conversational Scenario Modeling and Intent-Keyword Bridging
-
Frozen CLIP features top ResNet for instructional video summaries
Multimodal Abstractive Summarization of Instructional Videos with Vision-Language Models
-
Print statements teach code models to reason step by step
StepCodeReasoner: Aligning Code Reasoning with Stepwise Execution Traces via Reinforcement Learning
-
Neuron activation margins augment preference optimization for math
YFPO: A Preliminary Study of Yoked Feature Preference Optimization with Neuron-Guided Rewards for Mathematical Reasoning
-
Sparse autoencoders become steering and optimization tools for LLMs
Qwen-Scope: Turning Sparse Features into Development Tools for Large Language Models
-
Concordance tool assembles local grammars for better name extraction
Concordance Comparison as a Means of Assembling Local Grammars
-
Unified visual latents cut reasoning tokens in multimodal models
UniVLR: Unifying Text and Vision in Visual Latent Reasoning for Multimodal LLMs
-
Boltzmann ranking on trajectories lifts diffusion language model performance
Self-Distilled Trajectory-Aware Boltzmann Modeling: Bridging the Training-Inference Discrepancy in Diffusion Language Models
-
Boltzmann ranking of inference trajectories improves DLM post-training
Self-Distilled Trajectory-Aware Boltzmann Modeling: Bridging the Training-Inference Discrepancy in Diffusion Language Models
-
Divergence signals adapt credit assignment for LLM agent RL
GEAR: Granularity-Adaptive Advantage Reweighting for LLM Agents via Self-Distillation
-
Divergence spikes adapt credit assignment for LLM agents
GEAR: Granularity-Adaptive Advantage Reweighting for LLM Agents via Self-Distillation
-
Fine-tuning teaches models to control randomness
Probabilistic Calibration Is a Trainable Capability in Language Models