archive
Every paper Pith has read. Search by title, abstract, or pith.
7661 papers in cs.CL · page 15
-
Geometry scores pick shallow layers for diffusion insertion in transformers
Where Should Diffusion Enter a Language Model? Geometry-Guided Hidden-State Replacement
-
Semantic RL adds low-resource languages without erasing prior skills
Reinforcement Learning with Semantic Rewards Enables Low-Resource Language Expansion without Alignment Tax
-
Short concern texts track with activity drops and sleep issues
A Formative Study of Brief Affective Text as a Complement to Wearable Sensing for Longitudinal Student Health Monitoring
-
LLM filter and clustering finds 41 manipulative narrative clusters
LLM-based Detection of Manipulative Political Narratives
-
Prompt filter and clustering finds 41 narrative clusters
LLM-based Detection of Manipulative Political Narratives
-
Transformers score German texts on left-right scale
Ideology Prediction of German Political Texts
-
Dynamic Latent Routing beats supervised fine-tuning by 6.6 points
Dynamic Latent Routing
-
Exact prefix factorization removes errors in diffusion language models
Factorization-Error-Free Discrete Diffusion Language Model via Speculative Decoding
-
Minimal KV scorer tweak beats complex cache redesigns
Minimal-Intervention KV Retention via Set-Conditioned Diversity
-
Simple diversity penalty in KV scorer beats complex designs
Minimal-Intervention KV Retention via Set-Conditioned Diversity
-
Hidden noise stops vision-language models learning real content
To See is Not to Learn: Protecting Multimodal Data from Unauthorized Fine-Tuning of Large Vision-Language Model
-
Web agents should plan before seeing page content
Web Agents Should Adopt the Plan-Then-Execute Paradigm
-
MetaMoE combines independently trained expert models into one Mixture-of-Experts system…
MetaMoE: Diversity-Aware Proxy Selection for Privacy-Preserving Mixture-of-Experts Unification
-
Agent harnesses break rules mid-task despite safe final answers
Auditing Agent Harness Safety
-
Agent harnesses allow unsafe actions even with correct final outputs
Auditing Agent Harness Safety
-
Hypergraph reasoner hits 94.7% on supply chain RCA
Hypergraph Enterprise Agentic Reasoner over Heterogeneous Business Systems
-
Spelling and test design confound KVL word difficulty ratings
Sakura at BEA 2026 Shared Task 1: What Makes Vocabulary Difficult?
-
LLM rates vocab difficulty at r > 0.91
Sakura at BEA 2026 Shared Task 1: What Makes Vocabulary Difficult?
-
Active learners raise NDCG@10 per call in PRP reranking
Active Learners as Efficient PRP Rerankers
-
Active rankers lift NDCG@10 per call in PRP reranking
Active Learners as Efficient PRP Rerankers
-
Transformer predicts next disease with 0.871 median AUC across 896 categories
DT-Transformer: A Foundation Model for Disease Trajectory Prediction on a Real-world Health System
-
Small mismatches in LLM RL rollout and optimization cause collapse
Diagnosing Training Inference Mismatch in LLM Reinforcement Learning
-
Prefill-only adapters deliver 1.9x throughput for 512 users
PreFT: Prefill-only finetuning for efficient inference
-
Filter drops harmful examples to hold LLM attack rate below 6 percent
GradShield: Alignment Preserving Finetuning
-
RAG succeeds when evidence flows deeper and is more widely distributed
Why Retrieval-Augmented Generation Fails: A Graph Perspective
-
Engineered texts recover exact backbones on 100-atom quantum processor
QOuLiPo: What a quantum computer sees when it reads a book
-
Imagined future steps triple recall of distant memories
Thinking Ahead: Prospection-Guided Retrieval of Memory with Language Models
-
Search-based bookmarks beat summarization for role-play memory
BOOKMARKS: Efficient Active Storyline Memory for Role-playing
-
Safety refusals rise with Korean language but drop with Korean context
ROK-FORTRESS: Measuring the Effect of Geopolitical Transcreation for National Security and Public Safety
-
Distance and direction encode relations in LLM embeddings
Polar probe linearly decodes semantic structures from LLMs
-
LLMs encode relations as distances and directions in embeddings
Polar probe linearly decodes semantic structures from LLMs
-
Routed small models add value to AlphaEarth on hydrology questions
Mini-JEPA Foundation Model Fleet Enables Agentic Hydrologic Intelligence
-
LLM with verifiable RL rewards meets room-size and connectivity constraints in floor plans
Generative Floor Plan Design with LLMs via Reinforcement Learning with Verifiable Rewards
-
Reversing conflicting document order flips 11-25% of LLM answers
When Evidence Conflicts: Uncertainty and Order Effects in Retrieval-Augmented Biomedical Question Answering
-
Conformal method bounds confident errors in CoT reasoning
Pause and Reflect: Conformal Aggregation for Chain-of-Thought Reasoning
-
DExperts blocks explicit toxicity but slips on implicit hate speech
Measuring and Mitigating Toxicity in Large Language Models: A Comprehensive Replication Study
-
DExperts hits 100% safety on explicit toxicity but drops on implicit
Measuring and Mitigating Toxicity in Large Language Models: A Comprehensive Replication Study
-
Constrained edits merge checkpoints to lift code agent scores
CRANE: Constrained Reasoning Injection for Code Agents via Nullspace Editing
-
Cosine similarity misleads on which layers matter in LLMs
Rethinking Layer Relevance in Large Language Models Beyond Cosine Similarity
-
Adaptive weights fix distribution drift in LLM reasoning distillation
Distribution Corrected Offline Data Distillation for Large Language Models
-
Benchmark standardizes early Parkinson's speech detection
A Benchmark for Early-stage Parkinson's Disease Detection from Speech
2 Piths -
Early rejection cuts LLM synthetic data tokens by 11-77%
Know When To Fold 'Em: Token-Efficient LLM Synthetic Data Generation via Multi-Stage In-Flight Rejection
-
Dual RL agents learn to probe like Supreme Court justices
Dual Hierarchical Dialogue Policy Learning for Legal Inquisitive Conversational Agents
-
PEML lifts multi-task LLM accuracy by 6.67 percent
PEML: Parameter-efficient Multi-Task Learning with Optimized Continuous Prompts
-
Logic rules in prompts cut RAG errors via derivation trees
Derivation Prompting: A Logic-Based Method for Improving Retrieval-Augmented Generation
-
Formal checks can keep AI legal reasoning inside the text
Bridging Legal Interpretation and Formal Logic: Faithfulness, Assumption, and the Future of AI Legal Reasoning
-
Audited data lifts 8B model 18 points on physics olympiads
Physics-R1: An Audited Olympiad Corpus and Recipe for Visual Physics Reasoning
-
Learned predictor prunes KV cache 3-10x on the fly
Self-Pruned Key-Value Attention: Learning When to Write by Predicting Future Utility
-
Unary recoding enables polynomial-time rule learning for LLMs
Enhanced and Efficient Reasoning in Large Learning Models