archive
Every paper Pith has read. Search by title, abstract, or pith.
7661 papers in cs.CL · page 13
-
Block attention nears full performance via semantic blocks
Towards Generalization of Block Attention via Automatic Segmentation and Block Distillation
-
Block attention matches full results via segmentation and distillation
Towards Generalization of Block Attention via Automatic Segmentation and Block Distillation
-
Dataset links Russian speeches to images and translations
Linked Multi-Model Data on Russian Domestic and Foreign Policy Speeches
-
VLMs miss image swaps when claiming to recheck visuals
Are VLMs Seeing or Just Saying? Uncovering the Illusion of Visual Re-examination
-
Brain voxels respond to specific image features identified by interpretability tools
Mechanistically Interpretable Neural Encoding Reveals Fine-Grained Functional Selectivity in Human Visual Cortex
-
Canvas turns linear LLM chats into branching trees
Conversations in Space: Structuring Non-Linear LLM Interactions on a Canvas
-
BootstrapAgent distills repo setup into reusable contracts
BootstrapAgent: Distilling Repository Setup into Reusable Agent Knowledge
-
Dataset shows MT systems lose PDF layout during translation
ForMaT: Dataset for Visually-Grounded Multilingual PDF Translation
-
Small open LLMs match big models in translation quality estimation
CompactQE: Interpretable Translation Quality Estimation via Small Open-Weight LLMs
-
DimMem hits 81% accuracy with 24% lower token cost
DimMem: Dimensional Structuring for Efficient Long-Term Agent Memory
-
Strategy nudging lifts RLVR performance beyond larger rollouts
Nudging Beyond the Comfort Zone: Efficient Strategy-Guided Exploration for RLVR
-
Collaborative filtering assigns optimal contexts per LLM input
Contexting as Recommendation: Evolutionary Collaborative Filtering for Context Engineering
-
Benchmark shows agents fail at composing scattered multimodal evidence
SMMBench: A Benchmark for Source-Distributed Multimodal Agent Memory
-
Hybrid tree-graph evolves agent memory into summaries
H-Mem: A Novel Memory Mechanism for Evolving and Retrieving Agent Memory via a Hybrid Structure
-
Reshaping anchors lets LLMs sample more reasoning modes
SAGE: Shaping Anchors for Guided Exploration in RLVR of LLMs
-
Activation steering plus rewards improves unlearning and quality in MLLMs
ASRU: Activation Steering Meets Reinforcement Unlearning for Multimodal Large Language Models
-
Few-shot LLMs beat BioBERT on patient inquiry triage
Few-Shot Large Language Models for Actionable Triage Categorization of Online Patient Inquiries
-
Benchmark shows VLMs lag on code-based diagram tasks
VCG-Bench: Towards A Unified Visual-Centric Benchmark for Structured Generation and Editing
-
Dynamic chunking lifts diffusion LMs over positional blocks
Dynamic Chunking for Diffusion Language Models
-
LLMs miss ambiguity in Chinese sentences
Evaluating Chinese Ambiguity Understanding in Large Language Models
-
LLMs heavily favor English with no cost savings from continual pre-training
Toward LLMs Beyond English-Centric Development
-
Diffusion LLMs reach 5.5x tokens per forward pass
PSD: Pushing the Pareto Frontier of Diffusion LLMs via Parallel Speculative Decoding
-
LLMs master new code syntax but cannot apply it to solve problems
Syntax Without Semantics: Teaching Large Language Models to Code in an Unseen Language
-
Steering vectors accelerate optimization for rare behaviors
VSPO: Vector-Steered Policy Optimization for Behavioral Control
-
LLMs spot mental health entities but miss relations and reasoning
MHGraphBench: Knowledge Graph-Grounded Benchmarking of Mental Health Knowledge in Large Language Models
-
Semantic rewards improve LLM uncertainty calibration
Calibrating LLMs with Semantic-level Reward
-
Semantic reward cuts LLM calibration error by up to 40%
Calibrating LLMs with Semantic-level Reward
-
Learned policy decides when to add one sequential step after parallel agents
Response-Conditioned Parallel-to-Sequential Orchestration for Multi-Agent Systems
-
LLM activation peaks vary by 10,000x across model families
Measuring Maximum Activations in Open Large Language Models
-
Dependency graphs lift Transformer syntactic generalization
GiLT: Augmenting Transformer Language Models with Dependency Graphs
-
Latent geometry fails to ensure good token recovery
When Latent Geometry Is Not Enough: Draft-Conditioned Latent Refinement for Non-Autoregressive Text Generation
-
High-divergence prompts improve distillation by up to 15%
DeltaPrompts: Escaping the Zero-Delta Trap in Multimodal Distillation
-
Divergence-guided prompts deliver 15% gains in VLM distillation
DeltaPrompts: Escaping the Zero-Delta Trap in Multimodal Distillation
-
Reliability signal cuts token use by a third in reasoning
Process Rewards with Learned Reliability
-
Benchmark tests LLM detectors across 8 languages and real edits
DetectRL-X: Towards Reliable Multilingual and Real-World LLM-Generated Text Detection
-
DetectRL-X benchmark tests detectors across 8 languages and real AI writing
DetectRL-X: Towards Reliable Multilingual and Real-World LLM-Generated Text Detection
-
RoPE loses position and token distinction in long contexts
RoPE Distinguishes Neither Positions Nor Tokens in Long Contexts, Provably
-
Draft model prunes 90% of attention in large LLMs
STS: Efficient Sparse Attention with Speculative Token Sparsity
-
New benchmark suite tests LLMs on finance difficulty levels
FINESSE-Bench: A Hierarchical Benchmark Suite for Financial Domain Knowledge and Technical Analysis in Large Language Models
-
Benchmark suite tests LLMs across eight financial expertise levels
FINESSE-Bench: A Hierarchical Benchmark Suite for Financial Domain Knowledge and Technical Analysis in Large Language Models
-
RAG pipeline reaches 80% F1 on clinical transcript extraction
Retrieval-Augmented Large Language Models for Schema-Constrained Clinical Information Extraction
-
Open-ended RL boosts LLM reasoning with 46x less data
GRLO: Towards Generalizable Reinforcement Learning in Open-Ended Environments from Zero
-
Reasoning models take different paths
Reasoning Models Don't Just Think Longer, They Move Differently
-
Fewer parses raise model surprise in garden paths, but not to human levels
Why are language models less surprised than humans? Testing the Parse Multiplicity Mismatch Hypothesis
-
Math tasks produce highest attention entropy in LLMs
Neural Activation Patterns Across Language Model Architectures: A Comprehensive Analysis of Cognitive Task Performance
-
Reinforcement updates replace feedback loops in LLM alpha discovery
From Feedback Loops to Policy Updates: Reinforcement Fine-Tuning for LLM-Based Alpha Factor Discovery
-
LLM help adapts to user expertise domains to limit over-reliance
Capability Conditioned Scaffolding for Professional Human LLM Collaboration
-
Ghana AI legal tool handles 32,000 student queries in 30 months
Eskwai for Students: Generative AI Assistant for Legal Education in Ghana
-
WhatsApp AI bot offers science help to West African students
Adesua: Development and Feasibility Study of an AI WhatsApp Bot for Science Learning in West Africa
-
Humans choose words step by step under tight vocabulary limits
Greedy or not, here I come: Language production under vocabulary constraints in humans and resource-rational models