archive
Every paper Pith has read.
7661 papers in cs.CL · page 16
-
GraphRAG retrieval aligns LLM agents with social values
From Descriptive to Prescriptive: Uncover the Social Value Alignment of LLM-based Agents
-
Spherical KV stores keys as radius and angle codes to cut cache traffic
SPHERICAL KV: Angle-Domain Attention and Rate-Distortion Retention for Efficient Long-Context Inference
-
Attack collapses speculative decoding speedup by cutting token acceptance
Mistletoe: Stealthy Acceleration-Collapse Attacks on Speculative Decoding
-
Stealth attack collapses speculative decoding speedup
Mistletoe: Stealthy Acceleration-Collapse Attacks on Speculative Decoding
-
HodgeCover compresses MoE experts by covering harmonic cycles
HodgeCover: Higher-Order Topological Coverage Drives Compression of Sparse Mixture-of-Experts
2 Piths -
42M Spanish cyber model reaches 0.78 conversation score
VectraYX-Nano: A 42M-Parameter Spanish Cybersecurity Language Model with Curriculum Learning and Native Tool Use
-
Rebalanced training gives 42M Spanish cyber model tool-use ability
VectraYX-Nano: A 42M-Parameter Spanish Cybersecurity Language Model with Curriculum Learning and Native Tool Use
-
Rebalanced tool-use data lifts 42M Spanish cyber model to 0.23 accuracy
VectraYX-Nano: A 42M-Parameter Spanish Cybersecurity Language Model with Curriculum Learning and Native Tool Use
-
Six hours of data let a two-stage model beat larger ones on Wardaman
WARDEN: Endangered Indigenous Language Transcription and Translation with 6 Hours of Training Data
-
No voice agent tops 0.5 on both accuracy and experience
EVA-Bench: A New End-to-end Framework for Evaluating Voice Agents
2 Piths -
Agent weight updates cut token use 83% while raising accuracy
Good Agentic Friends Do Not Just Give Verbal Advice: They Can Update Your Weights
-
Finetuning makes models believe claims labeled false
Negation Neglect: When models fail to learn negations in training
2 Piths -
LLM pipeline turns text into argument graphs
An LLM-Based System for Argument Mining
-
LLM pipeline builds argument graphs from plain text
An LLM-Based System for Argument Mining
-
Hidden-state transport geometry locates first LLM reasoning error
Where Does Reasoning Break? Step-Level Hallucination Detection via Hidden-State Transport Geometry
-
MoE beats dense on active params but loses on total capacity
Dense vs Sparse Pretraining at Tiny Scale: Active-Parameter vs Total-Parameter Matching
-
Trajectory balance stops diffusion models locking onto few paths
Beyond Mode-Seeking RL: Trajectory-Balance Post-Training for Diffusion Language Models
-
Models detect sensory-text mismatches inside but ignore them in answers
Senses Wide Shut: A Representation-Action Gap in Omnimodal LLMs
-
Fine-tuned 8B LLMs beat larger models on children's story difficulty
Children's English Reading Story Generation via Supervised Fine-Tuning of Compact LLMs with Controllable Difficulty and Safety
-
RTLC prompting boosts LLM judge accuracy by 14 points
RTLC -- Research, Teach-to-Learn, Critique: A three-stage prompting paradigm inspired by the Feynman Learning Technique that lifts LLM-as-judge accuracy on JudgeBench with no fine-tuning
-
Stage-wise DPO reduces hallucinations in vision-language models
Reducing Hallucination in Vision-Language Models via Stage-wise Preference Optimization under Distribution Shift
-
Fine-tuning plus hierarchical prompts strengthen propaganda detection
Fine-tuning with Hierarchical Prompting for Robust Propaganda Classification Across Annotation Schemas
-
Low-rank training reaches distinct loss basins from full rank
Beyond Perplexity: A Geometric and Spectral Study of Low-Rank Pre-Training
-
Low-rank pre-training lands in different loss basins than full-rank
Beyond Perplexity: A Geometric and Spectral Study of Low-Rank Pre-Training
-
Compiler produces reusable configs for LLM workflows at 6.4x speedup
FlowCompile: An Optimizing Compiler for Structured LLM Workflows
-
Truncating supervision at feedback collapse beats full OPD
Prefix Teach, Suffix Fade: Local Teachability Collapse in Strong-to-Weak On-Policy Distillation
-
RDPO normalizes and whitens rewards to stabilize RL advantages
Multi-Objective and Mixed-Reward Reinforcement Learning via Reward-Decorrelated Policy Optimization
-
Edit-level vote reduces over-correction in LLM grammar fixes
Edit-level Majority Voting Mitigates Over-Correction in LLM-based Grammatical Error Correction
-
LLM judges favor machine translations over creative literary ones
Creativity Bias: How Machine Evaluation Struggles with Creativity in Literary Translations
-
Artificial uncertainty on easy data improves real uncertainty probes
Inducing Artificial Uncertainty in Language Models
-
OCR training method improves text reading in blurry and cluttered images
Multilingual OCR-Aware Fine-Tuning and Prompt-Guided Chain-of-Thought Reasoning for Multimodal Large Language Models
-
LLMs show recall-safety tradeoffs on real ICU data
RealICU: Do LLM Agents Understand Long-Context ICU Data? A Benchmark Beyond Behavior Imitation
-
Locale prompts eliminate SLM copying in on-device PII replacement
Locale-Conditioned Few-Shot Prompting Mitigates Demonstration Regurgitation in On-Device PII Substitution with Small Language Models
-
Temperature adjustment turns reward models into a calibrated SLOP
Temper and Tilt Lead to SLOP: Reward Hacking Mitigation with Inference-Time Alignment
-
Students rate AI slides equal to instructor ones
AI-Generated Slides: Are They Good? Can Students Tell?
-
Ordered demos turn many-shot CoT into test-time learning
Many-Shot CoT-ICL: Making In-Context Learning Truly Learn
-
Shared covariance summation leads multilingual editing results
Merging Methods for Multilingual Knowledge Editing for Large Language Models: An Empirical Odyssey
-
Reflective experiences guide LLM agents to better memory searches
R^2-Mem: Reflective Experience for Memory Search
-
Fragmentation strictly raises finite-context log-loss
Effective Context in Transformers: An Analysis of Fragmentation and Tokenization
-
Planning mechanism lifts LLM graph retrieval by 18 percent
PersonalAI 2.0: Enhancing knowledge graph traversal/retrieval with planning mechanism for Personalized LLM Agents
-
OSDN preconditioner cuts recall residual 39% at 1.3B scale
OSDN: Improving Delta Rule with Provable Online Preconditioning in Linear Attention
-
Decomposed rewards boost vision-language reasoning
PDCR: Perception-Decomposed Confidence Reward for Vision-Language Reasoning
-
Memory of prior links improves biomedical entity consistency
LongBEL: Long-Context and Document-Consistent Biomedical Entity Linking
-
DRAT predicts LLMs' scientific ideation better than prior tests
Assessing the Creativity of Large Language Models: Testing, Limits, and New Frontiers
-
Cognitive folding turns event streams into proactive agent memory
CogniFold: Always-On Proactive Memory via Cognitive Folding
-
BPE dropout during pretraining improves low-resource NLP results
Pretraining Language Models with Subword Regularization: An Empirical Study of BPE Dropout in Low-Resource NLP
-
Token alignments from monolingual data speed LLM vocabulary adaptation
TokAlign++: Advancing Vocabulary Adaptation via Better Token Alignment
-
Two-stage tuning fixes LLM table errors with 1,000 examples
LIFT: Last-Mile Fine-Tuning for Table Explicitation
-
Multi-stage ranking improves checkpoint selection for multimodal LLMs
Robust Checkpoint Selection for Multimodal LLMs via Agentic Evaluation and Stability-Aware Ranking
-
Language-specific thresholds lift slur detection F1 by 2-5%
KIT-TIP-NLP at MultiPride: Continual Learning with Multilingual Foundation Model