archive
Every paper Pith has read. Search by title, abstract, or pith.
7661 papers in cs.CL · page 10
-
Entropy-gradient inversion marks stronger reasoning in LRMs
Entropy-Gradient Inversion: Moving Toward Internal Mechanism of Large Reasoning Models
-
Entropy-gradient inversion marks stronger reasoning in large models
Entropy-Gradient Inversion: Moving Toward Internal Mechanism of Large Reasoning Models
-
Mixing ICD-9 and ICD-10 data lifts rare code F1 by 27 percent
Bridging the Version Gap: Multi-version Training Improves ICD Code Prediction, Especially for Rare Codes
-
Topic models assign themes to segments
From Documents to Segments: A Contextual Reformulation for Topic Assignment
-
Distillation cuts error rates for Nigerian speech recognition by 29%
Sometin Beta Pass Notin (SBPN): Improving Multilingual ASR for Nigerian Languages via Knowledge Distillation
-
Persistent margins, not drifts, carry safety signals across LLM layers
Geometry-Lite: Interpretable Safety Probing via Layer-Wise Margin Geometry
-
LLMs mirror human power biases in simulated talks
Do LLM Agents Mirror Socio-Cognitive Effects in Power-Asymmetric Conversations?
-
LLMs copy human power dynamics in role-play dialogues
Do LLM Agents Mirror Socio-Cognitive Effects in Power-Asymmetric Conversations?
-
Gemini leads LLM benchmark on legal precedent classification
Validate Your Authority: Benchmarking LLMs on Multi-Label Precedent Treatment Classification
-
Reasoning models cut 26% tokens by exiting at semantic convergence
Stop When Reasoning Converges: Semantic-Preserving Early Exit for Reasoning Models
-
Peer editing with audio matches speech summary quality to transcripts
Beyond Transcripts: Iterative Peer-Editing with Audio Unlocks High-Quality Human Summaries of Conversational Speech
-
Causal tests select better memories for long-running AI agents
Causal Intervention-Based Memory Selection for Long-Horizon LLM Agents
-
Co-citation predictability drops over 20 years
Temporal Decay of Co-Citation Predictability: A 20-Year Statute Retrieval Benchmark from 396M Ukrainian Court Citations
-
Adversary can always reframe prompt injections as legitimate
AI Agents May Always Fall for Prompt Injections
-
Fast-slow video guardrail tops larger models at lower cost
SafeLens: Deliberate and Efficient Video Guardrails with Fast-and-Slow Screening
-
MoE models show deep-layer routing collapse for low-resource languages
Mixture of Experts for Low-Resource LLMs
-
Mu-GRPO halves LLM RL wall-clock time with stale rollouts
How Off-Policy Can GRPO Be? Mu-GRPO for Efficient LLM Reinforcement Learning
-
Small chess model tops larger ones via pattern matching
Generalization or Memorization? Brittleness Testing for Chess-Trained Language Models
-
Inverted API exploration yields verified tool-call data
Firefly: Illuminating Large-Scale Verified Tool-Call Data Generation from Real APIs
-
CausalSynth makes LLM synthetic data obey causal graphs
CausalSynth: Generating Structurally Sound Synthetic Data
-
EEG-to-text pipeline beats random baseline by 30 percent
RAG-based EEG-to-Text Translation Using Deep Learning and LLMs
-
Decomposition separates context anchors in ambiguous word embeddings
RSD: A Local Triangulation Audit Primitive for Learned Vector Blocks
-
Hybrid features raise CNN recall for Bangla fake news
Hybrid Feature Combinations with CNN for Bangla Fake News Classification
-
Verifying hypotheses attributes failures better in multi-agent LLMs
VerifyMAS: Hypothesis Verification for Failure Attribution in LLM Multi-Agent Systems
-
Tool-using AI agents can be poisoned after trust is built
Trust No Tool: Evaluating and Defending LLM Agents under Untrusted Tool Feedback
-
ContraFix fixes 84% of C/C++ vulnerabilities at low cost
ContraFix: Agentic Vulnerability Repair via Differential Runtime Evidence and Skill Reuse
-
FEA feedback lifts CAD agents past 20 percent requirement compliance
Self-Improving CAD Generation Agents with Finite Element Analysis as Feedback
-
Dynamic fixation keeps 98% OCR accuracy with 5% visual tokens
FastOCR: Dynamic Visual Fixation via KV Cache Pruning for Efficient Document Parsing
-
DiDi-Merging matches baselines at 1.24x single-model size
Dynamic Model Merging Made Slim
-
Memory layers raise repo vulnerability repair to 58%
MemRepair: Hierarchical Memory for Agentic Repository-Level Vulnerability Repair
-
ASR errors degrade Korean QA by the same relative amount across LLMs
Analyzing Error Propagation in Korean Spoken QA with ASR-LLM Cascades
-
Catalogues miss 609 datasets across 53 languages
Beyond Catalogue Counts: the Dataset Visibility Asymmetry in Low-Resource Multilingual NLP
-
Text overrides images in clinical vision models
Medical Context Distorts Decisions in Clinical Vision Language Models
-
Structured evidence fusion improves biomedical QA across LLMs
BELIEF: Structured Evidence Modeling and Uncertainty-Aware Fusion for Biomedical Question Answering
-
MiniGPT reaches 1.478 loss and generates Shakespeare-style dialogue
MiniGPT: Rebuilding GPT from First Principles
-
Small expert annotations calibrate LLMs to match human judgments on generative AI
QQJ: Quantifying Qualitative Judgment for Scalable and Human-Aligned Evaluation of Generative AI
-
Domain token swaps reduce training time 35-55% for LLM summarization
Learning Faster with Better Tokens: Parameter-Efficient Vocabulary Adaptation for Specialized Text Summarization
-
Five agents map news bias by exposing omissions and manipulations
NewsLens: A Multi-Agent Framework for Adversarial News Bias Navigation
-
Offline priors initialize better multi-agent LLM graphs
Learning Transferable Topology Priors for Multi-Agent LLM Collaboration Across Domains
-
Hypergraph links text levels for stronger personality prediction
HyperPersona: A Multi-Level Hypergraph Framework for Text-Based Automatic Personality Prediction
-
Multi-agent alignment lifts factual accuracy on knowledge QA
AMATA: Adaptive Multi-Agent Trajectory Alignment for Knowledge-Intensive Question Answering
-
State transitions keep recovering agents alive in LLM teams
Taming "Zombie" Agents: A Markov State-Aware Framework for Resilient Multi-Agent Evolution
-
Decomposition separates cyclic preferences for better LLM alignment
Transitivity Meets Cyclicity: Explicit Preference Decomposition for Dynamic Large Language Model Alignment
-
Mismatched wrong drafts boost GRPO math performance
Weak-to-Strong Elicitation via Mismatched Wrong Drafts
-
Control loop raises LLM self-correction accuracy by 6.2 points
CyberCorrect: A Cybernetic Framework for Closed-Loop Self-Correction in Large Language Models
-
Context Codec verifies which commitments survive LLM context compression
Compress the Context, Keep the Commitments: A Formal Framework for Verifiable LLM Context Compression
-
ConflictRAG resolves document conflicts to raise RAG accuracy
ConflictRAG: Detecting and Resolving Knowledge Conflicts in Retrieval Augmented Generation
-
Offline sampling freezes partition function before LLM-RL policy updates
DISA: Offline Importance Sampling for Distribution-Matching LLM-RL
-
Agentic training loop lifts Lean prover to record Pass@32 scores
OProver: A Unified Framework for Agentic Formal Theorem Proving
-
Pullback Fisher metric gives closed-form optimal activation steering
FishBack: Pullback Fisher Geometry for Optimal Activation Steering in Transformers