archive
Every paper Pith has read. Search by title, abstract, or pith.
7661 papers in cs.CL · page 14
-
Ukrainian court citations form unsupervised legal ontology
Automatic Construction of a Legal Citation Graph from 100 Million Ukrainian Court Decisions: Large-Scale Extraction, Topological Analysis, and Ontology-Driven Clustering
-
Agent turns I/O examples into code via guided evolutionary search
From I/O to Code with Discovery Agent
-
LaMR prunes code context to save 31% tokens while matching full performance
Context Pruning for Coding Agents via Multi-Rubric Latent Reasoning
-
New tool opens discourse data across 16 languages for local use
DiscoExplorer: An Open Interface for the Study of Multilingual Discourse Relations
-
Human video builds physical smarts for top robot policies
PhysBrain 1.0 Technical Report
-
Natural literary translations often drift from the original meaning
Fluency and Faithfulness in Human and Machine Literary Translation
-
One token unifies agentic and latent visual reasoning
ATLAS: Agentic or Latent Visual Reasoning? One Word is Enough for Both
-
FutureSim shows top AI agents predict events at 25% accuracy
FutureSim: Replaying World Events to Evaluate Adaptive Agents
-
Grep beats vector search in most agentic tasks
Is Grep All You Need? How Agent Harnesses Reshape Agentic Search
-
Length alone triggers LLM backdoors to leak secrets
MetaBackdoor: Exploiting Positional Encoding as a Backdoor Attack Surface in LLMs
-
EHR tables sharpen timing in text-based clinical timelines
Text Knows What, Tables Know When: Clinical Timeline Reconstruction via Retrieval-Augmented Multimodal Alignment
-
Memory model lets LLMs add new knowledge without retraining
MeMo: Memory as a Model
-
The paper builds a 507-leaf taxonomy of LLM inference attacks from 932 recent security…
Talk is (Not) Cheap: A Taxonomy and Benchmark Coverage Audit for LLM Attacks
-
Framework converts text tool benchmarks to audio for voice agents
From Text to Voice: A Reproducible and Verifiable Framework for Evaluating Tool Calling LLM Agents
-
The paper presents a framework that converts existing text-based tool-calling benchmarks…
From Text to Voice: A Reproducible and Verifiable Framework for Evaluating Tool Calling LLM Agents
-
Open framework lifts coding agent to 67.5% on SWE-bench
Orchard: An Open-Source Agentic Modeling Framework
-
128 random demos suffice for strong RLVR results
Boosting Reinforcement Learning with Verifiable Rewards via Randomly Selected Few-Shot Guidance
-
Window-level RL raises speculative decoding acceptance to 6.5
Performance-Driven Policy Optimization for Speculative Decoding with Adaptive Windowing
-
Token counts for Ukrainian legal text differ 1.6 times by model
Tokenizer Fertility and Zero-Shot Performance of Foundation Models on Ukrainian Legal Text: A Comparative Study
-
Decomposing traces boosts AI agent diagnosis accuracy up to 12x
Holistic Evaluation and Failure Diagnosis of AI Agents
-
CIR benchmarks let models solve most queries with one modality
Do Composed Image Retrieval Benchmarks Require Multimodal Composition?
-
Graph paths verify legal reasoning in Indian court AI
Falkor-IRAC: Graph-Constrained Generation for Verified Legal Reasoning in Indian Judicial AI
-
Internal masking cuts hallucinations in vision-language models
Do We Really Need External Tools to Mitigate Hallucinations? SIRA: Shared-Prefix Internal Reconstruction of Attribution
-
Terminal anchors extend LLM context to 64K from short sequences
EndPrompt: Efficient Long-Context Extension via Terminal Anchoring
-
Denoising paths supply low-cost uncertainty scores for language diffusion models
Uncertainty Quantification for Large Language Diffusion Models
-
ML classifier beats rules at spotting BDD refactoring chances
Mining Subscenario Refactoring Opportunities in Behaviour-Driven Software Test Suites: ML Classifiers and LLM-Judge Baselines
-
Memory agent keeps repo documentation consistent
Remember Your Trace: Memory-Guided Long-Horizon Agentic Framework for Consistent and Hierarchical Repository-Level Code Documentation
-
Action tokens carry the training signal in agentic RL
Resolving Action Bottleneck: Agentic Reinforcement Learning Informed by Token-Level Energy
-
CIPO turns LLM failures into better reasoning
Learning from Failures: Correction-Oriented Policy Optimization with Verifiable Rewards
-
Optimal control view yields language models with both fidelity and parallel speed
Language Generation as Optimal Control: Closed-Loop Diffusion in Latent Control Space
-
Optimal control reformulation gives language models fast parallel sampling at high quality
Language Generation as Optimal Control: Closed-Loop Diffusion in Latent Control Space
-
Many perfect LLM scores hide dimensional intent failures
Dimension-Level Intent Fidelity Evaluation for Large Language Models: Evidence from Structured Prompt Ablation
-
LLM memory systems hit only 46% on group conversations
GroupMemBench: Benchmarking LLM Agent Memory in Multi-Party Conversations
-
Group chats expose limits of LLM agent memory
GroupMemBench: Benchmarking LLM Agent Memory in Multi-Party Conversations
-
Ming glossaries used flexible Chinese characters to approximate foreign sounds
Cross-Linguistic Transcription and Phonological Representation in the Huìtóngguǎnxì Huáyíyìyǔ
-
Stale code snippets make models output outdated helpers
When Retrieval Hurts Code Completion: A Diagnostic Study of Stale Repository Context
-
RAG follows conflicting context over its own knowledge
Does RAG Know When Retrieval Is Wrong? Diagnosing Context Compliance under Knowledge Conflict
-
Probe shows RAG follows wrong context in 85 percent of conflict cases
Does RAG Know When Retrieval Is Wrong? Diagnosing Context Compliance under Knowledge Conflict
-
Guardrails adapt from sparse noisy failures via conservative induction
LiSA: Lifelong Safety Adaptation via Conservative Policy Induction
-
Orthogonal projection isolates hallucination signals in LLM answers
When Answers Stray from Questions: Hallucination Detection via Question-Answer Orthogonal Decomposition
-
Adaptive gate skips reasoning for simple multimodal inputs
Think When Needed: Adaptive Reasoning-Driven Multimodal Embeddings with a Dual-LoRA Architecture
-
Calculus finds optimal vocabulary size for ASR
A Calculus-Based Framework for Determining Vocabulary Size in End-to-End ASR
-
Agents resolve 45 percent of chained package upgrades
SWE-Chain: Benchmarking Coding Agents on Chained Release-Level Package Upgrades
-
New scores track whether unlearning works across languages
Knowledge Beyond Language: Bridging the Gap in Multilingual Machine Unlearning Evaluation
-
Three-tier memory lifts recommender hit rate by 26 percent
Agentic Recommender System with Hierarchical Belief-State Memory
-
Three-tier memory raises recommender hit rate 26 percent
Agentic Recommender System with Hierarchical Belief-State Memory
-
Synthetic queries expose five times more LLM failures
NodeSynth: Socially Aligned Synthetic Data for AI Evaluation
-
Synthetic queries trigger up to 5x higher LLM failure rates
NodeSynth: Socially Aligned Synthetic Data for AI Evaluation
-
Synthetic augmentation lifts defense classification to 58% accuracy
Mitigating Data Scarcity in Psychological Defense Classification with Context-Aware Synthetic Augmentation