archive
Every paper Pith has read.
14513 papers in cs.AI · page 6
-
9B model with skill modules beats 32B LLM
Skill Weaving: Efficient LLM Improvement via Modular Skillpacks
-
Video models top open suturing skill challenge
OSS: Open Suturing Skills Vision-Based Assessment Challenge 2024-2025
-
Visual primitives raise robot pick-and-place success by 27%
Action with Visual Primitives
-
LLM recall tracks paper citations across 15 models
LLM-Metrics: Measuring Research Impact Through Large Language Model Memory
-
LLMs verify only 10% of test suites on code mutations
SWE-Mutation: Can LLMs Generate Reliable Test Suites in Software Engineering?
-
Metric shows VLM explainers miss text synergy
Measuring Cross-Modal Synergy: A Benchmark for VLM Explainability
-
Fixed harness lifts 116 of 126 LLM agent settings without model changes
Adapting the Interface, Not the Model: Runtime Harness Adaptation for Deterministic LLM Agents
-
Dual selection prunes video tokens while keeping static scenes and changes
ST-SimDiff: Balancing Spatiotemporal Similarity and Difference for Efficient Video Understanding with MLLMs
-
OWPO lets LLMs self-evolve without fixed references
One-Way Policy Optimization for Self-Evolving LLMs
-
IdleSpec converts agent wait time into better plans
IdleSpec: Exploiting Idle Time via Speculative Planning for LLM Agents
-
Hygiene rules enable LLM agents to self-improve skills effectively
Ratchet: A Minimal Hygiene Recipe for Self-Evolving LLM Agents
-
Learned transfer keeps relevant facts in long-term KG memory
Short-Term-to-Long-Term Memory Transfer for Knowledge Graphs under Partial Observability
-
30B agents rival 1T models with 25-95% fewer tokens
Efficient Agentic Reasoning Through Self-Regulated Simulative Planning
-
Three-view pretraining lifts protein structure prediction
Atom-level Protein Representation Learning Improves Protein Structure Prediction
-
Three-view token recovery lifts protein structure tasks
Atom-level Protein Representation Learning Improves Protein Structure Prediction
-
Physical objects flip trust to exclude benign vehicles from perception
Adversarial Trust Poisoning in Vehicular Collaborative Perception
-
MLLMs get personality scores right but ignore video cues half the time
Perception or Prejudice: Can MLLMs Go Beyond First Impressions of Personality?
-
Tree-aware KV eviction cuts memory 4x for LLM reasoning
ArborKV: Structure-Aware KV Cache Management for Scaling Tree-based LLM Reasoning
-
ExComm resolves agent conflicts to improve test-time scaling
ExComm: Exploration-Stage Communication for Error-Resilient Agentic Test-Time Scaling
-
Benchmark reveals limits in multi-page document parsing
MPDocBench-Parse: Benchmarking Practical Multi-page Document Parsing
-
Text embeddings boost ImageNet accuracy by up to 2.7 points
TextTeacher: What Can Language Teach About Images?
-
Humans beat LLMs in Colonel Blotto tournaments
Not Yet: Humans Outperform LLMs in a Colonel Blotto Tournament
-
Ontological continuum unifies knowledge graph modeling
Knowledge Graph Re-engineering Along the Ontological Continuum (extended version)
-
Camera cooperation cuts UAV beam steering overhead by 71 percent
A Camera-Cooperative ISAC Framework for Multimodal Non-Cooperative UAVs Sensing
-
Latent future scenes improve VLA driving over pixel reconstruction
LVDrive: Latent Visual Representation Enhanced Vision-Language-Action Autonomous Driving Model
-
General models gain far more from images than medical ones in licensing exams
JMed48k: A Multi-Profession Japanese Medical Licensing Benchmark for Vision-Language Model Evaluation
-
Training-free pooling lifts Video LLM accuracy without retraining
Enhancing Visual Token Representations for Video Large Language Models via Training-Free Spatial-Temporal Pooling and Gridding
-
Subproblem curriculum RL improves LLM math reasoning by 4.1 points
From Reasoning Chains to Verifiable Subproblems: Curriculum Reinforcement Learning Enables Credit Assignment for LLM Reasoning
-
Framework turns 2D heart ultrasounds into accurate 4D models
Echo4DIR: 4D Implicit Heart Reconstruction from 2D Echocardiography Videos
-
WaveGuard perturbs API images to block model distillation
Safeguarding Text-to-Image Generative Models Against Unauthorized Knowledge Distillation
-
Prototype stages top time series accuracy on 80 of 128 UCR datasets
Prototype-Guided Classification Sub-Task Decoupling Framework: Enhancing Generalization and Interpretability for Multivariate Time Series
-
LLM diagnostic accuracy drops 13% in interactive settings
Active Evidence-Seeking and Diagnostic Reasoning in Large Language Models for Clinical Decision Support
-
Framework computes determinants securely on edge servers
Secure and Parallel Determinant Computation for Large-Scale Matrices in Edge Environments
-
BEV maps from RGB-D cut tokens yet raise VLN success rates
GA-VLN: Geometry-Aware BEV Representation for Efficient Vision-Language Navigation
-
AgroVG benchmark shows top models at 0.35 Set-F1 on farm targets
AgroVG: A Large-Scale Multi-Source Benchmark for Agricultural Visual Grounding
-
Dataset records real flooded roads for self-driving cars
FRED: A Multi-Modal Autonomous Driving Dataset for Flooded Road Environments
-
Five lines of code expose an LLM's hidden vocabulary secrets
Check Your LLM's Secret Dictionary! Five Lines of Code Reveal What Your LLM Learned (Including What It Shouldn't Have)
-
RoBERTa reaches 93 percent accuracy on IMDb sentiment task
From TF-IDF to Transformers: A Comparative and Ensemble Approach to Sentiment Classification
-
Camouflaged attacks slash LLM guard detection from 94% to 10%
Blind Spots in the Guard: How Domain-Camouflaged Injection Attacks Evade Detection in Multi-Agent LLM Systems
-
Method turns BIT phase volumes into realistic 3D H&E stains
Virtual 3D H&E Staining from Phase-contrast Back-illumination Interference Tomography
-
Event log becomes the agent for replayable and forkable runs
The Log is the Agent: Event-Sourced Reactive Graphs for Auditable, Forkable Agentic Systems
-
Patch-guided trajectories raise SWE agent fixes by 10.8 points at 15% lower cost
From Patches to Trajectories: Privileged Process Supervision for Software-Engineering Agents
-
Auditable encoder reveals semantic nodes are structurally disconnected
Ex-GraphRAG: Interpretable Evidence Routing for Graph-Augmented LLMs
-
Coupled optimization yields verifiable evidence in rankings
ECPO: Evidence-Coupled Policy Optimization for Evidence-Certified Candidate Ranking
-
Counterfactual RL raises video LLM dynamic accuracy
Learning Spatiotemporal Sensitivity in Video LLMs via Counterfactual Reinforcement Learning
-
User refinements raise code agent acceptance from 25.7% to 35.7%
Echo: Learning from Experience Data via User-Driven Refinement
-
LVLMs collect emotional cues in middle layers then translate in deep layers
Interpreting and Enhancing Emotional Circuits in Large Vision-Language Models via Cross-Modal Information Flow
-
Video frames close the detection gap between AI images and videos
Video as Natural Augmentation: Towards Unified AI-Generated Image and Video Detection
-
Mismatched schema and CSV format can cut facts below baseline in table-to-graph extraction
Format-Constraint Coupling in Knowledge Graph Construction from Statistical Tables
-
LLM semantic retrieval raises ad recommendation stability
LLM Retrieval for Stable and Predictable Ad Recommendations