archive

Every paper Pith has read.

14513 papers in cs.AI · page 6

cs.AI 2026-05-21 reviewed

9B model with skill modules beats 32B LLM
Skill Weaving: Efficient LLM Improvement via Modular Skillpacks

Zhuo Li +7
cs.CV 2026-05-21 reviewed

Video models top open suturing skill challenge
OSS: Open Suturing Skills Vision-Based Assessment Challenge 2024-2025

Hanna Hoffmann +56
cs.RO 2026-05-21 reviewed

Visual primitives raise robot pick-and-place success by 27%
Action with Visual Primitives

Weilong Guo +8
cs.AI 2026-05-21 reviewed

LLM recall tracks paper citations across 15 models
LLM-Metrics: Measuring Research Impact Through Large Language Model Memory

Si Shen +2
cs.SE 2026-05-21 reviewed

LLMs verify only 10% of test suites on code mutations
SWE-Mutation: Can LLMs Generate Reliable Test Suites in Software Engineering?

Yuxuan Sun +8
cs.AI 2026-05-21 reviewed

Metric shows VLM explainers miss text synergy
Measuring Cross-Modal Synergy: A Benchmark for VLM Explainability

Joël Roman Ky +2
cs.AI 2026-05-21 reviewed

Fixed harness lifts 116 of 126 LLM agent settings without model changes
Adapting the Interface, Not the Model: Runtime Harness Adaptation for Deterministic LLM Agents

Tianshi Xu +2
cs.AI 2026-05-21 reviewed

Dual selection prunes video tokens while keeping static scenes and changes
ST-SimDiff: Balancing Spatiotemporal Similarity and Difference for Efficient Video Understanding with MLLMs

Bingjun Luo +3
cs.LG 2026-05-21 reviewed

OWPO lets LLMs self-evolve without fixed references
One-Way Policy Optimization for Self-Evolving LLMs

Shuo Yang +8
cs.AI 2026-05-21 reviewed

IdleSpec converts agent wait time into better plans
IdleSpec: Exploiting Idle Time via Speculative Planning for LLM Agents

Daewon Choi +6
cs.AI 2026-05-21 reviewed

Hygiene rules enable LLM agents to self-improve skills effectively
Ratchet: A Minimal Hygiene Recipe for Self-Evolving LLM Agents

Xing Zhang +6
cs.LG 2026-05-21 reviewed

Learned transfer keeps relevant facts in long-term KG memory
Short-Term-to-Long-Term Memory Transfer for Knowledge Graphs under Partial Observability

Taewoon Kim +2
cs.AI 2026-05-21 reviewed

30B agents rival 1T models with 25-95% fewer tokens
Efficient Agentic Reasoning Through Self-Regulated Simulative Planning

Mingkai Deng +6
q-bio.BM 2026-05-21 reviewed

Three-view pretraining lifts protein structure prediction
Atom-level Protein Representation Learning Improves Protein Structure Prediction

Taewon Kim +8
q-bio.BM 2026-05-21 reviewed

Three-view token recovery lifts protein structure tasks
Atom-level Protein Representation Learning Improves Protein Structure Prediction

Taewon Kim +8
cs.CR 2026-05-21 reviewed

Physical objects flip trust to exclude benign vehicles from perception
Adversarial Trust Poisoning in Vehicular Collaborative Perception

Yutong Liu +3
cs.AI 2026-05-21 reviewed

MLLMs get personality scores right but ignore video cues half the time
Perception or Prejudice: Can MLLMs Go Beyond First Impressions of Personality?

Caixin Kang +10
cs.AI 2026-05-21 reviewed

Tree-aware KV eviction cuts memory 4x for LLM reasoning
ArborKV: Structure-Aware KV Cache Management for Scaling Tree-based LLM Reasoning

Yeqiu Chen +5
cs.AI 2026-05-21 reviewed

ExComm resolves agent conflicts to improve test-time scaling
ExComm: Exploration-Stage Communication for Error-Resilient Agentic Test-Time Scaling

Woomin Song +6
cs.AI 2026-05-21 reviewed

Benchmark reveals limits in multi-page document parsing
MPDocBench-Parse: Benchmarking Practical Multi-page Document Parsing

Bangbang Zhou +10
cs.CV 2026-05-21 reviewed

Text embeddings boost ImageNet accuracy by up to 2.7 points
TextTeacher: What Can Language Teach About Images?

Tobias Christian Nauen +5
econ.GN 2026-05-21 reviewed

Humans beat LLMs in Colonel Blotto tournaments
Not Yet: Humans Outperform LLMs in a Colonel Blotto Tournament

Dmitry Dagaev +4
cs.AI 2026-05-21 reviewed

Ontological continuum unifies knowledge graph modeling
Knowledge Graph Re-engineering Along the Ontological Continuum (extended version)

Enrico Daga +2
cs.AI 2026-05-21 reviewed

Camera cooperation cuts UAV beam steering overhead by 71 percent
A Camera-Cooperative ISAC Framework for Multimodal Non-Cooperative UAVs Sensing

Wenfeng Wu +2
cs.CV 2026-05-21 reviewed

Latent future scenes improve VLA driving over pixel reconstruction
LVDrive: Latent Visual Representation Enhanced Vision-Language-Action Autonomous Driving Model

Xiaodong Mei +5
cs.CV 2026-05-21 reviewed

General models gain far more from images than medical ones in licensing exams
JMed48k: A Multi-Profession Japanese Medical Licensing Benchmark for Vision-Language Model Evaluation

Yue Xun +12
cs.AI 2026-05-21 reviewed

Training-free pooling lifts Video LLM accuracy without retraining
Enhancing Visual Token Representations for Video Large Language Models via Training-Free Spatial-Temporal Pooling and Gridding

Bingjun Luo +3
cs.LG 2026-05-21 reviewed

Subproblem curriculum RL improves LLM math reasoning by 4.1 points
From Reasoning Chains to Verifiable Subproblems: Curriculum Reinforcement Learning Enables Credit Assignment for LLM Reasoning

Xitai Jiang +5
cs.CV 2026-05-21 reviewed

Framework turns 2D heart ultrasounds into accurate 4D models
Echo4DIR: 4D Implicit Heart Reconstruction from 2D Echocardiography Videos

Yanan Liu +7
cs.CR 2026-05-21 reviewed

WaveGuard perturbs API images to block model distillation
Safeguarding Text-to-Image Generative Models Against Unauthorized Knowledge Distillation

Yilan Gao +3
cs.LG 2026-05-21 reviewed

Prototype stages top time series accuracy on 80 of 128 UCR datasets
Prototype-Guided Classification Sub-Task Decoupling Framework: Enhancing Generalization and Interpretability for Multivariate Time Series

Xianhao Song +4
cs.AI 2026-05-21 reviewed

LLM diagnostic accuracy drops 13% in interactive settings
Active Evidence-Seeking and Diagnostic Reasoning in Large Language Models for Clinical Decision Support

Chen Zhan +10
cs.DC 2026-05-21 reviewed

Framework computes determinants securely on edge servers
Secure and Parallel Determinant Computation for Large-Scale Matrices in Edge Environments

Prajwal Panth
cs.CV 2026-05-21 reviewed

BEV maps from RGB-D cut tokens yet raise VLN success rates
GA-VLN: Geometry-Aware BEV Representation for Efficient Vision-Language Navigation

Jiahao Yang +6
cs.CV 2026-05-21 reviewed

AgroVG benchmark shows top models at 0.35 Set-F1 on farm targets
AgroVG: A Large-Scale Multi-Source Benchmark for Agricultural Visual Grounding

Haocheng Li +7
cs.CV 2026-05-21 reviewed

Dataset records real flooded roads for self-driving cars
FRED: A Multi-Modal Autonomous Driving Dataset for Flooded Road Environments

Connor Malone +2
cs.LG 2026-05-21 reviewed

Five lines of code expose an LLM's hidden vocabulary secrets
Check Your LLM's Secret Dictionary! Five Lines of Code Reveal What Your LLM Learned (Including What It Shouldn't Have)

Hisashi Miyashita
cs.CL 2026-05-21 reviewed

RoBERTa reaches 93 percent accuracy on IMDb sentiment task
From TF-IDF to Transformers: A Comparative and Ensemble Approach to Sentiment Classification

Dip Biswas Shanto +3
cs.CR 2026-05-21 reviewed

Camouflaged attacks slash LLM guard detection from 94% to 10%
Blind Spots in the Guard: How Domain-Camouflaged Injection Attacks Evade Detection in Multi-Agent LLM Systems

Aaditya Pai
cs.CV 2026-05-21 reviewed

Method turns BIT phase volumes into realistic 3D H&E stains
Virtual 3D H&E Staining from Phase-contrast Back-illumination Interference Tomography

Anthony Song +5
cs.AI 2026-05-21 reviewed

Event log becomes the agent for replayable and forkable runs
The Log is the Agent: Event-Sourced Reactive Graphs for Auditable, Forkable Agentic Systems

Yohei Nakajima
cs.SE 2026-05-21 reviewed

Patch-guided trajectories raise SWE agent fixes by 10.8 points at 15% lower cost
From Patches to Trajectories: Privileged Process Supervision for Software-Engineering Agents

Murong Ma +9
cs.LG 2026-05-21 reviewed

Auditable encoder reveals semantic nodes are structurally disconnected
Ex-GraphRAG: Interpretable Evidence Routing for Graph-Augmented LLMs

Yoav Kor Sade +4
cs.AI 2026-05-21 reviewed

Coupled optimization yields verifiable evidence in rankings
ECPO: Evidence-Coupled Policy Optimization for Evidence-Certified Candidate Ranking

Miaobo Hu +7
cs.CV 2026-05-21 reviewed

Counterfactual RL raises video LLM dynamic accuracy
Learning Spatiotemporal Sensitivity in Video LLMs via Counterfactual Reinforcement Learning

Dazhao Du +9
cs.AI 2026-05-21 reviewed

User refinements raise code agent acceptance from 25.7% to 35.7%
Echo: Learning from Experience Data via User-Driven Refinement

Hande Dong +17
cs.CV 2026-05-21 reviewed

LVLMs collect emotional cues in middle layers then translate in deep layers
Interpreting and Enhancing Emotional Circuits in Large Vision-Language Models via Cross-Modal Information Flow

Chengsheng Zhang +3
cs.CV 2026-05-21 reviewed

Video frames close the detection gap between AI images and videos
Video as Natural Augmentation: Towards Unified AI-Generated Image and Video Detection

Zhengcen Li +6
cs.AI 2026-05-21 reviewed

Mismatched schema and CSV format can cut facts below baseline in table-to-graph extraction
Format-Constraint Coupling in Knowledge Graph Construction from Statistical Tables

Jingxuan Qi +2
cs.IR 2026-05-21 reviewed

LLM semantic retrieval raises ad recommendation stability
LLM Retrieval for Stable and Predictable Ad Recommendations

Vinodh Kumar Sunkara +15