archive

Every paper Pith has read.

9568 papers in cs.CV · page 16

cs.CV 2026-05-18 reviewed

Agent reaches 0.90 WISE score in multi-turn image generation
Generation Navigator: A State-Aware Agentic Framework for Image Generation

Jinming Liu +4
cs.CV 2026-05-18 reviewed

Fewer semantic tokens match full multimodal performance
A More Word-like Image Tokenization for MLLMs

Hyun Lee +6
cs.CV 2026-05-18 reviewed

Adapted FamNet counts washer parts at 1.96 MAE
Counting Machine Parts

Benedict Florance Arockiaraj +3
cs.CV 2026-05-18 reviewed

Raw patches cut language bias in remote sensing vision models
SkyNative: A Native Multimodal Framework for Remote Sensing Visual Evidence Reasoning

Xiao Yang +12
cs.AI 2026-05-18 reviewed

Benchmark shows agents at 79% on game video questions vs 95% oracle
SVFSearch: A Multimodal Knowledge-Intensive Benchmark for Short-Video Frame Search in the Gaming Vertical Domain

Lingtao Mao +6

4 Piths
cs.AI 2026-05-18 reviewed

Agents reach 79% on game video frames
SVFSearch: A Multimodal Knowledge-Intensive Benchmark for Short-Video Frame Search in the Gaming Vertical Domain

Lingtao Mao +6

4 Piths
cs.CV 2026-05-18 reviewed

New UAV benchmark slashes 3D reconstruction errors by up to 84%
UAVFF3D: A Geometry-Aware Benchmark for Feed-Forward UAV 3D Reconstruction

Xiang Yang +3
cs.CV 2026-05-18 reviewed

Visual atlases evolve from trajectories to guide VLM agents
AtlasVA: Self-Evolving Visual Skill Memory for Teacher-Free VLM Agents

Pan Wang +5
cs.LG 2026-05-18 reviewed

Transient expert steers MoE updates to cut forgetting
CP-MoE: Consistency-Preserving Mixture-of-Experts for Continual Learning

Yang Liu +2
cs.CV 2026-05-18 reviewed

Streaming video model cuts tokens 95% with cascaded control
An Efficient Streaming Video Understanding Framework with Agentic Control

Jinming Liu +9
cs.LG 2026-05-18 reviewed

One anchor pair identifies domain transfer under Jacobian sparsity
Domain Transfer Becomes Identifiable via a Single Alignment

Sagar Shrestha +3
cs.CV 2026-05-18 reviewed

Decoupled geometry and cache yield consistent house panoramas
PanoWorld: A Generative Spatial World Model for Consistent Whole-House Panorama Synthesis

Jinrang Jia +3
cs.CV 2026-05-18 reviewed

Surgical video QA handles full procedures with temporal consolidation
SurgLQA: Scalable Long-Horizon Surgical Video Question Answering

Diandian Guo +4
cs.RO 2026-05-18 reviewed

Benchmark adds touch, RL training, and real robots to world model tests
WorldArena 2.0: Extending Embodied World Model Benchmarking on Modality, Functionality and Platform

Yu Shang +24
cs.CV 2026-05-18 reviewed

One model translates any sensor features to any other without retraining
One Model to Translate Them All: Universal Any-to-Any Translation for Heterogeneous Collaborative Perception

Yang Li +9
cs.CV 2026-05-18 reviewed

Frequency disentanglement plus geodesic matching lifts few-shot medical segmentation
Beyond Euclidean Prototypes: Spectral Disentanglement and Geodesic Matching for Few-Shot Medical Image Segmentation

Penghao Jia +6
cs.MM 2026-05-18 reviewed

Two-phase sampling matches contradictory audio prompts to video
CounterFlow: A Two-Phase Inference-Time Sampling for Counterfactual Video Foley Generation

Gyubin Lee +2
cs.CV 2026-05-18 reviewed

Mamba model beats SOTA on ECG multi-label scores
HexagonalWarriorMamba: Superior Threshold-Dependent Multi-label Classification of 12-Lead ECG Cardiac Abnormalities

Huawei Jiang +8
cs.CV 2026-05-18 reviewed

Classical SIFT beats learned descriptors on accuracy and speed
PySIFT: GPU-Resident Deterministic SIFT for Deep Learning Vision Pipelines

Sivakumar K.S. +2
cs.CV 2026-05-18 reviewed

Smartphone LiDAR sees hidden objects with motion sampling
Imaging Hidden Objects with Consumer LiDAR via Motion Induced Sampling

Siddharth Somasundaram +4
stat.ML 2026-05-18 reviewed

Girsanov weights enable unbiased resampling for diffusion models
Simple Approximation and Derivative Free Inference-Time Scaling for Diffusion Models via Sequential Monte Carlo on Path Measures

Chenyang Wang +4
cs.CV 2026-05-18 reviewed

Temporal pruning speeds video diffusion while preserving fidelity
Temporal Aware Pruning for Efficient Diffusion-based Video Generation

Sheng Li +5
cs.CV 2026-05-18 reviewed

Temporal smoothing lets pruning speed up video diffusion
Temporal Aware Pruning for Efficient Diffusion-based Video Generation

Sheng Li +5
cs.CV 2026-05-18 reviewed

Warm-up trick lets MeanFlow scale to 80B image models
Stabilizing, Scaling & Enhancing MeanFlow for Large-scale Diffusion Distillation

Xiao He +5
cs.CV 2026-05-18 reviewed

VLMs count by prior instead of image when facts clash
CounterCount: A Diagnostic Framework for Counting Bias in Vision Language Models

Reem Alzahrani +5
cs.CV 2026-05-18 reviewed

Scene understanding training produces human-like fixations in foveated model
Why We Look Where We Look: Emergent Human-like Fixations of a Foveated Visual Language Model Maximizing Scene Understanding

Shravan Murlidaran +3
cs.CV 2026-05-18 reviewed

Fourier shapes achieve 88% IR detector attack success past 25 meters
Unleashing the Representational Power of Fourier Shapes for Attacking Infrared Object Detection

Yixing Yong +4
cs.CV 2026-05-18 reviewed

Reward variance selects learnable prompts for T2I training
Curriculum Group Policy Optimization: Adaptive Sampling for Unleashing the Potential of Text-to-Image Generation

Baoteng Li +10
cs.CV 2026-05-18 reviewed

Post-hoc sphere normalization lifts long-tailed OOD AUROC
Is Complex Training Necessary for Long-Tailed OOD Detection? A Re-think from Feature Geometry

Ningkang Peng +2
cs.LG 2026-05-18 reviewed

High noisy-label accuracy fails to ensure OOD reliability
When Accuracy Is Not Enough: Uncertainty Collapse between Noisy Label Learning and Out-of-Distribution Detection

Ningkang Peng +4
cs.CV 2026-05-18 reviewed

Saliency consistency loss raises defect detection accuracy
Network Knowledge Prior Guided Learning for Data-Efficient Surface Defect Detection

Hang-Cheng Dong +3
cs.CV 2026-05-18 reviewed

LiteLoc slashes localization storage 94% and speeds pose solving 19x
Efficient Sparse-to-Dense Visual Localization via Compact Gaussian Scene Representation and Accelerated Dense Pose Estimation

Zizhuo Li +3
cs.CV 2026-05-18 reviewed

Tree constraints in training produce consistent plant skeletons
PlantPose: Universal Plant Skeleton Estimation via Tree-constrained Graph Generation

Xinpeng Liu +3
cs.CV 2026-05-18 reviewed

Framework makes one physical attack fool multiple AI vision tasks
Towards Universal Physical Adversarial Attacks via a Joint Multi-Objective and Multi-Model Optimization Framework

Ziyang Liu +6
cs.CV 2026-05-18 reviewed

Aligning latent mappings reduces inconsistency in multimodal models
LatentUMM: Dual Latent Alignment for Unified Multimodal Models

Yinyi Luo +4
cs.CV 2026-05-18 reviewed

Pixel diffusion reaches FID 1.60 at 256 resolution in 320 epochs
FrequencyBooster: Full-Frequency Modeling for High-Fidelity Pixel Diffusion

Lichen Ma +7
cs.CV 2026-05-18 reviewed

Adapter boosts Vision Transformer image quality assessment with fewer parameters
Unleashing Vision Transformer Potential In Image Quality Assessment via Global-Local Adaptive Interaction

Yu Li +5
cs.CV 2026-05-18 reviewed

Sparsity experts and distillation enable continual adaptation
MoASE++: Mixture of Activation Sparsity Experts with Domain-Adaptive On-policy Distillation for Continual Test Time Adaptation

Ronyu Zhang +10
cs.CV 2026-05-18 reviewed

Uncertainty flow plus point cloud interaction cuts hand pose error
UST-Hand: An Uncertainty-aware Spatiotemporal Point Cloud Interaction Network for 3D Self-supervised Hand Pose Estimation

Tianhao Han +7
cs.CV 2026-05-18 reviewed

Continual learning adapts X-ray models to new domains at 88.66% accuracy
Domain Incremental Learning for Pandemic-Resilient Chest X-Ray Analysis

Danu Kim
cs.CV 2026-05-18 reviewed

Prefix length turns frozen VLM embeddings into a semantic dial
GraSP-VL: Length as a Semantic Granularity Interface for Vision-Language Representations

Zesheng Li +2
cs.CV 2026-05-18 reviewed

Patch-MoE Mamba improves segmentation of polyps and skin lesions
Patch-MoE Mamba: A Patch-Ordered Mixture-of-Experts State Space Architecture for Medical Image Segmentation

Diego Adame +9
cs.CV 2026-05-17 reviewed

STDP rules deliver 78.6 percent mAP for event cameras on CPU
Brain-inspired spike-timing plasticity for reliable label-efficient event-camera vision

Mohamad Yazan Sadoun +2
cs.CV 2026-05-17 reviewed

1D-2D CNN fusion with attention hits 99-100% on ECG identification
Attention-Guided Fusion of 1D and 2D CNNs for Robust ECG-Based Biometric Recognition

Arioua +7
cs.CV 2026-05-17 reviewed

4D Gaussians let you query driving scenes at any future time
GEM: Gaussian Evolution Model for Occupancy Forecasting and Motion Planning

Cheng Chen +2
cs.CV 2026-05-17 reviewed

Sobel edges match finger knuckles at 17% rate
A simple approach for biometrics: Finger-knuckle prints recognition based on a Sobel filter and similarity measures

E. O. Rodrigues +3
cs.CV 2026-05-17 reviewed

Deep learning cuts pathology slide file sizes 43-80 percent
Deep learning-based compression of giga-resolution whole slide images

Maren Høibø +4
cs.RO 2026-05-17 reviewed

Monocular RGB+IMU matches RGB-D accuracy for indoor scene graphs
Mono-Hydra++: Real-Time Monocular Scene Graph Construction with Multi-Task Learning for 3D Indoor Mapping

U. V. B. L. Udugama +2
cs.IR 2026-05-17 reviewed

Three-stage pipeline lifts video RAG retrieval from 0.195 to 0.759 nDCG
MARQUIS: A Three-Stage Pipeline for Video Retrieval-Augmented Generation

Debashish Chakraborty +9
cs.CV 2026-05-17 reviewed

System maps hand contacts to surfaces in operating rooms
TouchMap-OR: Multi-View 3D Mapping of Hand-Surface Contacts

Sophokles Ktistakis +3