archive

Every paper Pith has read. Search by title, abstract, or pith.

9568 papers in cs.CV · page 4

cs.CV 2026-05-21 reviewed

Synthetic RAW data yields same low-light detection metrics as real
Making the Discrete Continuous: Synthetic RAW Augmentations for Fine-Grained Evaluation of Person Detection Performance in Low Light

Valeria Pais +5
cs.CV 2026-05-21 reviewed

Pre-VLA lifts VLA success rates from 31% to 38%
Pre-VLA: Preemptive Runtime Verification for Reliable Vision-Language-Action and World-Model Rollouts

Zhen Sun +8
eess.IV 2026-05-21 reviewed

Block-sparse model separates rPPG signals from video noise
Time-varying rPPG signal separation via block-sparse signal model

Kosuke Kurihara +3
cs.CV 2026-05-21 reviewed

Dual-shutter pairs invert motion blur and distortion
Moment-Reenacting: Inverse Motion Degradation with Cross-shutter Guidance

Xiang Ji +4
cs.CV 2026-05-21 reviewed

Paired blur and distortion images recover high-speed motion
Moment-Reenacting: Inverse Motion Degradation with Cross-shutter Guidance

Xiang Ji +4
cs.CV 2026-05-21 reviewed

Table structure recovered by predicting grid counts and separators directly
FastTab: A Fast Table Recognizer with a Tiny Recursive Module and 1D Transformers

Laziz Hamdi +3
cs.CV 2026-05-21 reviewed

GenRe generalizes 3D urban scenes to new viewpoints in minutes
Diffusion-guided Generalizable Enhancer for Urban Scene Reconstruction

Henry Che +5
cs.CV 2026-05-21 reviewed

Explicit baseline fixes attribution errors in neural explanations
The Neglected Baseline in Model Interpretation

Yongjin Cui +1
cs.CV 2026-05-21 reviewed

New benchmark and GRPO method lift MLLMs past proprietary models on receipt reasoning
From Recognition to Reasoning: Benchmarking and Enhancing MLLMs on Real-World Receipt Document Understanding

Yandi Wang +7
cs.CV 2026-05-21 reviewed

Lesion grounding lifts ophthalmic VQA accuracy and clarity
Towards Clinically Interpretable Ophthalmic VQA via Spatially-Grounded Lesion Evidence

Xingyue Wang +6
cs.CV 2026-05-21 reviewed

LLMs recognize activities from muscle signals after language mapping
Translating Signals to Languages for sEMG-Based Activity Recognition

Ming Wang +5
cs.CV 2026-05-21 reviewed

Benchmark shows AI unreliable on agricultural tool tasks
AgroTools: A Benchmark for Tool-Augmented Multimodal Agents in Agriculture

Zi Ye +12
cs.CV 2026-05-21 reviewed

3D eye prior synthesizes training data for any new AR/VR tracker
GazePrior: Zero-Shot AR/VR Eye Tracking via Learned 3D Gaze Reconstruction

Corentin Dumery +5
cs.CV 2026-05-21 reviewed

Multiple metrics needed to judge liver vessel segmentation
VEELA: A Clinically-Constrained Benchmark for Liver Vessel Segmentation in Computed Tomography Angiography

Ziya Ata Yazıcı +21
cs.CV 2026-05-21 reviewed

QuantSR+ raises 2-bit SR accuracy by 0.29 dB while cutting ops 87.9%
QuantSR+: Pushing the Limit of Quantized Image Super-Resolution Networks

Haotong Qin +6
cs.CV 2026-05-21 reviewed

MLLM planner in ViT space guides DiT to SOTA video generation and edits
Bernini: Latent Semantic Planning for Video Diffusion

Bernini Team: Chenchen Liu +10
cs.CV 2026-05-21 reviewed

Watermarking 4D splats by gating at motion-curvature instants
4D-GSW: Kinematic-Aware Spatio-Temporal Consistent Watermarking for 4D Gaussian Splatting

Sifan Zhou +3
cs.CV 2026-05-21 reviewed

Multispectral LiDAR lifts 3D land cover mIoU by up to 7.8 points
3D LULC classification using multispectral LiDAR and deep learning: current and prospective schemes

Narges Takhtkeshha +5
cs.CV 2026-05-21 reviewed

K-space hybrid model holds up better for breast lesion segmentation under acceleration
Robustness of breast lesion segmentation under MRI undersampling improves with k-space-aware deep learning

Lukas T. Rotkopf +5
cs.CV 2026-05-21 reviewed

Anchor swaps erase specific identities from face generators
PIU: Proximity-guided Identity Unlearning in ID-Conditioned Diffusion Models

Jose Edgar Hernandez Cancino Estrada +5
cs.RO 2026-05-21 reviewed

DEVO exports sparse point clouds matching EMVS at 5 cm
Extending Deep Event Visual Odometry with Sparse Point-Cloud Export

Alireza Safdari +1
cs.CV 2026-05-21 reviewed

YOLOv2 with FPN and switchable convolution hits 68% mAP on virus patches
Detection of Virus and Small Cell Patches in Foci Images Using Switchable Convolution and Feature Pyramid Networks

Amrita Singh +1
cs.CV 2026-05-21 reviewed

Curved fractal patches fool VIS-IR VLMs
Exposing Vulnerabilities in Visible-Infrared VLMs: A Unified Geometric Adversarial Framework with Cross-Task Transferability

Xiang Chen +8
cs.RO 2026-05-21 reviewed

4D trajectories and sparse tracking enable zero-shot robot-object tasks
Imagine2Real: Towards Zero-shot Humanoid-Object Interaction via Video Generative Priors

Jiahe Chen +9
cs.RO 2026-05-21 reviewed

Sparse keypoints in behavior model enable zero-shot humanoid interactions
Imagine2Real: Towards Zero-shot Humanoid-Object Interaction via Video Generative Priors

Jiahe Chen +9
cs.CV 2026-05-21 reviewed

Multi-grained compression lifts long video QA accuracy
MuKV: Multi-Grained KV Cache Compression for Long Streaming Video Question-Answering

Junbin Xiao +4
cs.NI 2026-05-21 reviewed

YOLOv8 recall falls below 40% under strong turbulence in satellite images
Impact of Atmospheric Turbulence and Pointing Error on Earth Observation

Celia Sánchez-de-Miguel +4
cs.LG 2026-05-21 reviewed

Evidence hierarchy lifts Bayesian threat classification to 95%
An Evidence Hierarchy for Bayesian Object Classification via OSINT-Aided Heterogeneous Sensor Fusion

Jan Nausner +1
cs.CV 2026-05-21 reviewed

OMR tops matched music score search
Direct content-based retrieval from music scores images

Noelia Luna-Barahona +4
cs.CV 2026-05-21 reviewed

Graphs plus diffusion improve tumor segmentation with missing MRI scans
D3Seg: Dependency-Aware Diffusion for Brain Tumor Segmentation with Missing Modalities

Danish Ali +3
cs.CV 2026-05-21 reviewed

Model recovers 3D hand poses from distant room corners
REACH: Hand Pose Estimation from Room Corners

Shu Nakamura +6
cs.CV 2026-05-21 reviewed

Semi-supervised UniMatch V2 segments weather-degraded images
A Robust Semantic Segmentation Pipeline for the CVPR 2026 8th UG2+ Challenge Track 2

Jinming Chai +3
cs.CV 2026-05-21 reviewed

Semi-supervised training lifts segmentation in bad weather
A Robust Semantic Segmentation Pipeline for the CVPR 2026 8th UG2+ Challenge Track 2

Jinming Chai +3
cs.CV 2026-05-21 reviewed

Anatomy residual pathway lifts VCE mAP to 0.3409
GALAR-TemporalNet v2: Anatomy-Guided Dual-Branch Temporal Classification with Bidirectional Mamba and Dual-Graph GCN for Video Capsule Endoscopy -- after competition results

Jiye Won +2
cs.CV 2026-05-21 reviewed

Self-evolving pool optimizes image restoration agent
EvoIR-Agent: Self-Evolving Image Restoration Agentic System via Experience-Driven Learning

Kailin Zhuang +2
cs.CV 2026-05-21 reviewed

Text from LLMs guides zero-shot action localization in videos
Zero-Shot Temporal Action Localization Through Textual Guidance

Benedetta Liberatori +5
cs.CV 2026-05-21 reviewed

Video models top open suturing skill challenge
OSS: Open Suturing Skills Vision-Based Assessment Challenge 2024-2025

Hanna Hoffmann +56
cs.CV 2026-05-21 reviewed

Graph of patches cuts UHD quality prediction error
Ultra-High-Definition Image Quality Assessment via Graph Representation Learning

Shaode Yu +6
cs.CV 2026-05-21 reviewed

Feed-forward model reconstructs 4D scenes without camera poses
No Pose, No Problem in 4D: Feed-Forward Dynamic Gaussians from Unposed Multi-View Videos

Matteo Balice +5
cs.CV 2026-05-21 reviewed

Events and illumination collaborate to fix low-light photos
Event-Illumination Collaborative Low-light Image Enhancement with a High-resolution Real-world Dataset

Senyan Xu +7
cs.CV 2026-05-21 reviewed

Telematics and CV fusion boosts MLLM safety event detection
Enhancing Multimodal Large Language Models for Safety-Critical Driving Video Analysis

Tomaso Trinci +2
cs.CV 2026-05-21 reviewed

Hybrid sampling beats pure uncertainty or diversity in active learning
Balancing Uncertainty and Diversity of Samples: Leveraging Diversity of Least, High Confidence Samples for Effective Active Learning

Vipul Arya +4
cs.AI 2026-05-21 reviewed

Dual selection prunes video tokens while keeping static scenes and changes
ST-SimDiff: Balancing Spatiotemporal Similarity and Difference for Efficient Video Understanding with MLLMs

Bingjun Luo +3
cs.CV 2026-05-21 reviewed

FlowGS speeds up continuous-scale super-resolution for remote sensing
Flow-based Gaussian Splatting for Continuous-Scale Remote Sensing Image Super-Resolution

Jiangwei Mo +2
cs.CV 2026-05-21 reviewed

One sentence becomes a full short drama with AI agents
One Sentence, One Drama: Personalized Short-Form Drama Generation via Multi-Agent Systems

Yufei Shi +7
cs.CV 2026-05-21 reviewed

Event cameras match RGB gait ID in light
EventGait: Towards Robust Gait Recognition with Event Streams

Senyan Xu +6
cs.CV 2026-05-21 reviewed

Swapping ViT attention heads for depthwise convolutions speeds inference 17-20%
Accelerating Vision Foundation Models with Drop-in Depthwise Convolution

Carmelo Scribano +7
cs.CV 2026-05-21 reviewed

Two-stage AI plans then executes fixes for photo flaws
AesFormer: Transform Everyday Photos into Beautiful Memories

Tianxiang Du +2
cs.CV 2026-05-21 reviewed

Diffusion models correct motion in 3D brain MRI
MotionDPS: Motion-Compensated 3D Brain MRI Reconstruction

Antonio Ortiz-Gonzalez +3
cs.AI 2026-05-21 reviewed

MLLMs get personality scores right but ignore video cues half the time
Perception or Prejudice: Can MLLMs Go Beyond First Impressions of Personality?

Caixin Kang +10