archive

Every paper Pith has read. Search by title, abstract, or pith.

9568 papers in cs.CV · page 11

cs.CV 2026-05-19 reviewed

Real images align diffusion models as well as preference pairs
When Preference Labels Fall Short: Aligning Diffusion Models from Real Data

Weiyan Chen +7
cs.CV 2026-05-19 reviewed

Dual-stream network lifts weather detection at full speed
CADENet: Condition-Adaptive Asynchronous Dual-Stream Enhancement Network for Adverse Weather Perception in Autonomous Driving

Sherif Khairy +1
cs.AI 2026-05-19 reviewed

Temporal conditioning changes AV planner style but not scores
From Prompts to Pavement Through Time: Temporal Grounding in Agentic Scene-to-Plan Reasoning

Ahmed Y. Gado +4
cs.CV 2026-05-19 reviewed

Landmark and language priors raise FER accuracy on three wild datasets
LaCoVL-FER: Landmark-Guided Contrastive Learning Network with Vision-Language Enhancement for Facial Expression Recognition

Jiaxin Wang +4
cs.CV 2026-05-19 reviewed

Stitched model lifts rewards to noisy latents for faster alignment
Stitched Value Model for Diffusion Alignment

Hyojun Go +10
cs.CV 2026-05-19 reviewed

Semi-supervised method reaches 79.99% Dice in fetal heart ultrasound
Synergistic Foundation Models for Semi-Supervised Fetal Cardiac Ultrasound Analysis: SAM-Med2D Boundary Refinement and DINOv3 Semantic Enhancement

Tonghao Zhuang +7
cs.CV 2026-05-19 reviewed

Pose accuracy proxies depth quality without needing ground-truth depth
Depth2Pose: A Pose-Based Benchmark for Monocular Depth Estimation without Ground-Truth Depth

Viktor Kocur +6
cs.CV 2026-05-19 reviewed

VLMs localize objects with boundary tokens
Mechanisms of Object Localization in Vision-Language Models

Timothy Schaumlöffel +2
cs.LG 2026-05-19 reviewed

Class prototypes on the hypersphere reach neural collapse by design
Neural Collapse by Design: Learning Class Prototypes on the Hypersphere

Panagiotis Koromilas +3
cs.LG 2026-05-19 reviewed

Prototypes on the hypersphere reach neural collapse by design
Neural Collapse by Design: Learning Class Prototypes on the Hypersphere

Panagiotis Koromilas +3
cs.CV 2026-05-19 reviewed

Attention chains cut 4D mesh generation to 9 seconds
Fast 4D Mesh Generation by Spatio-Temporal Attention Chains

Dvir Samuel +3
cs.CV 2026-05-19 reviewed

Fused expert preferences and ratings lift VLM aesthetic SRCC to 0.709
Preferences Order, Ratings Anchor: From Fused Expert Aesthetic Ground Truth to Self-Distillation

Yuanpei Zhao +7
cs.CV 2026-05-19 reviewed

Self-distillation raises VLM aesthetic SRCC from 0.504 to 0.709
Preferences Order, Ratings Anchor: From Fused Expert Aesthetic Ground Truth to Self-Distillation

Yuanpei Zhao +7
cs.RO 2026-05-19 reviewed

Training on near-failure paths improves driving safety
Beyond Imitation: Learning Safe End-to-End Autonomous Driving from Hard Negatives

Junli Wang +9
cs.CV 2026-05-19 reviewed

Neuron selection lets VAR models add user concepts without forgetting prior ones
CPC-VAR: Continual Personalized and Compositional Generation in Visual Autoregressive Models

Junhao Li +6
cs.CV 2026-05-19 reviewed

One reference image flags traffic anomalies via embedding matches
Real-World On-Vehicle Evaluation of Embedding-Based Anomaly Detection

Albert Schotschneider +4
cs.CV 2026-05-19 reviewed

Reward optimization erases unwanted concepts in flow models
FlowErase-RL: Rethinking Concept Erasure as Reward Optimization in Flow Matching Models

Yi Sun +7
cs.GR 2026-05-19 reviewed

Browser renders MRI digital twins at 82 FPS on low-cost GPUs
Decentralized Direct Volume Rendering: A Browser-Native GPU Architecture for MRI Digital Twins in Resource-Constrained Settings

Oserebameh Augustine Beckley
cs.CV 2026-05-19 reviewed

Geometry injection enables unaligned optical-SAR retrieval
GeoMamba: A Geometry-driven MambaVision Framework and Dataset for Fine-grained Optical-SAR Object Retrieval

Tiantong Fang +5
cs.CV 2026-05-19 reviewed

Staged distillation keeps tiny diffusion models stable at 1.6 percent teacher size
LIFT and PLACE: A Simple, Stable, and Effective Knowledge Distillation Framework for Lightweight Diffusion Models

Hyunsoo Han +2
cs.CV 2026-05-19 reviewed

Tiny diffusion models reach FID 15.73 with staged distillation
LIFT and PLACE: A Simple, Stable, and Effective Knowledge Distillation Framework for Lightweight Diffusion Models

Hyunsoo Han +2
cs.CV 2026-05-19 reviewed

Frozen probe tunes video models to follow drone inertial commands
Aero-World: Action-Conditioned Aerial Video Generation from Inertial Controls

Abdul Mohaimen Al Radi +4
cs.CV 2026-05-19 reviewed

Tango3D aligns pixels to 3D points while preserving global retrieval
Tango3D: Towards Alignment for Global and Local 2D-3D Correspondence

Zebin He +6
cs.CV 2026-05-19 reviewed

Downsampled block selection speeds up diffusion attention nearly 7x
Efficient Long-Context Modeling in Diffusion Language Models via Block Approximate Sparse Attention

Wenhu Zhang +8
cs.CV 2026-05-19 reviewed

Physics-in-the-loop agents produce more complex valid CAD designs
Physics-in-the-Loop: A Hybrid Agentic Architecture for Validated CAD Engineering Design

Elias Berger +4
cs.CV 2026-05-19 reviewed

Sonar simulator matches real images at texture KL below 0.07
Physics-informed simulation framework for realistic sonar image generation and statistical validation

Kamal Basha S +1
cs.CV 2026-05-19 reviewed

CRP groups medical tasks from text for 73% Dice with 4% forgetting
MedCRP-CL: Continual Medical Image Segmentation via Bayesian Nonparametric Semantic Modality Discovery

Ziyuan Gao
cs.CV 2026-05-19 reviewed

New dataset labels 10k white blood cell images with 11 morphological traits
WBCAtt+: Fine-Grained Pixel-Level Morphological Annotations for White Blood Cell Images

Satoshi Tsutsui +3
cs.CV 2026-05-19 reviewed

Real JPEG tables cut false positives in document forgery detection
DocQT: Improving Document Forgery Localization Robustness via Diverse JPEG Quantization Tables

Kylian Ronfleux-Corail (L3I) +3
eess.IV 2026-05-19 reviewed

TADA adapts steganalysis to unknown JPEG pipelines
Tackle CSM in JPEG Steganalysis with Data Adaptation

Rony Abecidan (CRIStAL) +5
cs.CV 2026-05-19 reviewed

Satellite and ground photos fuse for better outdoor view synthesis
Cross-View Splatter: Feed-Forward View Synthesis with Georeferenced Images

Matias Turkulainen +8
cs.CV 2026-05-19 reviewed

NeRF augmentations train pose estimators from 25 real images
CAD-Free Learning of Spacecraft Pose Estimators via NeRF-Based Augmentations

Antoine Legrand +2
cs.CV 2026-05-19 reviewed

NeRF lets pose estimators train on 25-400 real images
CAD-Free Learning of Spacecraft Pose Estimators via NeRF-Based Augmentations

Antoine Legrand +2
cs.CV 2026-05-19 reviewed

Refiner teaches image models to fix their own mistakes
Benchmarking and Evolving Reason-Reflect-Rectify for Reflective Visual Generation

Junjie Wang +10
cs.CV 2026-05-19 reviewed

Panorama-first split lifts zero-shot navigation success 59 percent
P2DNav: Panorama-to-Downview Reasoning for Zero-shot Vision-and-Language Navigation

Kai Sheng +7
cs.RO 2026-05-19 reviewed

One model drives well across cities and sensors without retraining
HEAT: Heterogeneous End-to-End Autonomous Driving via Trajectory-Guided World Models

Hoonhee Cho +5
cs.CV 2026-05-19 reviewed

Component style transfer closes satellite sim-to-real gap
Component-Aware Structure-Preserving Style Transfer for Satellite Visual Sim2Real Data Construction

Zongwu Xie +4
cs.CV 2026-05-19 reviewed

Part-wise style transfer raises satellite pose accuracy
Component-Aware Structure-Preserving Style Transfer for Satellite Visual Sim2Real Data Construction

Zongwu Xie +4
cs.CV 2026-05-19 reviewed

Few-shot visual prototypes correct misclassifications in text-prompted segmentation
PrAda: Few-Shot Visual Adaptation for Text-Prompted Segmentation

Gabriele Rosi +3
cs.CV 2026-05-19 reviewed

Contrastive registers let ViTs drop spurious tokens and lift segmentation accuracy
UniRefiner: Teaching Pre-trained ViTs to Self-Dispose Dross via Contrastive Register

Congpei Qiu +5
cs.CV 2026-05-19 reviewed

Bézier curves stabilize LiDAR human motion capture
Bézier Degradation Modeling for LiDAR-based Human Motion Capture

Xiaoqi An +4
cs.CV 2026-05-19 reviewed

VLM feedback iterates to fix cross-camera color constancy
White-Balance First, Adjust Later: Cross-Camera Color Constancy via Vision-Language Evaluation

Shuwei Li +2
cs.CV 2026-05-19 reviewed

Physics-guided diffusion designs metasurface absorbers in 30 seconds
Physics Guided Conditional Diffusion Framework for Generative Inverse Design of Manufacturable Metasurface based Absorbers

Vineetha Joy +5
cs.CV 2026-05-19 reviewed

SVD-ordered paths yield less noisy model attributions
Spectral Integrated Gradients for Coarse-to-Fine Feature Attribution

Soyeon Kim +3
cs.CV 2026-05-19 reviewed

Datasets enable global tree mortality mapping from aerial imagery
deadtrees.earth-aerial: A Multi-Resolution Aerial Image Dataset for Tree Cover and Mortality Detection

Ayushi Sharma +11
cs.CV 2026-05-19 reviewed

YOLO26-MoE hits 0.99 mAP for spotting insulator faults in drone photos
A novel YOLO26-MoE optimized by an LLM agent for insulator fault detection considering UAV images

João Pedro Matos-Carvalho +4
cs.CV 2026-05-19 reviewed

Laminating film on lenses blocks identity while keeping action cues
Lens Privacy Sealing: A New Benchmark and Method for Physical Privacy-Preserving Action Recognition

Mengyuan Liu +3
cs.CV 2026-05-19 reviewed

Laminating film on lenses hides identities for action recognition
Lens Privacy Sealing: A New Benchmark and Method for Physical Privacy-Preserving Action Recognition

Mengyuan Liu +3
cs.CV 2026-05-19 reviewed

MLLMs often back correct answers with inconsistent egocentric evidence
EgoCoT-Bench: Benchmarking Grounded and Verifiable Operation-Centric Chain of Thought Reasoning for MLLMs

Yang Dai +3
cs.CV 2026-05-19 reviewed

Sparse diffusion cuts redundant matches for steadier camera tracking
EpiDiffVO: Geometry-Aware Epipolar Diffusion for Robust Visual Odometry

Prateeth Rao