archive

Every paper Pith has read.

9568 papers in cs.CV · page 2

cs.LG 2026-05-22 reviewed

Multi-view probes read model weights more accurately
What Linear Probes Miss: Multi-View Probing for Weight-Space Learning

Eunwoo Heo +2
cs.CV 2026-05-22 reviewed

3D CNNs spot and name hand gestures in live video
Online Hand Gesture Recognition Using 3D Convolutional Neural Networks

Yinghao Qin +1
cs.CV 2026-05-22 reviewed

Roadside LiDAR generates vehicle data to improve detection
RS2AD-LiDAR: End-to-End Autonomous Driving LiDAR Data Generation from Roadside Sensor Observations

Runyi Huang +5
cs.CV 2026-05-22 reviewed

Deep correspondences jointly calibrate camera intrinsics and LiDAR extrinsics
Joint Target-Less Intrinsic and Extrinsic Camera-LiDAR Calibration using Deep Point Correspondences

Simon Bultmann +2
cs.CV 2026-05-22 reviewed

Velocity split accelerates flow models without training
VDE: Training-Free Accelerating Rectified Flow Model via Velocity Decomposition and Estimation

Junwen Tan +3
cs.CV 2026-05-22 reviewed

Per-pixel module confines FPS weapon actions to local scope
SCOPE: Simulating Cross-game Operations in Playable Environments for FPS World Models

Zizhao Tong +13
cs.CV 2026-05-22 reviewed

Uncertainty gate activates contrastive decoding only on risky tokens
CHASD: Language Increment-Calibrated Contrastive Decoding against Hallucination in LVLMs

Xiaoyi Huang +2
cs.CV 2026-05-22 reviewed

Geometry measure added to lane filtering keeps accurate lines
GFSR: Geometric Fidelity and Spatial Refinement for Reliable Lane Detection

Tiancheng Wang +7
cs.CV 2026-05-22 reviewed

Hybrid quantum models raise blood cell F1 scores by up to 3.7%
Enhancing Blood Cells Classification using Hybrid Quantum Neural Networks

Guilherme Cruz +4
eess.IV 2026-05-22 reviewed

EF-LIC skips entropy coding yet matches its performance
Efficient Learned Image Compression without Entropy Coding

Hao Cao +3
cs.CV 2026-05-22 reviewed

Language rules from regulations replace image labels for hazard detection
General Hazard Detection

Stephanie Ng +7
cs.CV 2026-05-22 reviewed

Dense 4D volumes preserve local cues for video action recognition
Spatio-Temporal Similarity Volume Aggregation for Open-Vocabulary Action Recognition

Yerim So +3
cs.CV 2026-05-22 reviewed

Feed-forward model creates language-labeled 3D scenes from sparse photos
LangFlash: Feed-forward 3D Language Gaussian Splatting from Sparse Unposed Images

Yilong Liu +3
eess.IV 2026-05-22 reviewed

Neural operator deblurs varying blur in pathology slides
Discontinuous Galerkin Neural Operator for Pathology Defocus Deblurring

Shaoqing Duan +4
cs.CV 2026-05-22 reviewed

Vision-language agent picks depth experts per sample
DepthAgent: Towards Better Universal Depth Estimation via Sample-wise Expert Selection

Jie Zhu +2
cs.CV 2026-05-22 reviewed

Unified engine retrieves events from large video sets consistently
U-CESE: Unified Clip-based Event Search Engine for AI Challenge HCMC 2025

Duc-Nhuan Le +4
cs.CV 2026-05-22 reviewed

EvalVerse calibrates VLMs to expert cinematic video standards
EvalVerse: Pipeline-Aware and Expert-Calibrated Benchmarking for Professional Cinematic Video Generation

Songlin Yang +25
cs.CV 2026-05-22 reviewed

Hybrid planner reaches 94.85 on NAVSIM
ChainFlow-VLA: Causal Flow Planning with Vision-Language Models

Xiyang Wang +9
cs.CV 2026-05-22 reviewed

Coloring noise in Sobolev space fixes SR spectral mismatch
Coloring the Noise: Adversarial Sobolev Alignment for Faithful Image Super Resolution

Hongbo Wang +5
cs.RO 2026-05-22 reviewed

Convex hull of historical prompts bridges new VLN domains
Turning Adaptation into Assets: Cross-Domain Bridging for Online Vision-Language Navigation

Zixuan Hu +5
cs.CV 2026-05-22 reviewed

Consensus method improves noisy label correction for rare classes
CARE: Class-Adaptive Expert Consensus for Reliable Learning with Long-Tailed Noisy Labels

Mengke Li +5
cs.CV 2026-05-22 reviewed

Single-frame edit extends across video via diffusion priors
SimInsert: Seamless Video Object Insertion via Regional Sparse Attention Fusion

Xinyu Chen +11
cs.CV 2026-05-22 reviewed

Benchmark supplies multi-baseline stereo pairs with full calibration
StereoGenBench: A Synthetic Multi-Camera Benchmark for Stereo Generation under Controlled Baseline Regimes

Yangzhi Cui +2
cs.CV 2026-05-22 reviewed

IDEAL detects anomalies from both normal and anomalous few-shot examples
Beyond Normal References: Discriminative Few-Shot Anomaly Detection

Huan Wang +3
cs.CV 2026-05-22 reviewed

Benchmark shows VLMs fail at tracing causal chains in video
CaST-Bench: Benchmarking Causal Chain-Grounded Spatio-Temporal Reasoning for Video Question Answering

Mingfang Zhang +9
cs.CV 2026-05-22 reviewed

Homography mapping yields linear bounds for camera motion verification
Lipschitz Optimization for Formal Verification of Homographies

Jean-Guillaume Durand +3
cs.CV 2026-05-22 reviewed

Physics-semantic keyframe scoring fixes occluded video editing
Occlusion-Aware Physics-Semantic Keyframe Selection for Robust Video Editing

Lin Liu +6
cs.CV 2026-05-22 reviewed

VLMs reach only 5.5% success on implicit intent navigation
IntentionNav: A Benchmark for Intent-Driven Object Navigation from Implicit Human Instruction

Lin Qian +6
eess.IV 2026-05-22 reviewed

GMENet generates missing MRI to expand usable glioma data by 97%
GMENet: Generative Mixture of Experts Network for Multi-Center Glioma Diagnosis with Incomplete Imaging Sequences

Pengfei Song +7
cs.CV 2026-05-22 reviewed

Joint pose and image prediction improves multi-person scene accuracy
Composing People Together: Iterative Pose-Image Generation for Multi-Person Interaction Scenes

Wenxuan Peng +2
cs.CV 2026-05-22 reviewed

VLMs trail humans by 28.4 points on driving scenes benchmark
DRIVESPATIAL: A Benchmark for Spatiotemporal Intelligence in VLMs for Autonomous Driving

Hao Vo +12
cs.CV 2026-05-22 reviewed

Quantized labels cut rPPG model size 88% and raise speed 191%
LQ-rPPG: A Label-Quantized Coarse-to-Fine Learning Framework for Remote Physiological Measurement

Jun Seong Lee +3
cs.RO 2026-05-22 reviewed

Semantic cues speed drone exploration 13.7 times on average
Semantic-Aware Guided Drone Exploration for Language-Conditioned 3D Indoor Mapping

Nitin Vegesna +1
cs.CV 2026-05-22 reviewed

Attributes replace category lists for remote sensing pre-training
SLIP-RS: Structured-Attribute Language-Image Pre-Training for Remote Sensing Object Detection

Chenxu Wang +5
cs.CV 2026-05-22 reviewed

VLMs fail to infer visual relations from examples
VisAnalog: A Diagnostic Suite for Visual Concept Transfer on Natural Images

Zhaonan Li +15
eess.IV 2026-05-22 reviewed

EEG model reaches 34.5% top-1 accuracy in 200-way image retrieval
STAMBRIDGE: Spectral-Temporal Amplitude-aware Mid-Feature Bridge for EEG Visual Decoding

Jiahe Meng +7
cs.CV 2026-05-22 reviewed

Verified prompts plus longitudinal context raise lesion tracking Dice by 4.5 points
Exploiting Longitudinal Context in Clinician-Verified Interactive Lesion Tracking

Yannick Kirchhoff +7
cs.CV 2026-05-22 reviewed

One frozen VLM detects video anomalies without training
CoReVAD: A Contextual Reasoning Framework for Training-Free Video Anomaly Detection

Hyeongmuk Lim +1
cs.CV 2026-05-22 reviewed

Schrödinger Bridge raises deepfake AP@0.95 by 3-10%
Inconsistency-aware Multimodal Schrödinger Bridge for Deepfake Localization

Jiayu Xiong +4
eess.IV 2026-05-21 reviewed

Synthetic MRIs raise accuracy for one tumour classifier by 1.02%
Do Synthetic Brain MRIs Reliably Improve Tumour Classification? A StyleGAN2-ADA Class-Plane Augmentation Study on BRISC 2025

José Rafael Noriega Cedeño
cs.CV 2026-05-21 reviewed

Velocity mismatches flag anomalies in flow matching models
Flow Mismatching: Unsupervised Anomaly Detection via Velocity Discrepancies in Flow Matching Models

Shengzhe Chen +3
cs.CV 2026-05-21 reviewed

RoboSurg-VQA benchmarks segmentation-aware visual question answering for surgery
RoboSurg-VQA: A Multimodal Benchmark for Surgical Segmentation-Aware Visual Question Answering

Chengyi Zhang +2
cs.CV 2026-05-21 reviewed

Dithering defends vision models against adversarial attacks
Dithering Defense: Adversarial Robustness of Vision Foundation Models via Multi-Level Floyd-Steinberg Dithering

Yury Belousov +3
cs.CV 2026-05-21 reviewed

Vertex weights let mmWave data drive accurate SMPL body fits
Millimeter-wave Imaging for Anthropometric Body Measurement

Miriam Senne +4
cs.CV 2026-05-21 reviewed

Motion data alone rivals video models trained on 10000x more examples
The TIME Machine: On The Power of Motion for Efficient Perception

Mantas Skackauskas +2
cs.LG 2026-05-21 reviewed

RADAR forecasts transfer by comparing representation trajectories
RADAR: Relative Angular Divergence Across Representations

Xavier Cadet +2
cs.CV 2026-05-21 reviewed

Reconstructed maps raise 3D detection scores without manual HD maps
Scene Reconstruction as Mapping Priors for 3D Detection

Yang Fu +10
cs.CV 2026-05-21 reviewed

Binary masks control precise motion in generated videos
CoMoGen: COntrollable MOtion Dynamics and Interactions with Mask-Guided Video GENeration

Adil Meric +5
cs.CV 2026-05-21 reviewed

Toolkit automates annotation of child-caregiver eye-tracking videos
GazeBehavior Annotation Toolkit (GBAT): AI-powered toolkit for automatic annotation of egocentric eye-tracking and video data of child-caregiver interaction

Iba Baig +7
cs.CV 2026-05-21 reviewed

Pixel prior from QueryMLP lifts buoy association to 0.7386
Improved Vision-to-Chart Buoy Association with Learned World-to-Image Projection

Borja Carrillo-Perez (Arquimea Research Center)