archive
Every paper Pith has read. Search by title, abstract, or pith.
9568 papers in cs.CV · page 2
-
Multi-view probes read model weights more accurately
What Linear Probes Miss: Multi-View Probing for Weight-Space Learning
-
3D CNNs spot and name hand gestures in live video
Online Hand Gesture Recognition Using 3D Convolutional Neural Networks
-
Roadside LiDAR generates vehicle data to improve detection
RS2AD-LiDAR: End-to-End Autonomous Driving LiDAR Data Generation from Roadside Sensor Observations
-
Deep correspondences jointly calibrate camera intrinsics and LiDAR extrinsics
Joint Target-Less Intrinsic and Extrinsic Camera-LiDAR Calibration using Deep Point Correspondences
-
Velocity split accelerates flow models without training
VDE: Training-Free Accelerating Rectified Flow Model via Velocity Decomposition and Estimation
-
Per-pixel module confines FPS weapon actions to local scope
SCOPE: Simulating Cross-game Operations in Playable Environments for FPS World Models
-
Uncertainty gate activates contrastive decoding only on risky tokens
CHASD: Language Increment-Calibrated Contrastive Decoding against Hallucination in LVLMs
-
Geometry measure added to lane filtering keeps accurate lines
GFSR: Geometric Fidelity and Spatial Refinement for Reliable Lane Detection
-
Hybrid quantum models raise blood cell F1 scores by up to 3.7%
Enhancing Blood Cells Classification using Hybrid Quantum Neural Networks
-
EF-LIC skips entropy coding yet matches its performance
Efficient Learned Image Compression without Entropy Coding
-
Language rules from regulations replace image labels for hazard detection
General Hazard Detection
-
Dense 4D volumes preserve local cues for video action recognition
Spatio-Temporal Similarity Volume Aggregation for Open-Vocabulary Action Recognition
-
Feed-forward model creates language-labeled 3D scenes from sparse photos
LangFlash: Feed-forward 3D Language Gaussian Splatting from Sparse Unposed Images
-
Neural operator deblurs varying blur in pathology slides
Discontinuous Galerkin Neural Operator for Pathology Defocus Deblurring
-
Vision-language agent picks depth experts per sample
DepthAgent: Towards Better Universal Depth Estimation via Sample-wise Expert Selection
-
Unified engine retrieves events from large video sets consistently
U-CESE: Unified Clip-based Event Search Engine for AI Challenge HCMC 2025
-
EvalVerse calibrates VLMs to expert cinematic video standards
EvalVerse: Pipeline-Aware and Expert-Calibrated Benchmarking for Professional Cinematic Video Generation
-
Hybrid planner reaches 94.85 on NAVSIM
ChainFlow-VLA: Causal Flow Planning with Vision-Language Models
-
Coloring noise in Sobolev space fixes SR spectral mismatch
Coloring the Noise: Adversarial Sobolev Alignment for Faithful Image Super Resolution
-
Convex hull of historical prompts bridges new VLN domains
Turning Adaptation into Assets: Cross-Domain Bridging for Online Vision-Language Navigation
-
Consensus method improves noisy label correction for rare classes
CARE: Class-Adaptive Expert Consensus for Reliable Learning with Long-Tailed Noisy Labels
-
Single-frame edit extends across video via diffusion priors
SimInsert: Seamless Video Object Insertion via Regional Sparse Attention Fusion
-
Benchmark supplies multi-baseline stereo pairs with full calibration
StereoGenBench: A Synthetic Multi-Camera Benchmark for Stereo Generation under Controlled Baseline Regimes
-
IDEAL detects anomalies from both normal and anomalous few-shot examples
Beyond Normal References: Discriminative Few-Shot Anomaly Detection
-
Benchmark shows VLMs fail at tracing causal chains in video
CaST-Bench: Benchmarking Causal Chain-Grounded Spatio-Temporal Reasoning for Video Question Answering
-
Homography mapping yields linear bounds for camera motion verification
Lipschitz Optimization for Formal Verification of Homographies
-
Physics-semantic keyframe scoring fixes occluded video editing
Occlusion-Aware Physics-Semantic Keyframe Selection for Robust Video Editing
-
VLMs reach only 5.5% success on implicit intent navigation
IntentionNav: A Benchmark for Intent-Driven Object Navigation from Implicit Human Instruction
-
GMENet generates missing MRI to expand usable glioma data by 97%
GMENet: Generative Mixture of Experts Network for Multi-Center Glioma Diagnosis with Incomplete Imaging Sequences
-
Joint pose and image prediction improves multi-person scene accuracy
Composing People Together: Iterative Pose-Image Generation for Multi-Person Interaction Scenes
-
VLMs trail humans by 28.4 points on driving scenes benchmark
DRIVESPATIAL: A Benchmark for Spatiotemporal Intelligence in VLMs for Autonomous Driving
-
Quantized labels cut rPPG model size 88% and raise speed 191%
LQ-rPPG: A Label-Quantized Coarse-to-Fine Learning Framework for Remote Physiological Measurement
-
Semantic cues speed drone exploration 13.7 times on average
Semantic-Aware Guided Drone Exploration for Language-Conditioned 3D Indoor Mapping
-
Attributes replace category lists for remote sensing pre-training
SLIP-RS: Structured-Attribute Language-Image Pre-Training for Remote Sensing Object Detection
-
VLMs fail to infer visual relations from examples
VisAnalog: A Diagnostic Suite for Visual Concept Transfer on Natural Images
-
EEG model reaches 34.5% top-1 accuracy in 200-way image retrieval
STAMBRIDGE: Spectral-Temporal Amplitude-aware Mid-Feature Bridge for EEG Visual Decoding
-
Verified prompts plus longitudinal context raise lesion tracking Dice by 4.5 points
Exploiting Longitudinal Context in Clinician-Verified Interactive Lesion Tracking
-
One frozen VLM detects video anomalies without training
CoReVAD: A Contextual Reasoning Framework for Training-Free Video Anomaly Detection
-
Schrödinger Bridge raises deepfake AP@0.95 by 3-10%
Inconsistency-aware Multimodal Schrödinger Bridge for Deepfake Localization
-
Synthetic MRIs raise accuracy for one tumour classifier by 1.02%
Do Synthetic Brain MRIs Reliably Improve Tumour Classification? A StyleGAN2-ADA Class-Plane Augmentation Study on BRISC 2025
-
Velocity mismatches flag anomalies in flow matching models
Flow Mismatching: Unsupervised Anomaly Detection via Velocity Discrepancies in Flow Matching Models
-
RoboSurg-VQA benchmarks segmentation-aware visual question answering in surgery
RoboSurg-VQA: A Multimodal Benchmark for Surgical Segmentation-Aware Visual Question Answering
-
Dithering defends vision models against adversarial attacks
Dithering Defense: Adversarial Robustness of Vision Foundation Models via Multi-Level Floyd-Steinberg Dithering
-
Vertex weights let mmWave data drive accurate SMPL body fits
Millimeter-wave Imaging for Anthropometric Body Measurement
-
Motion data alone rivals video models trained on 10000x more examples
The TIME Machine: On The Power of Motion for Efficient Perception
-
RADAR forecasts transfer by comparing representation trajectories
RADAR: Relative Angular Divergence Across Representations
-
Reconstructed maps raise 3D detection scores without manual HD maps
Scene Reconstruction as Mapping Priors for 3D Detection
-
Binary masks control precise motion in generated videos
CoMoGen: COntrollable MOtion Dynamics and Interactions with Mask-Guided Video GENeration
-
Toolkit automates annotation of child-caregiver eye-tracking videos
GazeBehavior Annotation Toolkit (GBAT): AI-powered toolkit for automatic annotation of egocentric eye-tracking and video data of child-caregiver interaction
-
Pixel prior from QueryMLP lifts buoy association to 0.7386
Improved Vision-to-Chart Buoy Association with Learned World-to-Image Projection