archive
Every paper Pith has read. Search by title, abstract, or pith.
9568 papers in cs.CV · page 11
-
Real images align diffusion models as well as preference pairs
When Preference Labels Fall Short: Aligning Diffusion Models from Real Data
-
Dual-stream network lifts weather detection at full speed
CADENet: Condition-Adaptive Asynchronous Dual-Stream Enhancement Network for Adverse Weather Perception in Autonomous Driving
-
Temporal conditioning changes AV planner style but not scores
From Prompts to Pavement Through Time: Temporal Grounding in Agentic Scene-to-Plan Reasoning
-
Landmark and language priors raise FER accuracy on three wild datasets
LaCoVL-FER: Landmark-Guided Contrastive Learning Network with Vision-Language Enhancement for Facial Expression Recognition
-
Stitched model lifts rewards to noisy latents for faster alignment
Stitched Value Model for Diffusion Alignment
-
Semi-supervised method reaches 79.99% Dice in fetal heart ultrasound
Synergistic Foundation Models for Semi-Supervised Fetal Cardiac Ultrasound Analysis: SAM-Med2D Boundary Refinement and DINOv3 Semantic Enhancement
-
Pose accuracy proxies depth quality without needing ground-truth depth
Depth2Pose: A Pose-Based Benchmark for Monocular Depth Estimation without Ground-Truth Depth
-
VLMs localize objects with boundary tokens
Mechanisms of Object Localization in Vision-Language Models
-
Class prototypes on the hypersphere reach neural collapse by design
Neural Collapse by Design: Learning Class Prototypes on the Hypersphere
-
Prototypes on the hypersphere reach neural collapse by design
Neural Collapse by Design: Learning Class Prototypes on the Hypersphere
-
Attention chains cut 4D mesh generation to 9 seconds
Fast 4D Mesh Generation by Spatio-Temporal Attention Chains
-
Fused expert preferences and ratings lift VLM aesthetic SRCC to 0.709
Preferences Order, Ratings Anchor: From Fused Expert Aesthetic Ground Truth to Self-Distillation
-
Self-distillation raises VLM aesthetic SRCC from 0.504 to 0.709
Preferences Order, Ratings Anchor: From Fused Expert Aesthetic Ground Truth to Self-Distillation
-
Training on near-failure paths improves driving safety
Beyond Imitation: Learning Safe End-to-End Autonomous Driving from Hard Negatives
-
Neuron selection lets VAR models add user concepts without forgetting prior ones
CPC-VAR: Continual Personalized and Compositional Generation in Visual Autoregressive Models
-
One reference image flags traffic anomalies via embedding matches
Real-World On-Vehicle Evaluation of Embedding-Based Anomaly Detection
-
Reward optimization erases unwanted concepts in flow models
FlowErase-RL: Rethinking Concept Erasure as Reward Optimization in Flow Matching Models
-
Browser renders MRI digital twins at 82 FPS on low-cost GPUs
Decentralized Direct Volume Rendering: A Browser-Native GPU Architecture for MRI Digital Twins in Resource-Constrained Settings
-
Geometry injection enables unaligned optical-SAR retrieval
GeoMamba: A Geometry-driven MambaVision Framework and Dataset for Fine-grained Optical-SAR Object Retrieval
-
Staged distillation keeps tiny diffusion models stable at 1.6 percent teacher size
LIFT and PLACE: A Simple, Stable, and Effective Knowledge Distillation Framework for Lightweight Diffusion Models
-
Tiny diffusion models reach FID 15.73 with staged distillation
LIFT and PLACE: A Simple, Stable, and Effective Knowledge Distillation Framework for Lightweight Diffusion Models
-
Frozen probe tunes video models to follow drone inertial commands
Aero-World: Action-Conditioned Aerial Video Generation from Inertial Controls
-
Tango3D aligns pixels to 3D points while preserving global retrieval
Tango3D: Towards Alignment for Global and Local 2D-3D Correspondence
-
Downsampled block selection speeds up diffusion attention nearly 7x
Efficient Long-Context Modeling in Diffusion Language Models via Block Approximate Sparse Attention
-
Physics-in-the-loop agents produce more complex valid CAD designs
Physics-in-the-Loop: A Hybrid Agentic Architecture for Validated CAD Engineering Design
-
Sonar simulator matches real images at texture KL below 0.07
Physics-informed simulation framework for realistic sonar image generation and statistical validation
-
CRP groups medical tasks from text for 73% Dice with 4% forgetting
MedCRP-CL: Continual Medical Image Segmentation via Bayesian Nonparametric Semantic Modality Discovery
-
New dataset labels 10k white blood cell images with 11 morphological traits
WBCAtt+: Fine-Grained Pixel-Level Morphological Annotations for White Blood Cell Images
-
Real JPEG tables cut false positives in document forgery detection
DocQT: Improving Document Forgery Localization Robustness via Diverse JPEG Quantization Tables
-
TADA adapts steganalysis to unknown JPEG pipelines
Tackle CSM in JPEG Steganalysis with Data Adaptation
-
Satellite and ground photos fuse for better outdoor view synthesis
Cross-View Splatter: Feed-Forward View Synthesis with Georeferenced Images
-
NeRF augmentations train pose estimators from 25 real images
CAD-Free Learning of Spacecraft Pose Estimators via NeRF-Based Augmentations
-
NeRF lets pose estimators train on 25-400 real images
CAD-Free Learning of Spacecraft Pose Estimators via NeRF-Based Augmentations
-
Refiner teaches image models to fix their own mistakes
Benchmarking and Evolving Reason-Reflect-Rectify for Reflective Visual Generation
-
Panorama-first split lifts zero-shot navigation success 59 percent
P2DNav: Panorama-to-Downview Reasoning for Zero-shot Vision-and-Language Navigation
-
One model drives well across cities and sensors without retraining
HEAT: Heterogeneous End-to-End Autonomous Driving via Trajectory-Guided World Models
-
Component style transfer closes satellite sim-to-real gap
Component-Aware Structure-Preserving Style Transfer for Satellite Visual Sim2Real Data Construction
-
Part-wise style transfer raises satellite pose accuracy
Component-Aware Structure-Preserving Style Transfer for Satellite Visual Sim2Real Data Construction
-
Few-shot visual prototypes correct misclassifications in text-prompted segmentation
PrAda: Few-Shot Visual Adaptation for Text-Prompted Segmentation
-
Contrastive registers let ViTs drop spurious tokens and lift segmentation accuracy
UniRefiner: Teaching Pre-trained ViTs to Self-Dispose Dross via Contrastive Register
-
Bézier curves stabilize LiDAR human motion capture
Bézier Degradation Modeling for LiDAR-based Human Motion Capture
-
VLM feedback iterates to fix cross-camera color constancy
White-Balance First, Adjust Later: Cross-Camera Color Constancy via Vision-Language Evaluation
-
Physics-guided diffusion designs metasurface absorbers in 30 seconds
Physics Guided Conditional Diffusion Framework for Generative Inverse Design of Manufacturable Metasurface based Absorbers
-
SVD-ordered paths yield less noisy model attributions
Spectral Integrated Gradients for Coarse-to-Fine Feature Attribution
-
Datasets enable global tree mortality mapping from aerial imagery
deadtrees.earth-aerial: A Multi-Resolution Aerial Image Dataset for Tree Cover and Mortality Detection
-
YOLO26-MoE hits 0.99 mAP for spotting insulator faults in drone photos
A novel YOLO26-MoE optimized by an LLM agent for insulator fault detection considering UAV images
-
Laminating film on lenses blocks identity while keeping action cues
Lens Privacy Sealing: A New Benchmark and Method for Physical Privacy-Preserving Action Recognition
-
Laminating film on lenses hides identities for action recognition
Lens Privacy Sealing: A New Benchmark and Method for Physical Privacy-Preserving Action Recognition
-
MLLMs often back correct answers with inconsistent egocentric evidence
EgoCoT-Bench: Benchmarking Grounded and Verifiable Operation-Centric Chain of Thought Reasoning for MLLMs
-
Sparse diffusion cuts redundant matches for steadier camera tracking
EpiDiffVO: Geometry-Aware Epipolar Diffusion for Robust Visual Odometry