archive
Every paper Pith has read.
9568 papers in cs.CV · page 4
-
Synthetic RAW data yields same low-light detection metrics as real
Making the Discrete Continuous: Synthetic RAW Augmentations for Fine-Grained Evaluation of Person Detection Performance in Low Light
-
Pre-VLA lifts VLA success rates from 31% to 38%
Pre-VLA: Preemptive Runtime Verification for Reliable Vision-Language-Action and World-Model Rollouts
-
Block-sparse model separates rPPG signals from video noise
Time-varying rPPG signal separation via block-sparse signal model
-
Dual-shutter pairs invert motion blur and distortion
Moment-Reenacting: Inverse Motion Degradation with Cross-shutter Guidance
-
Paired blur and distortion images recover high-speed motion
Moment-Reenacting: Inverse Motion Degradation with Cross-shutter Guidance
-
Table structure recovered by predicting grid counts and separators directly
FastTab: A Fast Table Recognizer with a Tiny Recursive Module and 1D Transformers
-
GenRe generalizes 3D urban scenes to new viewpoints in minutes
Diffusion-guided Generalizable Enhancer for Urban Scene Reconstruction
-
Explicit baseline fixes attribution errors in neural explanations
The Neglected Baseline in Model Interpretation
-
New benchmark and GRPO method lift MLLMs past proprietary models on receipt reasoning
From Recognition to Reasoning: Benchmarking and Enhancing MLLMs on Real-World Receipt Document Understanding
-
Lesion grounding lifts ophthalmic VQA accuracy and clarity
Towards Clinically Interpretable Ophthalmic VQA via Spatially-Grounded Lesion Evidence
-
LLMs recognize activities from muscle signals after language mapping
Translating Signals to Languages for sEMG-Based Activity Recognition
-
Benchmark shows AI unreliable on agricultural tool tasks
AgroTools: A Benchmark for Tool-Augmented Multimodal Agents in Agriculture
-
3D eye prior synthesizes training data for any new AR/VR tracker
GazePrior: Zero-Shot AR/VR Eye Tracking via Learned 3D Gaze Reconstruction
-
Multiple metrics needed to judge liver vessel segmentation
VEELA: A Clinically-Constrained Benchmark for Liver Vessel Segmentation in Computed Tomography Angiography
-
QuantSR+ raises 2-bit SR accuracy by 0.29 dB while cutting ops 87.9%
QuantSR+: Pushing the Limit of Quantized Image Super-Resolution Networks
-
MLLM planner in ViT space guides DiT to SOTA video generation and edits
Bernini: Latent Semantic Planning for Video Diffusion
-
Watermarking 4D splats by gating at motion-curvature instants
4D-GSW: Kinematic-Aware Spatio-Temporal Consistent Watermarking for 4D Gaussian Splatting
-
Multispectral LiDAR lifts 3D land cover mIoU by up to 7.8 points
3D LULC classification using multispectral LiDAR and deep learning: current and prospective schemes
-
K-space hybrid model holds up better for breast lesion segmentation under acceleration
Robustness of breast lesion segmentation under MRI undersampling improves with k-space-aware deep learning
-
Anchor swaps erase specific identities from face generators
PIU: Proximity-guided Identity Unlearning in ID-Conditioned Diffusion Models
-
DEVO exports sparse point clouds matching EMVS at 5 cm
Extending Deep Event Visual Odometry with Sparse Point-Cloud Export
-
YOLOv2 with FPN and switchable convolution hits 68% mAP on virus patches
Detection of Virus and Small Cell Patches in Foci Images Using Switchable Convolution and Feature Pyramid Networks
-
Curved fractal patches fool VIS-IR VLMs
Exposing Vulnerabilities in Visible-Infrared VLMs: A Unified Geometric Adversarial Framework with Cross-Task Transferability
-
4D trajectories and sparse tracking enable zero-shot robot-object tasks
Imagine2Real: Towards Zero-shot Humanoid-Object Interaction via Video Generative Priors
-
Sparse keypoints in behavior model enable zero-shot humanoid interactions
Imagine2Real: Towards Zero-shot Humanoid-Object Interaction via Video Generative Priors
-
Multi-grained compression lifts long video QA accuracy
MuKV: Multi-Grained KV Cache Compression for Long Streaming Video Question-Answering
-
YOLOv8 recall falls below 40% under strong turbulence in satellite images
Impact of Atmospheric Turbulence and Pointing Error on Earth Observation
-
Evidence hierarchy lifts Bayesian threat classification to 95%
An Evidence Hierarchy for Bayesian Object Classification via OSINT-Aided Heterogeneous Sensor Fusion
-
OMR tops matched music score search
Direct content-based retrieval from music scores images
-
Graphs plus diffusion improve tumor segmentation with missing MRI scans
D3Seg: Dependency-Aware Diffusion for Brain Tumor Segmentation with Missing Modalities
-
Model recovers 3D hand poses from distant room corners
REACH: Hand Pose Estimation from Room Corners
-
Semi-supervised UniMatch V2 segments weather-degraded images
A Robust Semantic Segmentation Pipeline for the CVPR 2026 8th UG2+ Challenge Track 2
-
Semi-supervised training lifts segmentation in bad weather
A Robust Semantic Segmentation Pipeline for the CVPR 2026 8th UG2+ Challenge Track 2
-
Anatomy residual pathway lifts VCE mAP to 0.3409
GALAR-TemporalNet v2: Anatomy-Guided Dual-Branch Temporal Classification with Bidirectional Mamba and Dual-Graph GCN for Video Capsule Endoscopy -- after competition results
-
Self-evolving pool optimizes image restoration agent
EvoIR-Agent: Self-Evolving Image Restoration Agentic System via Experience-Driven Learning
-
Text from LLMs guides zero-shot action localization in videos
Zero-Shot Temporal Action Localization Through Textual Guidance
-
Video models top open suturing skill challenge
OSS: Open Suturing Skills Vision-Based Assessment Challenge 2024-2025
-
Graph of patches cuts UHD quality prediction error
Ultra-High-Definition Image Quality Assessment via Graph Representation Learning
-
Feed-forward model reconstructs 4D scenes without camera poses
No Pose, No Problem in 4D: Feed-Forward Dynamic Gaussians from Unposed Multi-View Videos
-
Events and illumination collaborate to fix low-light photos
Event-Illumination Collaborative Low-light Image Enhancement with a High-resolution Real-world Dataset
-
Telematics and CV fusion boosts MLLM safety event detection
Enhancing Multimodal Large Language Models for Safety-Critical Driving Video Analysis
-
Hybrid sampling beats pure uncertainty or diversity in active learning
Balancing Uncertainty and Diversity of Samples: Leveraging Diversity of Least, High Confidence Samples for Effective Active Learning
-
Dual selection prunes video tokens while keeping static scenes and changes
ST-SimDiff: Balancing Spatiotemporal Similarity and Difference for Efficient Video Understanding with MLLMs
-
FlowGS speeds up continuous-scale super-resolution for remote sensing
Flow-based Gaussian Splatting for Continuous-Scale Remote Sensing Image Super-Resolution
-
One sentence becomes a full short drama with AI agents
One Sentence, One Drama: Personalized Short-Form Drama Generation via Multi-Agent Systems
-
Event cameras match RGB gait ID in light
EventGait: Towards Robust Gait Recognition with Event Streams
-
Swapping ViT attention heads for depthwise convolutions speeds inference 17-20%
Accelerating Vision Foundation Models with Drop-in Depthwise Convolution
-
Two-stage AI plans then executes fixes for photo flaws
AesFormer: Transform Everyday Photos into Beautiful Memories
-
Diffusion models correct motion in 3D brain MRI
MotionDPS: Motion-Compensated 3D Brain MRI Reconstruction
-
MLLMs get personality scores right but ignore video cues half the time
Perception or Prejudice: Can MLLMs Go Beyond First Impressions of Personality?