archive

Every paper Pith has read. Search by title, abstract, or pith.

9568 papers in cs.CV · page 1

cs.CV 2026-05-22 reviewed

Geometric reward aligns camera paths in generated videos
Geo-Align: Video Generation Alignment via Metric Geometry Reward

Zizun Li +4
cs.CV 2026-05-22 reviewed

Pixel diffusion turns 512x512 latents into 2048x2048 images in 210 ms
PiD: Fast and High-Resolution Latent Decoding with Pixel Diffusion

Yifan Lu +6
cs.CV 2026-05-22 reviewed

Dedicated image editor lifts multimodal reasoning by 5 points
ETCHR: Editing To Clarify and Harness Reasoning

Beichen Zhang +5
cs.CV 2026-05-22 reviewed

Causal tests show many brain localizations are false positives
From Activation to Causality: Discovery of Causal Visual Representations in the Human Brain

Yuval Golbari +7
cs.CV 2026-05-22 reviewed

Token selection speeds up geometry transformers by over 85 percent
Good Token Hunting: A Hitchhiker's Guide to Token Selection for Visual Geometry Transformers

Shuhong Zheng +5
cs.CV 2026-05-22 reviewed

Dual-stream system inserts objects into videos harmoniously
Smart-Insertion-V: Photorealistic Video Insertion via a Closed-Loop Feedback Dual-Stream Framework

Xiao Cao +9
cs.CV 2026-05-22 reviewed

HorizonStream keeps 3D reconstruction stable past 10,000 frames
HorizonStream: Long-Horizon Attention for Streaming 3D Reconstruction

Chong Cheng +11
cs.CV 2026-05-22 reviewed

Projection conditioning lifts generative priors to scene reconstruction
GenRecon: Bridging Generative Priors for Multi-View 3D Scene Reconstruction

Katharina Schmid +4
cs.CV 2026-05-22 reviewed

Geometric overlays on images lift MLLM spatial scores by 20%
PGT: Procedurally Generated Tasks for improving visual grounding in MLLMs

Rim Assouel +3
cs.CV 2026-05-22 reviewed

Self-supervised priors raise physical fidelity in video generators
LaMo: Self-Supervised Latent Motion Priors for Physical Realism in Video Generation

Bo Jiang +5
cs.CV 2026-05-22 reviewed

Entmax attention lifts ViT segmentation mIoU by up to 6 points
Vision Transformers Need Better Token Interaction

Linxiang Su
cs.LG 2026-05-22 reviewed

Foundation models support zero-shot causal image reasoning
Leveraging Foundation Models for Causal Generative Modeling

Aneesh Komanduri +1
cs.CV 2026-05-22 reviewed

Dynamics model learns particle motion from real videos alone
Learning a Particle Dynamics Model with Real-world Videos

Chanho Kim +2
cs.CV 2026-05-22 reviewed

Pretraining on decomposition maps cuts labeled data needs for Mueller polarimetry
MuellerPT: Decomposition Driven Pretraining for Dense Learning in Mueller Polarimetry

Adam Tlemsani +6
cs.CV 2026-05-22 reviewed

LLM splits video queries into tool calls merged by boolean logic
Decomposing Queries into Tool Calls for Long-Video Keyframe Retrieval

Michal Shlapentokh-Rothman +3
cs.CV 2026-05-22 reviewed

Vision models match humans best at balanced generative-discriminative mix
Not Too Generative, Not Too Discriminative: The Human Alignment Sweet Spot

Jorge Chang Ortega +3
cs.LG 2026-05-22 reviewed

Debiased mining converts OOD detection to Monte-Carlo sampling
Debiased Negative Mining Improves Out-of-distribution Detection with Pre-trained Vision-Language Models

Bo Peng +3
cs.CV 2026-05-22 reviewed

Transformer predicts saliency from event camera streams
Exploring deep learning for Event-Based Saliency Prediction with a Transformer-based model

Romaric Mazna +2

cs.CV 2026-05-22 reviewed

ML framework grades emeralds at 98 percent accuracy
Machine learning applied to emerald gemstone grading: framework proposal and creation of a public dataset

FB Pena +4
cs.CV 2026-05-22 reviewed

cGAN counts eucalyptus logs at 92.3 percent accuracy
A Novel Approach for the Counting of Wood Logs Using cGANs and Image Processing Techniques

João VC Mazzochin +6
cs.CV 2026-05-22 reviewed

Agent beats baselines at text-guided 3D photo search
PhotoFlow: Agentic 3D Virtual Photography Missions

Jiarui Guo +7
cs.CV 2026-05-22 reviewed

Stabilized SegFormer reaches 0.4572 mIoU on original DMS split
Revitalizing Dense Material Segmentation: Stabilized Vision Transformers and the Generalization Paradox

Allan Kazakov +3
cs.CV 2026-05-22 reviewed

Video models fail physics consistency under viewpoint shifts
CRONOS: Benchmarking Counterfactual Physical Consistency in Video Models

León Begiristain +2
cs.CV 2026-05-22 reviewed

RiGS models multi-scale motions with three Gaussian types
RiGS: Rigid-aware 4D Gaussian Splatting from a Single Monocular Video

Chenyu Wu +3
cs.CV 2026-05-22 reviewed

Coupling narrow models cuts 30% FLOPs from wide vision training
Recursive Block-Diagonal Coupling for Resource-Efficient Training of Vision Models

Maxim Henry +3
cs.CV 2026-05-22 reviewed

Adaptive search fixes blind spots in high-res image perception for LLMs
CVSearch: Empowering Multimodal LLMs with Cognitive Visual Search for High-Resolution Image Perception

Liupeng Li +6
cs.CV 2026-05-22 reviewed

3D hand motions predict open-surgery skill with r=0.78
ExpOS: Explainable Open-Surgery Skills Assessment Using 3D Hand Reconstruction

Roi Papo +2
cs.CV 2026-05-22 reviewed

Final diagnosis scores hide flawed medical workups in AI
DDX-TRACE: A Benchmark for Medical Diagnostic Trajectories in VLMs

Jiazhen Pan +9
cs.CV 2026-05-22 reviewed

Entity patches in memory fix consistency in multi-shot videos
EM-Vid: Training-Free Entity-Centric Memory for Efficient and Consistent Multi-Shot Video Generation

Jente Vandersanden +4
cs.CV 2026-05-22 reviewed

Semantic banks let 3D splatting handle night glow scenes
GlowGS: Generative Semantic Feature Learning for 3D Gaussian Splatting in Nighttime Glow Scenes

Beibei Lin +3
cs.LG 2026-05-22 reviewed

Meta-learning yields model performance scores on unlabeled data
Learning to Evaluate: Cost-Effective Model Evaluation on Unlabeled Data with Meta-Learning

Trinh Pham +4
cs.CV 2026-05-22 reviewed

Support map shows some regions supply stronger LiDAR-camera cues
Calibration-Informative Region Selection for Online LiDAR--Camera Calibration in Agricultural Environments

Rajitha de Silva +1
cs.CV 2026-05-22 reviewed

PathNavigate scans slides for surprises before matching the question
PathNavigate: A Training-Free Pathology Agent with Surprise-Guided Scan and Shared Slide Memory for Whole-Slide Image VQA

Chunze Yang +12
cs.CV 2026-05-22 reviewed

Tri-module augmentation lifts 3D avatar quality from short videos
Generator-Refiner-Examiner: A Tri-Module Data Augmentation Framework for 3D Human Avatar Learning from Monocular Videos

Gangjian Zhang +5
cs.CV 2026-05-22 reviewed

PixIE raises low-light PSNR by up to 15% using DINO prompts
PixIE: Prompted Pixel-Space Low-Light Image Enhancement

Ruirui Lin +3
cs.CV 2026-05-22 reviewed

Hand motions guide stable object tracking in RGB video
ComPose: When to Trust Hands for Object Pose Tracking

Jisu Shin +7
cs.LG 2026-05-22 reviewed

New sampler cuts RL training time for flow models by up to 53%
Precise: SDE-Consistent Stochastic Sampling for RL Post-Training of Flow-Matching Models

Jade Zou +9
cs.CV 2026-05-22 reviewed

120K triplets enable instruction editing at 4K+ resolution
VINS-120K: Ultra High-Resolution Image Editing with A Large-Scale Dataset

Zhizhou Chen +8
cs.GR 2026-05-22 reviewed

Sketches control long video generation via independent shots
DrawVideo: Generating Long Video from Storyboard Keyframe Sketches

Chuanzhi Xu +9
cs.CV 2026-05-22 reviewed

MDS-DETR gains +2.8 mAP over Deformable-DETR with 5% extra training
MDS-DETR: DETR with Masked Duplicate Suppressor

Chanho Lee +3
cs.CV 2026-05-22 reviewed

Bootstrapped GRTO unifies RL and tool training for segmentation
B-GRTO: Bootstrapped Group Relative Tool Optimization for Referring Segmentation

Mario Markov +5
cs.CV 2026-05-22 reviewed

MDM distills vision-language datasets into compact synthetic sets
Multimodal Distribution Matching for Vision-Language Dataset Distillation

Jongoh Jeong +3
cs.CV 2026-05-22 reviewed

One model forecasts yields for many crops by learning their weather responses
PhenoYieldNet: Learning Crop-Aware Phenological Responses for Multi-Crop Yield Prediction

Yu Luo +6
cs.CV 2026-05-22 reviewed

DINOv3 beats ImageNet after finetuning on RGB inspection but loses on X-ray
Rethinking Transfer Learning for Industrial Inspection: DINOv3 vs. ImageNet Pretraining Across RGB and X-ray Tasks

Mehdi Gharbage +3
cs.CV 2026-05-22 reviewed

One-Forcing scores 83.76 on VBench for one-step video
One-Forcing: Towards Stable One-Step Autoregressive Video Generation

Jiaqi Feng +3
cs.CV 2026-05-22 reviewed

32x compression and linear attention enable fast image restoration
Efficient One-Step Diffusion Restoration Model with Compact Token Compression and Linear Attention

Bingtian Qiao +5
cs.LG 2026-05-22 reviewed

VAE decoder learns to respect non-commutative latent order
Commutator-Induced Uncertainty in VAEs

Tahereh Dehdarirad +3
cs.CV 2026-05-22 reviewed

Dynamic sparse attention delivers 2.1x video generation speedup
DFSAttn: Dynamic Fine-grained Sparse Attention for Efficient Video Generation

Jie Hu +3
cs.CV 2026-05-22 reviewed

Semantic scores trigger early stops in video motion search
FAST-ME: Foundation-aware Adaptive Stopping for Motion Estimation for Efficient IoT Video Analysis

Kakia Panagidi +1
cs.LG 2026-05-22 reviewed

Sample-wise attacks fool TTA while keeping label counts normal
Sample-wise Targeted Adversarial Attacks on Test-time Adaptation

Phuc Duc Nguyen +1