archive

Every paper Pith has read. Search by title, abstract, or pith.

9568 papers in cs.CV · page 6

cs.CV 2026-05-21 reviewed

DoRA raises VLA success rates by 10.4 points over SFT
CrossVLA: Cross-Paradigm Post-Training and Inference Optimization for Vision-Language-Action Models

Zhi Liu
cs.CV 2026-05-21 reviewed

Seizure video dataset yields 0.96 F1 on epileptic classification
Seizure-Semiology-Suite (S3): A Clinically Multimodal Dataset, Benchmark, and Models for Seizure Semiology Understanding

Lina Zhang +25
eess.IV 2026-05-20 reviewed

PET/CT model matches full segmentation accuracy with 10% labels
An Open Multi-Center Whole-Body FDG PET/CT Foundation Model for Tumor Segmentation

Xiaofeng Liu +6
eess.IV 2026-05-20 reviewed

Embeddings support 99% accurate tomato field mapping
Mapping Tomato Cropping Systems in California Using AlphaEarth Geospatial Embeddings and Deep Learning Analysis

Mohammadreza Narimani +2
cs.CV 2026-05-20 reviewed

Context rewrite lifts 3D grounding accuracy by up to 22 points
MM-Conv: A Multimodal Dataset and Benchmark for Context-Aware Grounding in 3D Dialogue

Anna Deichler +6
cs.CV 2026-05-20 reviewed

Scene graph matching grounds 3D objects from language without training
SceneGraphGrounder: Zero-Shot 3D Visual Grounding via Structured Scene Graph Matching

Xuefei Sun +4
cs.CV 2026-05-20 reviewed

Diffusion model relights full-body videos consistently under new lights
BodyReLux: Temporally Consistent Full-Body Video Relighting

Li Ma +6
cs.CV 2026-05-20 reviewed

4D geometry supervision lifts robot video models to 81% success
GEM-4D: Geometry-Enhanced Video World Models for Robot Manipulation

Kaichen Zhou +10
cs.CV 2026-05-20 reviewed

VLMs give better 3D vehicle dimensions than lidar in occluded cases
Improving 3D Labeling in Self-Driving by Inferring Vehicle Information using Vision Language Models

Steven Chen +2
cs.CV 2026-05-20 reviewed

Lightweight cross-encoder matches LLM judges for caption evaluation
BEiTScore: Reference-free Image Captioning Evaluation with an Efficient Cross-Encoder Model

Gonçalo Gomes +2
cs.CV 2026-05-20 reviewed

Vision-IMU attention fusion cuts hand tracking error by 16%
AVI-HT: Adaptive Vision-IMU Fusion for 3D Hand Tracking

Ziyi Kou +6
eess.IV 2026-05-20 reviewed

HSR methods vary by over 13 dB across degradation types
HyperBench: Standardizing and Scaling Synthetic Evaluation for Hyperspectral Super-Resolution

Ritik Shah +1
cs.CV 2026-05-20 reviewed

AI turns T1 scans into motion-free high-res MRIs
MRecover: A Conditional Generative Model for Recovering Motion-Corrupted MR images Using AI Generated Contrast

Jinghang Li +15
cs.LG 2026-05-20 reviewed

Stochastic policy amortizes diffusion guidance for 5x faster sampling
Hierarchical Variational Policies for Reward-Guided Diffusion

Kushagra Pandey +4
cs.CV 2026-05-20 reviewed

Ultrasound VQA model learns to zoom closer before diagnosing
Look-Closer-Then-Diagnose: Confidence-Aware Ultrasound VQA via Active Zooming

Yue Zhou +7
cs.CV 2026-05-20 reviewed

VLMs retain gains after corrupting thought tokens
Ablate-to-Validate: Are Vision-Language Models Really Using Continuous Thought Tokens?

Tianyi Zhang +2
eess.IV 2026-05-20 reviewed

Three-plane aggregation raises stroke lesion Dice score
VRXU-net: A Deep Learning Approach for Brain Ischemic Stroke Lesion Detection and Segmentation in T1W MRI

Sayed Amir Mousavi Mobarakeh
cs.CV 2026-05-20 reviewed

New benchmark shows LVLMs falter on furniture assembly videos
Flat-Pack Bench: Evaluating Spatio-Temporal Understanding in Large Vision-Language Models through Furniture Assembly

Aditya Chetan +7
cs.CV 2026-05-20 reviewed

Text rendered on masks improves images and halves inference cost
UniVL: Unified Vision-Language Embedding for Spatially Grounded Contextual Image Generation

Jiayun Wang +4
cs.CV 2026-05-20 reviewed

Agents evolve image generation by distilling trajectory differences
GenEvolve: Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation

Sixiang Chen +9
cs.CV 2026-05-20 reviewed

3.8B model rivals larger ones using 19% of the training compute
Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models

Dong Chen +20
cs.LG 2026-05-20 reviewed

Amortized noise sampling cuts diffusion teacher variance 10x
Variance Reduction for Expectations with Diffusion Teachers

Jesse Bettencourt +4
cs.LG 2026-05-20 reviewed

Amortized resampling yields 2-3x compute gains for diffusion teachers
Variance Reduction for Expectations with Diffusion Teachers

Jesse Bettencourt +4
cs.CV 2026-05-20 reviewed

Single editing task lifts understanding
Uni-Edit: Intelligent Editing Is A General Task For Unified Model Tuning

Dian Zheng +6
cs.CV 2026-05-20 reviewed

Fixed-point distillation matches multi-step diffusion in one pass
One-Step Distillation of Discrete Diffusion Image Generators via Fixed-Point Iteration

Chaoyang Wang +1
cs.CV 2026-05-20 reviewed

Unified model generates simulation-ready 3D assets across object types
PhysX-Omni: Unified Simulation-Ready Physical 3D Generation for Rigid, Deformable, and Articulated Objects

Ziang Cao +7
cs.CV 2026-05-20 reviewed

WikiVQABench tests VLMs on Wikipedia questions needing external knowledge
WikiVQABench: A Knowledge-Grounded Visual Question Answering Benchmark from Wikipedia and Wikidata

Basel Shbita +2
cs.CV 2026-05-20 reviewed

Latent dynamics model yields coherent full-body avatar animations
Latent Dynamics for Full Body Avatar Animation

Shichong Peng +9
cs.CV 2026-05-20 reviewed

Evidential memory turns frozen 3D generators into streaming systems
Stream3D: Sequential Multi-View 3D Generation via Evidential Memory

Kaichen Zhou +5
cs.CV 2026-05-20 reviewed

Few-step streaming adapts generators for video editing without training
StreamGVE: Training-Free Video Editing via Few-Step Streaming Video Generation

Guanlong Jiao +4
cs.CV 2026-05-20 reviewed

Prototypes and pathways fuse for cancer survival prediction with built-in explanations
ProtoPathway: Biologically Structured Prototype-Pathway Fusion for Multimodal Cancer Survival Prediction

Amaya Gallagher-Syed +4
cs.CV 2026-05-20 reviewed

VLMs miss most time-based glitches in game videos
TempGlitch: Evaluating Vision-Language Models for Temporal Glitch Detection in Gameplay Videos

Yakun Yu +6
cs.CV 2026-05-20 reviewed

Two-frame recurrent method restores turbulence videos efficiently
ReMATF: Recurrent Motion-Adaptive Multi-scale Turbulence Mitigation for Dynamic Scenes

Zhiming Liu +2
cs.CV 2026-05-20 reviewed

New method masters interactive video try-on with hand and action guidance
iTryOn: Mastering Interactive Video Virtual Try-On with Spatial-Semantic Guidance

Jun Zheng +8
cs.CV 2026-05-20 reviewed

Smartphone runs full gait analysis locally without cloud upload
AIGaitor: Privacy-preserving and cloud-free motion analysis for everyone, using edge computing

Lauhitya Reddy +2
cs.LG 2026-05-20 reviewed

Gossip-based critic sharing lifts multi-cell OFDMA sum-rates in 6G
FedCritic: Serverless Federated Critic Learning-based Resource Allocation for Multi-Cell OFDMA in 6G

Amin Farajzadeh +1
cs.CV 2026-05-20 reviewed

Top-n encoder selection lifts blended emotion accuracy
Ordering Matters: Rank-Aware Selective Fusion for Blended Emotion Recognition

Junghyun Lee +3
cs.RO 2026-05-20 reviewed

3D point clouds lift VLA robot success by 10%
PointACT: Vision-Language-Action Models with Multi-Scale Point-Action Interaction

Shizhe Chen +2
cs.CV 2026-05-20 reviewed

Road videos now produce captions with chosen tone
RoadTones: Tone Controllable Text Generation from Road Event Videos

Chirag Parikh +2
cs.CV 2026-05-20 reviewed

One model shifts image restoration from precise to creative
Disentangling Generation and Regression in Stochastic Interpolants for Controllable Image Restoration

Yi Liu +5
cs.CV 2026-05-20 reviewed

Simulation feedback picks best synthetic scenes for driving models
Closed Loop Dynamic Driving Data Mixture for Real-Synthetic Co-Training

Hongzhi Ruan +7
cs.CV 2026-05-20 reviewed

Diffusion model fills Antarctic Landsat gaps without references
A Non-Reference Diffusion-Based Restoration Framework for Landsat 7 ETM+ SLC-off Imagery in Antarctica

Leyue Tang +3
cs.CV 2026-05-20 reviewed

Model fixes occlusion order in overlapping layout-to-image scenes
OcclusionFormer: Arranging Z-Order for Layout-Grounded Image Generation

Ziye Li +1
cs.CV 2026-05-20 reviewed

Hyper-V2X estimates epistemic and aleatoric uncertainty in cooperative BEV segmentation
Hyper-V2X: Hypernetworks for Estimating Epistemic and Aleatoric Uncertainty in Cooperative Bird's-Eye-View Semantic Segmentation

Abhishek Dinkar Jagtap +2
cs.CV 2026-05-20 reviewed

Adaptive fusion gives linear SSMs flexible vision and 3D fusion
Deformba: Vision State Space Model with Adaptive State Fusion

Hongyu Ke +6
cs.LG 2026-05-20 reviewed

Contrasting patients with controls isolates disease subgroups
Automatic Discovery of Disease Subgroups by Contrasting with Healthy Controls

Robin Louiset +4
cs.CV 2026-05-20 reviewed

Reweighting image-negative tokens cuts LVLM hallucinations
Reducing Object Hallucination in LVLMs via Emphasizing Image-negative Tokens

Meng Shen +2
cs.CV 2026-05-20 reviewed

Continuous flow matching generates realistic EEG signals
Let EEG Models Learn EEG

Yifan Wang +3