archive

Every paper Pith has read.

9568 papers in cs.CV · page 13

cs.CV 2026-05-19 reviewed

Optimal transport merges 3DGS primitives down to 10 percent
MMGS: 10× Compressed 3DGS through Optimal Transport Aggregation based on Multi-view Ranking

Beizhen Zhao +4
cs.CV 2026-05-19 reviewed

Shared subspaces cut parameters 87 percent in continual VLM learning
iGSP: Implicit Gradient Subspace Projection for Efficient Continual Learning of Vision-Language Models

Xuezhi Cui +10
cs.CV 2026-05-19 reviewed

Dense synthetic images boost segmentation accuracy
What Makes Synthetic Data Effective in Image Segmentation

Jinjin Zhang +4
cs.CV 2026-05-19 reviewed

Brain network experts enable competitive fMRI semantic decoding
FPED: A Functional-Network Prior-Guided Mixture-of-Experts Framework for Interpretable Brain Decoding

Yudan Ren +4
cs.AI 2026-05-19 reviewed

Quadtrees cut GUI agent visual tokens by 30 percent
AQuaUI: Visual Token Reduction for GUI Agents with Adaptive Quadtrees

Yuankai Li +5
cs.CV 2026-05-19 reviewed

Flow-map endpoint velocity replaces fake-score network
Distribution Matching Distillation without Fake Score Network

Youngjoong Kim +2
cs.CV 2026-05-19 reviewed

LLM templates expand NAS to discover better architectures
Structuring Open-Ended NAS: Semi-Automated Design Knowledge Structuring with LLMs for Efficient Neural Architecture Search

Yuiko Sakuma +6
cs.CV 2026-05-19 reviewed

Post-training lifts video models' physical consistency
PhyWorld: Physics-Faithful World Model for Video Generation

Pu Zhao +12
cs.CV 2026-05-19 reviewed

Method reduces age bias in medical image classification by decorrelating difficulty
Robust Mitigation of Age-Dependent Confounding Effects via Sample-Difficulty Decorrelation

Nikhil Cherian Kurian +4
cs.CV 2026-05-19 reviewed

HAVEN benchmark aligns video and text across three levels
HAVEN: Hierarchically Aligned Multimodal Benchmark for Unified Video Understanding

Mengqi Shi +1
cs.CV 2026-05-19 reviewed

PCA rotation aligns key channels for accurate VLM pruning
Rotation-Aligned Key Channel Pruning for Efficient Vision-Language Model Inference

Beomseok Kang +4
cs.LG 2026-05-19 reviewed

Regularizer cuts demographic gaps in medical image AI
Worst-Group Equalized Odds Regularization for Multi-Attribute Fair Medical Image Classification

Nikhil Cherian Kurian +8
cs.CV 2026-05-19 reviewed

Smartphone video measures forest trees with ~2 cm accuracy
Smartphone-based Circular Plot Sampling for Forest Inventory

Su Sun +4
cs.CV 2026-05-19 reviewed

Quasi-concavity enforces convex shapes in segmentation networks
D-Convexity: A Unified Differentiable Convex Shape Prior via Quasi-Concavity for Data-driven Image Segmentation

Shengzhe Chen +1
cs.CV 2026-05-19 reviewed

Quantized model cuts brain tumor AI size by 6x with same accuracy
Quantized Machine Learning Models for Medical Imaging in Low-Resource Healthcare Settings

Sumanth Meenan Kanneti +1
cs.CV 2026-05-18 reviewed

Layer-wise compression on image stats yields human-like visual features
Efficient coding along the visual hierarchy

Ananya Passi +2
cs.CV 2026-05-18 reviewed

Freezing image models yields competitive video performance
Towards Data-Efficient Video Pre-training with Frozen Image Foundation Models

Svetlana Orlova +2
cs.CV 2026-05-18 reviewed

SSL pretraining helps models know when to skip DR predictions
Knowing When Not to Predict: Self Supervised Learning and Abstention for Safer DR Screening

Muskaan Chopra +3
cs.LG 2026-05-18 reviewed

VLMs need tight data alignment and miss weak signals in egocentric video
EgoBabyVLM: Benchmarking Cross-Modal Learning from Naturalistic Egocentric Video Data

Dongyan Lin +21
cs.CV 2026-05-18 reviewed

Diffusion model turns uniform organ maps into realistic PET scans
Generation of Heterogeneous PET Images from Uniform Organ Activity Maps Using a Pretrained Domain-Adapted Diffusion Model

Suya Li +4
cs.CV 2026-05-18 reviewed

FAGER metric leads in factual checks for AI image generators
FAGER: Factually Grounded Evaluation and Refinement of Text-to-Image Models

Youngsun Lim +3
cs.CV 2026-05-18 reviewed

CRAFT pipeline leads MAGMaR video QA at 0.739 average
CRAFT: Critic-Refined Adaptive Key-Frame Targeting for Multimodal Video Question Answering

Mahesh Bhosale +5
cs.CV 2026-05-18 reviewed

Multi-horizon training captures longer solar forecast dependencies
Learning Long-Term Temporal Dependencies in Photovoltaic Power Output Prediction Through Multi-Horizon Forecasting

Sumit Laha +2
cs.CV 2026-05-18 reviewed

LiFT lifts 2D generators to coherent 3D medical volumes
LiFT: Lifted Inter-slice Feature Trajectories for 3D Image Generation from 2D Generators

Xinhe Zhang +5
cs.RO 2026-05-18 reviewed

RL fine-tuning aligns traffic simulations with real data
RLFTSim: Realistic and Controllable Multi-Agent Traffic Simulation via Reinforcement Learning Fine-Tuning

Ehsan Ahmadi +7
cs.CV 2026-05-18 reviewed

One photo produces a mask that defeats facial recognition on any image
Personalized Face Privacy Protection From a Single Image

Zachary Yahn +7
cs.CV 2026-05-18 reviewed

New benchmark tests medical AI models on real-world image shifts
MedFM-Robust: Benchmarking Robustness of Medical Foundation Models

Xiangxiang Cui +4
cs.CV 2026-05-18 reviewed

Foundation models fail to spot unseen iris attacks and spectral changes
A Systematic Failure Analysis of Vision Foundation Models for Open Set Iris Presentation Attack Detection

Rahul Anand +4
cs.CV 2026-05-18 reviewed

75 real urban walks released with head poses and gaze for trajectory models
EgoTraj: Real-World Egocentric Human Trajectory Dataset for Multimodal Prediction

Ahmad Yehia +6
cs.CV 2026-05-18 reviewed

MLLMs often miss artifacts in AI videos
Artifact-Bench: Evaluating MLLMs on Detecting and Assessing the Artifacts of AI-Generated Videos

Yuqi Tang +23
cs.CV 2026-05-18 reviewed

Self-supervised backbones boost artwork classification
Harnessing Self-Supervised Features for Art Classification

Federico Melis +4
cs.CV 2026-05-18 reviewed

LLM gains part-level and time-step control over human motion
MotionMERGE: A Multi-granular Framework for Human Motion Editing, Reasoning, Generation, and Explanation

Bizhu Wu +7
cs.CV 2026-05-18 reviewed

COLMAP metrics match humans 4x better on 3D view consistency
Can These Views Be One Scene? Evaluating Multiview 3D Consistency when 3D Foundation Models Hallucinate

Soumava Paul +2
cs.SD 2026-05-18 reviewed

Direct waveform audio matches latent methods on benchmarks
WavFlow: Audio Generation in Waveform Space

Feiyan Zhou +8
cs.CV 2026-05-18 reviewed

VLM agent turns vague requests into video edit plans
Aurora: Unified Video Editing with a Tool-Using Agent

Yongsheng Yu +6
cs.CV 2026-05-18 reviewed

Active exploration outperforms passive in spatial intelligence tasks
ESI-Bench: Towards Embodied Spatial Intelligence that Closes the Perception-Action Loop

Yining Hong +7
cs.CV 2026-05-18 reviewed

Self-distillation from crops boosts MLLM detail recognition
Vision-OPD: Learning to See Fine Details for Multimodal LLMs via On-Policy Self-Distillation

Qianhao Yuan +6
cs.CV 2026-05-18 reviewed

NVFP4 and balanced SP enable 2x faster long video training
LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation

Yukang Chen +15
cs.CV 2026-05-18 reviewed

Diffusion models generate faster by growing resolution during denoising
Spectral Progressive Diffusion for Efficient Image and Video Generation

Howard Xiao +3
cs.CV 2026-05-18 reviewed

Single photo gains full PBR lighting control via shared intrinsic maps
PIXLRelight: Controllable Relighting via Intrinsic Conditioning

Miguel Farinha +1
cs.CV 2026-05-18 reviewed

Dual-view selection lifts ego-exo memory accuracy to 58.2 percent
EgoExoMem: Cross-View Memory Reasoning over Synchronized Egocentric and Exocentric Videos

Ruiping Liu +9
cs.CV 2026-05-18 reviewed

Entity ID tracking stops character drift in AI videos
Advancing Narrative Long Video Generation via Training-Free Identity-Aware Memory

Jinzhuo Liu +7
cs.RO 2026-05-18 reviewed

Robots evolve navigation rules from their own successes and failures
Robo-Cortex: A Self-Evolving Embodied Agent via Dual-Grain Cognitive Memory and Autonomous Knowledge Induction

Nga Teng Chan +11
cs.CV 2026-05-18 reviewed

Online steering halves unsafe content in diffusion models
SafeDiffusion-R1: Online Reward Steering for Safe Diffusion Post-Training

Komal Kumar +5
cs.CV 2026-05-18 reviewed

Segmentation proxy aligns multimodal understanding and generation
Semantic Generative Tuning for Unified Multimodal Models

Songsong Yu +3
cs.CV 2026-05-18 reviewed

Training augmentations alone match FGIR accuracy without crops
A Large-Scale Study on the Accuracy vs Cost Trade-offs of Training and Evaluation Settings in Fine-Grained Image Recognition

Edwin Arkel Rios +7
cs.CV 2026-05-18 reviewed

3D concept scaffold fixes prompt ambiguity in avatar retrieval
CMAG: Concept-Scaffolded Retrieval for Marketplace Avatar Generation

Rajeev Goel +5
cs.CV 2026-05-18 reviewed

Lance beats prior open models at image and video generation
Lance: Unified Multimodal Modeling by Multi-Task Synergy

Fengyi Fu +12