archive

Every paper Pith has read.

9568 papers in cs.CV · page 10

cs.GR 2026-05-19 reviewed

Neural fields guide free Gaussians to capture layered clothing
PiG-Avatar: Hierarchical Neural-Field-Guided Gaussian Avatars

Julian Kaltheuner +4
cs.CV 2026-05-19 reviewed

LoRA upgrade turns text-to-image flows bidirectional
FullFlow: Upgrading Text-to-Image Flow Matching Models for Bidirectional Vision-Language Generation

Eric Tillmann Bill +3
cs.CV 2026-05-19 reviewed

Benchmark enables reliable testing of multi-shot audio-video models
MSAVBench: Towards Comprehensive and Reliable Evaluation of Multi-Shot Audio-Video Generation

Yujie Wei +22
cs.CL 2026-05-19 reviewed

Staged perception training boosts VLM accuracy with shorter reasoning
From Seeing to Thinking: Decoupling Perception and Reasoning Improves Post-Training of Vision-Language Models

Juncheng Wu +8
cs.CV 2026-05-19 reviewed

AUDITS benchmark tests detectors on 530K manipulated images
Multi-axis Analysis of Image Manipulation Localization

Keanu Nichols +5
cs.CV 2026-05-19 reviewed

New test reveals VLMs ignore camera motion in spatial tasks
CaMo: Camera Motion Grounded Evaluation and Training for Vision-Language Models

Hsiang-Wei Huang +5
cs.CV 2026-05-19 reviewed

Prototype layer matches ResNet accuracy on composite X-ray defects
Interpretable Computer Vision for Defect Detection in X-ray Tomography of Aerospace SiC/SiC Composites

Antonio Peña Corredor +4
cs.CV 2026-05-19 reviewed

Counterfactual tests expose failures in LVLM attribution for chest X-rays
Rethinking Visual Attribution for Chest X-ray Reasoning in Large Vision Language Models

Guangzhi Xiong +4
cs.CV 2026-05-19 reviewed

Billion-scale 3D Gaussians train on one 24 GB GPU
TideGS: Scalable Training of Over One Billion 3D Gaussian Splatting Primitives via Out-of-Core Optimization

Chonghao Zhong +6
cs.CV 2026-05-19 reviewed

Dataset lets AI models generate native 100MP images
PixVerve: Advancing Native UHR Image Generation to 100MP with a Large-Scale High-Quality Dataset

Haojun Chen +13
cs.CV 2026-05-19 reviewed

Natural-language concepts replace tokens for multi-target segmentation
SetCon: Towards Open-Ended Referring Segmentation via Set-Level Concept Prediction

Zhixiong Zhang +8
cs.CV 2026-05-19 reviewed

One model handles any-to-any translation across five remote sensing modalities
MetaEarth-MM: Unified Multimodal Remote Sensing Image Generation with Scene-centered Joint Modeling

Zhiping Yu +6
cs.CV 2026-05-19 reviewed

First-frame spatial prompts raise cross-scene trajectory accuracy
Spatially Prompted Visual Trajectory Prediction for Egocentric Manipulation

Yifan Li +3
cs.CV 2026-05-19 reviewed

VLM-guided DPO lifts driving model human alignment by 12%
VL-DPO: Vision-Language-Guided Finetuning for Preference-Aligned Autonomous Driving

Zhefan Xu +5
cs.CV 2026-05-19 reviewed

Adaptive Manifold Guidance conserves probability during strong guidance
Probability-Conserving Flow Guidance

Parsa Esmati +4
cs.CV 2026-05-19 reviewed

Pixel classification hits 95.48% accuracy on angiogram vessels
X-Ray cardiac angiographic vessel segmentation based on pixel classification using machine learning and region growing

E O Rodrigues +8
cs.CV 2026-05-19 reviewed

Small tables bind new visual concepts to word triggers
Tiny-Engram: Trigger-Indexed Concept Tables for Generative Vision

Runyuan Cai +3
cs.CV 2026-05-19 reviewed

Pix2pix network segments heart fat on CT scans with 99% accuracy
Cardiac fat segmentation using computed tomography and an image-to-image conditional generative adversarial neural network

Guilherme Santos da Silva +3
cs.CV 2026-05-19 reviewed

SDM improves adversarial attack performance and efficiency by reconstructing the…
SDM: A Powerful Tool for Evaluating Model Robustness

Xinlei Liu +5
cs.CV 2026-05-19 reviewed

Second opacity per Gaussian cleans up object masks
OP2GS: Object-Aware 3D Gaussian Splatting with Dual-Opacity Primitives

Guiyu Liu +4
cs.CV 2026-05-19 reviewed

Pruning 90% non-text tokens cuts omni-LLM cost by 9x
Stage-adaptive Token Selection for Efficient Omni-modal LLMs

Zijie Xin +6
cs.CV 2026-05-19 reviewed

Nash equilibrium scores filter unstable multimodal reasoning steps
A Nash Equilibrium Framework For Training-Free Multimodal Step Verification

Rohit Sinha +5
eess.IV 2026-05-19 reviewed

Frequency priors guide short-video quality scores
FGSVQA: Frequency-Guided Short-form Video Quality Assessment

Xinyi Wang +3
eess.IV 2026-05-19 reviewed

CryoNet maps debris-covered glaciers at 90 percent IoU
CryoNet: A Deep Learning Framework for Multi-Modal Debris-Covered Glacier Mapping. A Case Study of the Poiqu Basin, Central Himalaya

Farzaneh Barzegar +3
cs.CV 2026-05-19 reviewed

Anime-trained VLM turns sparse sketches into aligned video outputs
CogOmniControl: Reasoning-Driven Controllable Video Generation via Creative Intent Cognition

Hongji Yang +6
cs.RO 2026-05-19 reviewed

Four photodiodes replace cameras for robot odometry
Minimalist Visual Inertial Odometry

Francesco Pasti +3
cs.RO 2026-05-19 reviewed

Visual encoder spatial detail fix unlocks precise robot tasks
Beyond Binary Success: A Diagnostic Meta-Evaluation Framework for Fine-Grained Manipulation

He-Yang Xu +7
cs.CV 2026-05-19 reviewed

Illumination priors guide selective recovery in dark photos
InterLight: Leveraging Intrinsic Illumination Priors for Low-Light Image Enhancement

Ziqi Wang +5
cs.CV 2026-05-19 reviewed

Video transcript grounding reward lifts planning accuracy by 7-16 points
RECIPE: Procedural Planning via Grounding in Instructional Video

Luigi Seminara +2
cs.CV 2026-05-19 reviewed

Fusing lifted panoramas yields long-range navigable 3D worlds from text
SphericalDreamer: Generating Navigable Immersive 3D Worlds with Panorama Fusion

Antoine Schnepf +3
cs.CV 2026-05-19 reviewed

World-ego split lifts long-horizon hybrid robot modeling
World-Ego Modeling for Long-Horizon Evolution in Hybrid Embodied Tasks

Zuyao Lin +5
cs.CV 2026-05-19 reviewed

Refined gradient attention rollout identifies surviving semantic regions to guide…
Towards Fine-Grained Robustness: Attention-Guided Test-Time Prompt Tuning for Vision-Language Models

Jia-Wei Hai +2
cs.CV 2026-05-19 reviewed

VLMs and agents miss over half the score on wild road damage
WildRoadBench: A Wild Aerial Road-Damage Grounding Benchmark for Vision-Language Models and Autonomous Agents

Bingnan Liu +9
cs.CV 2026-05-19 reviewed

Future emotion prediction raises multimodal recognition accuracy
AffectVerse: Emotional World Models for Multimodal Affective Computing

Bo Zhao +5
cs.CV 2026-05-19 reviewed

One pass turns sparse aerial photos into full 3D city models
Feed-Forward Gaussian Splatting from Sparse Aerial Views

Dongli Wu +6
cs.CV 2026-05-19 reviewed

Model fuses lidar and plot data for lower-bias forest biomass maps
StruMPL: Multi-task Dense Regression under Disjoint Partial Supervision and MNAR Labels

Reza M. Asiyabi +4
cs.CV 2026-05-19 reviewed

SplitQ keeps 93.5% accuracy at 3-bit VLM quantization
Breaking Modality Heterogeneity in Low-Bit Quantization for Large Vision-Language Models

Yi Zhong +4
cs.CV 2026-05-19 reviewed

Diversity in memory buffers improves TTA under tight constraints
GoTTA be Diverse: Rethinking Memory Policies for Test-Time Adaptation

Shyma Alhuwaider +4
cs.GR 2026-05-19 reviewed

3D Gaussians replace grids for continuous color mapping
GLUT: 3D Gaussian Lookup Table for Continuous Color Transformation

Danna Xue +3
cs.CV 2026-05-19 reviewed

U-Net feature energy cuts Janus rate in text-to-3D
Structural Energy Guidance for View-Consistent Text-to-3D Generation

Qing Zhang +4
cs.CV 2026-05-19 reviewed

Persona prompts lift construction safety checks by 12 percent
Passive Construction Site Safety Monitoring via Persona-Scaffolded Adversarial Chain-of-Thought VLM Verification

Ananth Sriram +2
cs.CV 2026-05-19 reviewed

New decoder head raises wound segmentation Dice to 81.9%
WoundFormer: Multi-Scale Spatial Feature Fusion for Multi-Class Wound Tissue Segmentation

Muhammad Ashad Kabir +1
cs.CV 2026-05-19 reviewed

Layout priors raise markdown F1 from 0.37 to 0.92 on OOD docs
Structured Layout Priors for Robust Out-of-Distribution Visual Document Understanding

Peter El Hachem +4
cs.CV 2026-05-19 reviewed

Score-based guidance fixes viewpoint estimation in diffusion models
Landscape-Awareness for Geometric View Diffusion Model

Yan-Ting Chen +3
cs.CV 2026-05-19 reviewed

VLMs lag on gaze following and social attention benchmarks
Eyes on VLM: Benchmarking Gaze Following and Social Gaze Prediction in Vision Language Models

Hengfei Wang +3
cs.CV 2026-05-19 reviewed

Zero-shot image models fall short on concept faithfulness for XAI
A Framework for Evaluating Zero-Shot Image Generation in Concept-based Explainability

Giacomo Astolfi +4
cs.CV 2026-05-19 reviewed

Dense benchmark exposes open VLMs' gaps on subtle human actions
FineBench: Benchmarking and Enhancing Vision-Language Models for Fine-grained Human Activity Understanding

Gueter Josmy Faure +4