archive

Every paper Pith has read.

9568 papers in cs.CV · page 3

cs.CV 2026-05-21 reviewed

Benchmark shows MLLMs fail on 16-minute continuous video reasoning
VideoOdyssey: A Benchmark for Ultra-Long-Context and Omni-Modal Video Understanding

Haichen He +5
cs.CV 2026-05-21 reviewed

Projector fix lifts Video-LLM motion direction accuracy from 26% to 85%
Which Way Did It Move? Diagnosing and Overcoming Directional Motion Blindness in Video-LLMs

Jongseo Lee +5
cs.CV 2026-05-21 reviewed

Camera pose tokens lift video model spatial scores 4.5-6.5%
Cambrian-P: Pose-Grounded Video Understanding

Jihan Yang +7
cs.CV 2026-05-21 reviewed

Reasoning adds secondary motions for natural video
MotiMotion: Motion-Controlled Video Generation with Visual Reasoning

Lee Hsin-Ying +5
cs.RO 2026-05-21 reviewed

Self-awareness module improves language-guided navigation
AwareVLN: Reasoning with Self-awareness for Vision-Language Navigation

Wenxuan Guo +9
cs.RO 2026-05-21 reviewed

Gestures raise robot object selection accuracy in cluttered scenes
GesVLA: Gesture-Aware Vision-Language-Action Model Embedded Representations

Wenxuan Guo +9
cs.CV 2026-05-21 reviewed

Dashcam videos turned into full AV multi-sensor data
Sensor2Sensor: Cross-Embodiment Sensor Conversion for Autonomous Driving

Jiahao Wang +14
cs.CV 2026-05-21 reviewed

Metro suicide risk scored from video by tracking and heatmaps
Suicide Risk Assessment from AI-powered Video Surveillance: An Interpretable Framework for Prevention in Metro Stations

Safwen Naimi +3
cs.CV 2026-05-21 reviewed

VLMs keep high scores after most image tokens are deleted
Seeing without Looking: Do Vision-Language Benchmarks Really Test Vision?

Zixuan Lan +3
cs.CV 2026-05-21 reviewed

Queries raise PSNR by 3.6 dB and cut convergence time by 3x in frozen autoencoders
DecQ: Detail-Condensing Queries for Enhanced Reconstruction and Generation in Representation Autoencoders

Tianhang Wang +5
cs.CV 2026-05-21 reviewed

Synthetic faces alone match real data for rare pediatric disease AI
Synthetic Data Alone is Enough? Rethinking Data Scarcity in Pediatric Rare Disease Recognition

Ganlin Feng +6
cs.CV 2026-05-21 reviewed

Generated images show anomalous ultra-high-frequency spectral uplift
Spectral Tail Auxiliary Learning for AI-Generated Image Detection

Xingyi Li +4
cs.CV 2026-05-21 reviewed

Retrieval keeps video worlds consistent at double speed
WorldKV: Efficient World Memory with World Retrieval and Compression

Jung Yi +5
cs.CV 2026-05-21 reviewed

Simulated dense placements train IMU model that ignores sensor setup
AnyMo: Geometry-Aware Setup-Agnostic Modeling of Human Motion in the Wild

Baiyu Chen +7
cs.CV 2026-05-21 reviewed

Multiview cues and orientation prompts lift zero-shot action recognition
Cross-Domain Human Action Recognition from Multiview Motion and Textual Descriptions

Yannick Porto +3
cs.CV 2026-05-21 reviewed

Synthetic viewpoints plus state-space encoding boost action detection
Improving Viewpoint-Invariance and Temporal Consistency for Action Detection

Yannick Porto +3
cs.CV 2026-05-21 reviewed

Disentangling vision-language embeddings without added dimensions
Conceptualizing Embeddings: Sparse Disentanglement for Vision-Language Models

Piotr Kubaty +5
cs.CV 2026-05-21 reviewed

Taylor expansion picks surprising frames in long videos
Swift Sampling: Selecting Temporal Surprises via Taylor Series

Dahye Kim +5
cs.CV 2026-05-21 reviewed

One ConvNeXt model serves many compute budgets
Slimmable ConvNeXt: Width-Adaptive Inference for Efficient Multi-Device Deployment

Janek Haberer +2
cs.CV 2026-05-21 reviewed

Coherent behavior vectors let VLA models match top results with half the data
From Abstraction to Instantiation: Learning Behavioral Representation for Vision-Language-Action Model

Bing Hu +6
cs.CV 2026-05-21 reviewed

SEGA adapts attention scaling to latent frequencies for higher-res DiT outputs
SEGA: Spectral-Energy Guided Attention for Resolution Extrapolation in Diffusion Transformers

Javad Rajabi +4
cs.CV 2026-05-21 reviewed

Sparse autoencoder links reasoning steps to image masks
SegCompass: Exploring Interpretable Alignment with Sparse Autoencoders for Enhanced Reasoning Segmentation

Zhenyu Lu +6
cs.CL 2026-05-21 reviewed

Images boost LLM poetry detectors past RoBERTa
Seeing the Poem: Image-Semantic Detection of AI-Generated Modern Chinese Poetry with MLLMs

Shanshan Wang +8
cs.CV 2026-05-21 reviewed

Nonce substitutions rank captions for better VL data selection
What Does the Caption Really Say? Counterfactual Phrase Intervention for Compositional Data Selection in Vision-Language Pretraining

Hyejin Go +2
cs.CV 2026-05-21 reviewed

Causal model matches age changes in spine DXA images
From Baseline to Follow-Up: Counterfactual Spine DXA Image Synthesis in UK Biobank Using a Causal Hierarchical Variational Autoencoder

Yilin Zhang +3
cs.LG 2026-05-21 reviewed

CAME-Grad optimizer lifts radiology reports by 2 percent
The Double Dilemma in Multi-Task Radiology Report Generation: A Gradient Dynamics Analysis and Solution

Erjian Zhang +3
cs.LG 2026-05-21 reviewed

CAME-Grad fixes gradient double dilemma in report generation
The Double Dilemma in Multi-Task Radiology Report Generation: A Gradient Dynamics Analysis and Solution

Erjian Zhang +3
cs.CV 2026-05-21 reviewed

Five functional body clusters improve full-pose reconstruction from head and hands
AtomicMotion: Learning Human Motion From Different Human Parts

Runzhen Liu +2
cs.CV 2026-05-21 reviewed

Physics priors train dense human scene flow from monocular video
H-Flow: Self-supervised Human Scene Flow via Physics-inspired Joint Multi-modal Learning

Zhanbo Huang +2
cs.CV 2026-05-21 reviewed

Graph reasoning turns radiology reports into precise 3D lesion maps
GLeVE: Graph-Guided Lesion Grounding with Proposal Verification in 3D CT

Shuo Jiang +11
cs.CV 2026-05-21 reviewed

Head-conditioned LoRA lifts gaze following on non-salient targets
Enhancing Gaze Reasoning in Vision Foundation Models for Gaze Following

Shijing Wang +6
cs.RO 2026-05-21 reviewed

Dual-interval motion cues decouple ego-motion for UAV detection
Decoupling Ego-Motion from Target Dynamics via Dual-Interval Motion Cues for UAV Detection

Liuyang Wang +1
cs.CV 2026-05-21 reviewed

No single noisy-label method wins for frozen vision models
Rethinking Noise-Robust Training for Frozen Vision Foundation Models: A Cross-Dataset Benchmark with a Case Study of Small-Loss Failure

Zitong Li +1
cs.CV 2026-05-21 reviewed

3D reconstruction turns floorplan localization into alignment task
SceneAligner: 3D-Grounded Floorplan Localization in the Wild

Junhyeong Cho +2
cs.CV 2026-05-21 reviewed

New metric shows detection limits online map accuracy
Beyond Chamfer Distance: Granular Order-aware Evaluation Metric For Online Mapping

Chouaib Bencheikh Lehocine +3
cs.CV 2026-05-21 reviewed

Attention maps for tumor sub-regions come free in one lightweight 3D model
SegGuidedNet: Sub-Region-Aware Attention Supervision for Interpretable Brain Tumor Segmentation

Hasaan Maqsood +4
cs.CV 2026-05-21 reviewed

Generative models create controlled videos to test MLLM spatio-temporal reasoning
VGenST-Bench: A Benchmark for Spatio-Temporal Reasoning via Active Video Synthesis

Jinho Park +3
cs.CV 2026-05-21 reviewed

Fourier shape descriptors create time-consistent cell phantom videos
Cell Phantom Video Generation in Elliptical Fourier Descriptor Domain

Francesco Benedetto +3
cs.CV 2026-05-21 reviewed

Geometry must ground visual tokens before reasoning starts
GeoWeaver: Grounding Visual Tokens with Geometric Evidence before Scene Reasoning

Deshui Miao +5
cs.CV 2026-05-21 reviewed

Unified model handles many fashion search types at once
FashionLens: Toward Versatile Fashion Image Retrieval via Task-Adaptive Learning

Haokun Wen +5
cs.CV 2026-05-21 reviewed

Multimodal data improves two-wheeler rider behavior recognition
MOTOR: A Multimodal Dataset for Two-Wheeler Rider Behavior Understanding

Varun A. Paturkar +2
cs.CV 2026-05-21 reviewed

Similar cases form graphs that refine medical image diagnoses
Case-Aware Medical Image Classification with Multimodal Knowledge Graphs and Reliability-Guided Refinement

Yiming Xu +5
cs.CV 2026-05-21 reviewed

Motion and geometry cues boost SAM 2 tracking on nonlinear scenarios
Segment Anything with Motion, Geometry, and Semantic Adaptation for Complex Nonlinear Visual Object Tracking

Deyi Zhu +6
cs.CV 2026-05-21 reviewed

Degraded images break spatial reasoning in current AI
SpaceDG: Benchmarking Spatial Intelligence under Visual Degradation

Xiaolong Zhou +10
cs.AI 2026-05-21 reviewed

Latent sharing speeds up collaborative driving coordination
LACO: Adaptive Latent Communication for Collaborative Driving

Tianhao Chen +2
cs.CV 2026-05-21 reviewed

Training-free method segments fine-grained fungi without retraining
Training-Free Fine-Grained Semantic Segmentations in Low Data Regimes: A FungiTastic Baseline

Sebastian Cavada +2
cs.CV 2026-05-21 reviewed

Discarded classifier weights act as semantic anchors
Supervised Classification Heads as Semantic Prototypes: Unlocking Vision-Language Alignment via Weight Recycling

David Méndez +2
cs.CV 2026-05-21 reviewed

Multi-agent self-evolution sets SOTA on image retrieval benchmarks
DeliCIR: Deliberative Test-Time Evolutionary Hierarchical Multi-Agents for Composed Image Retrieval

Xingtian Pei +6
cs.CV 2026-05-21 reviewed

Masked metric improves agreement with humans on concept fidelity
MaSC: A Masked Similarity Metric for Evaluating Concept-Driven Generation

Patryk Bartkowiak +5
cs.CV 2026-05-21 reviewed

Fused geometry and appearance metric predicts synthetic data transfer
SADGE: Structure and Appearance Domain Gap Estimation of Synthetic and Real Data

Patryk Bartkowiak +4