archive

Every paper Pith has read.

9568 papers in cs.CV · page 8

cs.CV 2026-05-20 reviewed

3D distillation speeds wheat spike volume estimation by 100x
3D Reconstruction and Knowledge Distillation to Improve Multi-View Image Models to Explore Spike Volume Estimation in Wheat

Olivia Zumsteg +6
cs.LG 2026-05-20 reviewed

Oscillatory network scales to ImageNet with high efficiency
Winfree Oscillatory Neural Network

Jiawen Dai +1
cs.CV 2026-05-20 reviewed

RISE makes self-evolving VLMs gain steadily without new labels
RISE: Reliable Improvement in Self-Evolving Vision-Language Models

Chaoran Xu +5
cs.CV 2026-05-20 reviewed

Tweedie matching across overlaps extends short video models to long sequences
FlowLong: Inference-time Long Video Generation via Manifold-constrained Tweedie Matching

Jangho Park +3
cs.CV 2026-05-20 reviewed

Hybrid model routes inputs to a concept or neural branch for accuracy gains
SynCB: A Synergy Concept-Based Model with Dynamic Routing Between Concepts and Complementary Neural Branches

Tores Julie +5
cs.CV 2026-05-20 reviewed

Frozen video model plus probe wins kitchen action challenge
JFAA: Technical Report for the EPIC-KITCHENS-100 Action Anticipation Challenge at EgoVis 2026

Qiaohui Chu +6
cs.CV 2026-05-20 reviewed

VISTA wins Ego4D STA challenge by fusing frozen video features into detector
VISTA: Technical Report for the Ego4D Short-Term Object Interaction Anticipation at EgoVis 2026

Qiaohui Chu +6
cs.CV 2026-05-20 reviewed

MLLM arbitration with ensemble reaches 70.49% on 306 fruits
FruitEnsemble: MLLM-Guided Arbitration for Heterogeneous ensemble in Fine-Grained Fruit Recognition

Enhui Yu +4
cs.CV 2026-05-20 reviewed

Two-level experts reduce redundancy in multimodal cancer survival models
HDMoE: A Hierarchical Decoupling-Fusion Mixture-of-Experts Framework for Multimodal Cancer Survival Prediction

Huayi Wang +7
cs.CV 2026-05-20 reviewed

Map anchors egocentric pose to eliminate drift
Map-Mono-Ego: Map-Grounded Global Human Pose Estimation from Monocular Egocentric Video

Hiroyuki Deguchi +5
cs.MA 2026-05-20 reviewed

Self-elicited reasoning and critic revision improve sarcasm detection
ProCrit: Self-Elicited Multi-Perspective Reasoning with Critic-Guided Revision for Multimodal Sarcasm Detection

Yingjia Xu +5
cs.CV 2026-05-20 reviewed

Polynomial alternatives match activation-based vision models
Activation-Free Backbones for Image Recognition: Polynomial Alternatives within MetaFormer-Style Vision Models

Jeffrey Wang +2
cs.CV 2026-05-20 reviewed

224K short videos collected by labels support semantic benchmarks
USV: Towards Understanding the User-generated Short-form Videos

Haoyue Cheng +5
cs.CV 2026-05-20 reviewed

New benchmark shows VLMs lag trained humans on building layouts
ArchSIBench: Benchmarking the Architectural Spatial Intelligence of Vision-Language Models

Qirui Shen +7
cs.CV 2026-05-20 reviewed

Two-stage model turns panoramic X-rays into accurate 3D dental volumes
HyDAR-Pano3D: A Hybrid Disentangled Anatomical Recovery Framework for Panoramic-to-3D Reconstruction

Yaoyao Yue +4
cs.CV 2026-05-20 reviewed

Witness cues turn missing 3D relations into usable training signals
RelWitness: Open-Vocabulary 3D Scene Graph Generation with Visual-Geometric Relation Witnesses

Minh Anh Nguyen +4
cs.CV 2026-05-20 reviewed

TERDNet beats prior models at spotting scene changes
TERDNet: Transformer Encoder-Recurrent Decoder Network for Scene Change Detection

Jiae Yoon +1
cs.CV 2026-05-20 reviewed

Patch alignment spots changes in free-motion videos
VSCD: Video-based Scene Change Detection in Unaligned Scenes

Jiae Yoon +1
cs.CV 2026-05-20 reviewed

Single network pass reconstructs images with 2D Gaussians in 160-300 ms
AIR: Amortized Image Reconstruction Framework for Self-Supervised Feed-Forward 2D Gaussian Splatting

Zhaojie Zeng +3
cs.CV 2026-05-20 reviewed

Reranking OSGNet candidates with MLLM wins Ego4D challenge
OSGNet with MLLM Reranking @ Ego4D Episodic Memory Challenge 2026

Yisen Feng +7
cs.CV 2026-05-20 reviewed

Self-similarity alignment fixes high-res diffusion conflicts
Spatial Gram Alignment for Ultra-High-Resolution Image Synthesis

Jinjin Zhang +2
cs.CV 2026-05-20 reviewed

Canny map first keeps logos and text intact in subject edits
Decomposing Subject-Driven Image Generation via Intermediate Structural Prediction

Hanzhong Guo +1
cs.CV 2026-05-20 reviewed

OlmoEarth models cut training GPU hours by 1.7x
OlmoEarth v1.1: A more efficient family of OlmoEarth models

Gabriel Tseng +9
cs.CV 2026-05-20 reviewed

Connector degrades structural semantics in video editing
What Semantics Survive the Connector? Diagnosing VLM-to-DiT Alignment in Video Editing

Hangyu Lin +6
cs.CV 2026-05-20 reviewed

AI detectors flag fakes well but cannot identify the source model
Findings of the Counter Turing Test: AI-Generated Image Detection

Rajarshi Roy +18
cs.LG 2026-05-20 reviewed

Intermediate alignment cuts physics residuals by 66% in diffusion models
Learning to Think in Physics: Breaking Shortcut Learning in Scientific Diffusion via Representation Alignment

Haozhe Jia +8
cs.CV 2026-05-20 reviewed

Attention alignment yields accurate attributes in visual stories
AttriStory: Fine-grained Attribute Realization for Visual Storytelling with Diffusion Models

Manogna Sreenivas +2
cs.CV 2026-05-20 reviewed

Visual token masking flags hallucinations in medical VQA answers
VIHD: Visual Intervention-based Hallucination Detection for Medical Visual Question Answering

Jiayi Chen +5
cs.CV 2026-05-20 reviewed

Diffusion from points creates masks for infrared target detection
Diffuse to Detect: Bi-Level Sample Rebalancing with Pseudo-Label Diffusion for Point-Supervised Infrared Small-Target Detection

Zhu Liu +4
cs.CV 2026-05-20 reviewed

Lightweight U-Net segments spines in CT scans on basic hardware
SpineContextResUNet: A Computationally Efficient Residual UNet for Spine CT Segmentation

K S Nithurshen +1
cs.AI 2026-05-20 reviewed

New guidance resolves gradient conflicts in flow models
Conflict-Aware Additive Guidance for Flow Models under Compositional Rewards

Xuehui Yu +4
cs.CV 2026-05-20 reviewed

Constraint engine turns AI drawings into verifiable geometry reasoning
Draw2Think: Harnessing Geometry Reasoning through Constraint Engine Interaction

Juncheng Hu +3
cs.CV 2026-05-20 reviewed

Scale-decoupled alignment improves remote sensing incremental detection
STAR-IOD: Scale-decoupled Topology Alignment with Pseudo-label Refinement for Remote Sensing Incremental Object Detection

Yaoteng Zhang +3
cs.CV 2026-05-20 reviewed

Language priors fix long-tail bias in 3D point cloud clustering
Resolving Long-Tail Ambiguity in Unsupervised 3D Point Cloud Segmentation with Language Priors

Siqi Wei +6
cs.CV 2026-05-20 reviewed

Open-source iris algorithms pass first official IREX evaluation
Lowering the Barrier to IREX Participation: Open-Source Algorithms, Toolkit, and Benchmarking for Iris Recognition

Siamul Karim Khan +2
cs.CV 2026-05-20 reviewed

Method generates editable 3D surfaces from hand sketches
Sketch2MinSurf: Vision-Language Guided Generation of Editable Minimal Surfaces from Hand-Drawn Sketches

Wenda Wang +6
cs.CV 2026-05-20 reviewed

Attention reweighting suppresses spurious features before CNN pooling
Deep Attention Reweighting: Post-Hoc Attention-Based Feature Aggregation in CNNs for Disentangling Core and Spurious Features under Spurious Correlations

Kin Whye Chew +1
cs.CV 2026-05-20 reviewed

Designer ratings dataset lifts AI graphic scorer to 0.611 agreement
TASTE: A Designer-Annotated Multi-Dimensional Preference Dataset for AI-Generated Graphic Design

Haonan Zhu +4
cs.CV 2026-05-20 reviewed

Early high-frequency injection reduces OOD score overlap
Early High-Frequency Injection for Geometry-Sensitive OOD Detection

Chuanjie Cheng +5
cs.CV 2026-05-20 reviewed

Virtual outliers reshape geometry to handle noisy labels
GAMR: Geometric-Aware Manifold Regularization with Virtual Outlier Synthesis for Learning with Noisy Labels

Ningkang Peng +6
cs.CV 2026-05-20 reviewed

Decoupling reliabilities lifts noisy-label accuracy
Holistic Reliability Propagation: Decoupling Annotation and Prediction for Robust Noisy-Label

Jingyang Mao +2
cs.NE 2026-05-20 reviewed

ReRAM macro reaches 419 TOPS/W for edge neural inference
E-ReCON: An Energy- and Resource-Efficient Precision-Configurable Sparse nvCIM Macro for Conventional and Spiking Neural Edge Inference

Ankit Kumar Tenwar +2
cs.CV 2026-05-20 reviewed

SAVER selectively activates vision to boost F1 and cut latency in multimodal IE
SAVER: Selective As-Needed Vision Evidence for Multimodal Information Extraction

Miaobo Hu +7
cs.CV 2026-05-20 reviewed

DAR cuts DiT training iterations by 8.75x while improving FID by 2.11
Rethinking Cross-Layer Information Routing in Diffusion Transformers

Chao Xu +11
cs.CV 2026-05-20 reviewed

Agent framework hits top zero-shot scores for industrial defect detection
IndusAgent: Reinforcing Open-Vocabulary Industrial Anomaly Detection with Agentic Tools

Rongbin Tan +12
cs.CV 2026-05-20 reviewed

IMU-warped event frames lift action recognition in dark and shaky scenes
DarkShake-DVS: Event-based Human Action Recognition under Low-light and Shaking Camera Conditions

Jiaqi Chen +2
cs.CV 2026-05-20 reviewed

VISTAQA benchmark shows models answer but rarely ground correctly
VISTAQA: Benchmarking Joint Visual Question Answering and Pixel-Level Evidence

Mozhgan Nasr Azadani +7
cs.CV 2026-05-20 reviewed

GSA-YOLO hits 189 FPS while cutting compute for X-ray scans
GSA-YOLO: A High-Efficiency Framework via Structured Sparsity and Adaptive Knowledge Distillation for Real-Time X-ray Security Inspection

Jiahao Kong