archive

Every paper Pith has read.

9568 papers in cs.CV · page 7

cs.CV 2026-05-20 reviewed

Meta-actions set new SOTA on Waymo driving challenge
DriveMA: Rethinking Language Interfaces in Driving VLAs with One-Step Meta-Actions

Weicheng Zheng +4
cs.CV 2026-05-20 reviewed

105M open image-text pairs train competitive text-to-image model
MONET: A Massive, Open, Non-redundant and Enriched Text-to-image dataset

Benjamin Aubin +6
cs.CV 2026-05-20 reviewed

CNNs suit small land-use data
Vision Transformers and Convolutional Neural Networks for Land Use Scene Classification

Arun D. Kulkarni
cs.CV 2026-05-20 reviewed

Transition vector refines LLM captions for zero-shot image retrieval
STiTch: Semantic Transition and Transportation in Collaboration for Training-Free Zero-Shot Composed Image Retrieval

Miaoge Li +5
eess.IV 2026-05-20 reviewed

Local tolerance rule reconnects gaps in Frangi vessel maps
Local-sensitive connectivity filter (ls-cf): A post-processing unsupervised improvement of the frangi, hessian and vesselness filters for multimodal vessel segmentation

Erick O Rodrigues +7
cs.CV 2026-05-20 reviewed

Dataset trains AI to locate and reduce SR artifacts
SR-Ground: Image Quality Grounding for Super-Resolved Content

Artem Borisov +3
cs.CV 2026-05-20 reviewed

Region-aware VAE completes full heart motion cycle from single frame
RePCM: Region-Specific and Phenotype-Adaptive Bi-Ventricular Cardiac Motion Synthesis

Xuan Yang +5
cs.CV 2026-05-20 reviewed

Peak calibration lifts AI image detector accuracy 12% on new test
PGC: Peak-Guided Calibration for Generalizable AI-Generated Image Detection

Xiaoyu Zhou +5
cs.CV 2026-05-20 reviewed

Co-evolving decoder with policy fixes quality drop in discrete T2I
RankE: End-to-End Post-Training for Discrete Text-to-Image Generation with Decoder Co-Evolution

Siyong Jian +7
cs.CV 2026-05-20 reviewed

NaviEdit separates edit steps from noise scale for better results
Semantic Granularity Navigation in Image Editing

Liangsi Lu +3
cs.CV 2026-05-20 reviewed

SAM3 turns rough maps into sharp bacteria explanations
SAM-Sode: Towards Faithful Explanations for Tiny Bacteria Detection

Wanying Tan +9
cs.CL 2026-05-20 reviewed

Manga109 revised to correct 29,000 dialogue annotations
Manga109-v2026: Revisiting Manga109 Annotations for Modern Manga Understanding

Jeonghun Baek +4
cs.CV 2026-05-20 reviewed

Fully ternary ViT reaches 82.43% accuracy at 6 MB
FTerViT: Fully Ternary Vision Transformer

Szymon Ruciński +5
cs.CV 2026-05-20 reviewed

Weierstrass function supplies 2D patch encodings for vision transformers
Weierstrass Positional Encoding for Vision Transformers

Zhihang Xin +3
cs.CV 2026-05-20 reviewed

YOLOv11 detects military targets in synthetic thermal and night drone images
Comparative Analysis of Military Detection Using Drone Imagery Across Multiple Visual Spectrums

Sourov Roy Shuvo +5
cs.CV 2026-05-20 reviewed

Cognitive-physical RL adds foresight to safer driving policies
Distill to Think, Foresee to Act: Cognitive-Physical Reinforcement Learning for Autonomous Driving

Yang Wu +5
cs.CV 2026-05-20 reviewed

CoPhy RL framework reaches SOTA on NAVSIM with BEV foresight
Distill to Think, Foresee to Act: Cognitive-Physical Reinforcement Learning for Autonomous Driving

Yang Wu +5
cs.CV 2026-05-20 reviewed

Streaming model narrates surgery in real time at three workflow levels
SurgOnAir: Hierarchy-Aware Real-Time Surgical Video Commentary

Jingyi He +5
cs.CV 2026-05-20 reviewed

One transformer switches between real-time and full 3D reconstruction
UniT: Unified Geometry Learning with Group Autoregressive Transformer

Haotian Wang +6
cs.CV 2026-05-20 reviewed

Pairwise comparisons improve video quality assessment generalization
VersusQ: Pairwise Margin Reasoning for Generalizable Video Quality Assessment

Shibei Meng +6
cs.CV 2026-05-20 reviewed

Linear utility improves DPO for diffusion and flow image models
Linear-DPO: Linear Direct Preference Optimization for Diffusion and Flow-Matching Generative Models

Kesong Li +5
cs.CV 2026-05-20 reviewed

Router upgrades single-view 3D models to handle any number of views
ROAR-3D: Routing Arbitrary Views for High-Fidelity 3D Generation

Hanxiao Sun +7
cs.CV 2026-05-20 reviewed

Radar tweaks alone match complex camera fusion for 3D detection
RCGDet3D: Rethinking 4D Radar-Camera Fusion-based 3D Object Detection with Enhanced Radar Feature Encoding

Weiyi Xiong +1
cs.CV 2026-05-20 reviewed

Method cuts error in labor-progress angle from ultrasound
R2AoP: Reliable and Robust Angle of Progression Estimation from Intrapartum Ultrasound

Yuanhan Wang +9
cs.CV 2026-05-20 reviewed

3.2M synthetic pairs advance open scene text editing
TextSculptor: Training and Benchmarking Scene Text Editing

Yiheng Lin +14
cs.CV 2026-05-20 reviewed

New model clears banding from phone screen videos
VDFP: Video Deflickering with Flicker-banding Priors

Zhiyi Zhou +4
cs.CV 2026-05-20 reviewed

New transformer fuses hyperspectral imagery with other EO sensors
SpectralEarth-FM: Bringing Hyperspectral Imagery into Multimodal Earth Observation Pretraining

Nassim Ait Ali Braham +5
cs.CV 2026-05-20 reviewed

Quantization method enables efficient autoregressive video diffusion (ARVD)
Q-ARVD: Quantizing Autoregressive Video Diffusion Models

Siao Tang +4
cs.CV 2026-05-20 reviewed

0.5B driving model matches 7B models by adding future visual states
Grounding Driving VLA via Inverse Kinematics

Junsung Park +1
cs.CV 2026-05-20 reviewed

Pairwise data trains multimodal LLMs without full joint alignments
Multimodal LLMs under Pairwise Modalities

Yan Li +5
cs.CV 2026-05-20 reviewed

Dynamic allocation speeds video diffusion 7x near-losslessly
Dynamic Video Generation: Shaping Video Generation Across Time and Space

Shikang Zheng +7
cs.CV 2026-05-20 reviewed

Orthogonal projection fixes spatial-temporal ambiguity in 4D driving scenes
Towards Physically Consistent 4D Scene Reconstruction for Closed-loop Autonomous Driving Simulation

Bowyn Tan +7
cs.CV 2026-05-20 reviewed

Dynamic sinks raise dynamic degree in long video generation
DySink: Dynamic Frame Sinks for Autoregressive Long Video Generation

Bo Ye +4
cs.CV 2026-05-20 reviewed

LiteViLNet reaches 96.36% MaxF with 14M parameters at 164 FPS
LiteViLNet: Lightweight Vision-LiDAR Fusion Network for Efficient Road Segmentation

Daojie Peng +4
cs.CR 2026-05-20 reviewed

Framework turns AI detection metrics into legal evidence thresholds
Verifiable Provenance and Watermarking for Generative AI: An Evidentiary Framework for International Operational Law and Domestic Courts

Gustav Olaf Yunus Laitinen-Fredriksson Lundström-Imanov +1
cs.CV 2026-05-20 reviewed

Body-anchored Gaussians let users reorder clothing layers on 3D avatars
DAMA: Disentangled Body-Anchored Gaussians for Controllable Multi-Layered Avatars

Daniel Eskandar +3
cs.CV 2026-05-20 reviewed

Landsat addition cuts TanDEM-X forest height RMSE by 13.5%
Hybrid Machine Learning Model for Forest Height Estimation from TanDEM-X and Landsat Data

Islam Mansour +3
cs.CV 2026-05-20 reviewed

Contact coupling improves 4D hand-object reconstruction from video
CHOIR: Contact-aware 4D Hand-Object Interaction Reconstruction

Hao Xu +4
cs.CV 2026-05-20 reviewed

Contact signals align hands and objects in monocular 4D videos
CHOIR: Contact-aware 4D Hand-Object Interaction Reconstruction

Hao Xu +4
cs.CV 2026-05-20 reviewed

3D scans integrate rock bolts with fractures for mine assessment
Towards Integrated Rock Support Visualisation in 3D Point Cloud of Underground Mines

Dibyayan Patra +4
cs.CV 2026-05-20 reviewed

VGG16 detects fake images at 91% accuracy
Comparative Evaluation of Deep Learning Models for Fake Image Detection

Akhitha Pakala +3
cs.CV 2026-05-20 reviewed

Layer attention gaps reveal fix for LVLM hallucinations
Finding the Correct Visual Evidence Without Forgetting: Mitigating Hallucination in LVLMs via Inter-Layer Visual Attention Discrepancy

Yutong Xie +5
cs.CV 2026-05-20 reviewed

Multispectral signatures raise small-UAV detection by 6.2 percent
Towards UAV Detection in the Real World: A New Multispectral Dataset UAVNet-MS and a New Method

Yihang Luo +15
cs.CV 2026-05-20 reviewed

Role split improves faithful 4D video editing
Preserve, Reveal, Expand: Faithful 4D Video Editing with Region-Aware Conditioning

Zhangchi Hu +7
cs.CV 2026-05-20 reviewed

Hand drawings add spatial precision to text-based 3D motion generation
DrawMotion: Generating 3D Human Motions by Freehand Drawing

Tao Wang +9
cs.CV 2026-05-20 reviewed

Focus-then-context method trims VLM tokens to 22% with tiny accuracy cost
Focus-then-Context: Subject-Centric Progressive Visual Token Reduction for Vision-Language Models

Yulin Zhao +4
cs.CV 2026-05-20 reviewed

Tiny models master road reasoning from 20-80 graph scenes
Bridging Structure and Language: Graph-Based Visual Reasoning for Autonomous Road Understanding

Lena Wild +2
cs.CV 2026-05-20 reviewed

AI continues paintings by predicting next strokes from canvas history
PaintCopilot: Modeling Painting as Autonomous Artistic Continuation

Yunge Wen +2