archive

Every paper Pith has read. Search by title, abstract, or pith.

9568 papers in cs.CV · page 15

cs.IR 2026-05-18 reviewed

Prompting methods raise table QA accuracy without training
Efficient Table QA via TableGrid Navigation and Progressive Inference Prompting

Amritansh Maurya +3
cs.CV 2026-05-18 reviewed

Occupancy-opacity split in Gaussians renders both reflection and transmission
RT-Splatting: Joint Reflection-Transmission Modeling with Gaussian Splatting

Ji Shi +4
cs.CV 2026-05-18 reviewed

Shared codebook bridges modalities without full data pairs
CodeBind: Decoupled Representation Learning for Multimodal Alignment with Unified Compositional Codebook

Zeyu Chen +2
cs.CV 2026-05-18 reviewed

GaussianZoom generates detailed 3D zooms from low-res inputs
GaussianZoom: Progressive Zoom-in Generative 3D Gaussian Splatting with Geometric and Semantic Guidance

Jiale Shi +5
cs.CV 2026-05-18 reviewed

Gaps in real face embeddings hold 10M virtual identities
Non-Colliding Biometric Identities for Digital Entities: Geometry, Capacity, and Million-Scale Virtual Identity Provisioning

Yuyang Ji +4
cs.CV 2026-05-18 reviewed

Noise alignment lets video models output endless coherent clips
Enhancing Train-Free Infinite-Frame Generation for Consistent Long Videos

X. Feng +8
cs.SD 2026-05-18 reviewed

Speech audio accelerates MRI reconstruction of vocal tracts
SIREM: Speech-Informed MRI Reconstruction with Learned Sampling

Md Hasan +8
cs.CV 2026-05-18 reviewed

Synthetic egocentric videos raise real interaction model accuracy
EgoInteract: Synthetic Egocentric Videos Generation for Interaction Understanding and Anticipation

Rosario Leonardi +6
cs.CV 2026-05-18 reviewed

Question routing lifts zero-shot spatial video QA by up to 5%
SPATIOROUTE: Dynamic Prompt Routing for Zero-Shot Spatial Reasoning

Pawat Chunhachatrachai +3
cs.RO 2026-05-18 reviewed

RGB cameras alone build 3D scene graphs for robots as well as depth sensors do
RGB-only Active 3D Scene Graph Generation for Indoor Mobile Robots

Giorgia Modi +3
cs.AI 2026-05-18 reviewed

Sensory-bounded reasoning lifts MLLM accuracy on second-order belief tasks
Beyond the Cartesian Illusion: Testing Two-Stage Multi-Modal Theory of Mind under Perceptual Bottlenecks

Yajing Zhou +1
cs.CV 2026-05-18 reviewed

Best Segmentation Buddies match image pixels to 3D shape parts
Best Segmentation Buddies for Image-Shape Correspondence

Itai Lang +3
cs.CV 2026-05-18 reviewed

View-aware experts lift aerial-ground re-ID mAP by 10 points
View-Aware Semantic Alignment for Aerial-Ground Person Re-Identification

Quan Zhang +6
cs.LG 2026-05-18 reviewed

Heavy-light split cuts diffusion sampling cost by 2-4x
Dual-Rate Diffusion: Accelerating diffusion models with an interleaved heavy-light network

Grigory Bartosh +5
cs.RO 2026-05-18 reviewed

External cameras boost robot scene recall by up to 79%
Fixed External Cameras as Common Prior Maps for Active 3D Scene Graph Generation

Giorgia Modi +3
cs.CV 2026-05-18 reviewed

TokenMask predicts masks directly from token affinities
Token-Space Mask Prediction for Efficient Vision Transformer Segmentation

Calvin Galagain +2
cs.CV 2026-05-18 reviewed

Agentic selector ranks second on four-day multimodal challenge
MARS: Technical Report for the CASTLE Challenge at EgoVis 2026

Haoyu Zhang +6
cs.CV 2026-05-18 reviewed

Soft attention masks enable text spotting without rectification
Do You Need Text Rectification? Soft Attention Mask Embedding for Rectification-Free Scene Text Spotting

Antonio Colombo +1
cs.CV 2026-05-18 reviewed

Consistency reward lifts VLM spatial reasoning
Self-Evolving Spatial Reasoning in Vision Language Models via Geometric Logic Consistency

Junming Liu +6
cs.CV 2026-05-18 reviewed

New module keeps multimodal models faithful to images during long answers
Vision Inference Former: Sustaining Visual Consistency in Multimodal Large Language Models

Xinpeng Dong +5
cs.CV 2026-05-18 reviewed

Semi-supervised method removes nighttime flares from unlabeled images
Semi-LAR: Semi-supervised Contrastive Learning with Linear Attention for Removal of Nighttime Flares

Xiyu Zhu +3
cs.CV 2026-05-18 reviewed

Joint model fuses reconstruction and causal video generation for driving
Xiaomi Auto World Model: A Joint World Model Integrating Reconstruction and Generation for Autonomous Driving

Lijun Zhou +36
cs.CV 2026-05-18 reviewed

3D generators leave fingerprints that identify their source
Who Generated This 3D Asset? Learning Source Attribution for Generative 3D Models

Sihan Ma +2
cs.CV 2026-05-18 reviewed

Semantic guidance localizes lesions before segmentation and diagnosis
Rad-VLSM: A Cross-Modal Framework with Semantics-Assisted Prompting for Medical Segmentation and Diagnosis

Fengyi Zhang +4
cs.CV 2026-05-18 reviewed

Hybrid tokenizer splits semantics from pixels for better results
WinTok: A Win-Win Hybrid Tokenizer via Decomposing Visual Understanding and Generation with Transferable Tokens

Yiwei Guo +6
cs.CL 2026-05-18 reviewed

Bangla medical questions trip up top AI models
How Good LLMs Are at Answering Bangla Medical Visual Questions? Dataset and Benchmarking

Rafid Ahmed +5
cs.AI 2026-05-18 reviewed

Grounding cuts tokens 18x while matching big models on home tasks
TaskGround: Structured Executable Task Inference for Full-Scene Household Reasoning

Zhiyuan Feng +13
cs.CV 2026-05-18 reviewed

Residual fusion refines hands while stabilizing the body in video meshes
DanceHMR: Hand-Aware Whole-Body Human Mesh Recovery from Monocular Videos

Wenhao Shen +5
cs.CV 2026-05-18 reviewed

Diffusion model generates aligned urban energy maps from roads
SENSE: Satellite-based ENergy Synthesis for Sustainable Environment

Kailai Sun +7
cs.CV 2026-05-18 reviewed

Synthetic data cuts mixed-object counting errors by 20%
The MixCount Dataset: Bridging the Data Gap for Open-Vocabulary Object Counting

Corentin Dumery +3
cs.CV 2026-05-18 reviewed

Lightweight ConvNet ensembles match heavy models on Arabic handwriting
Embedded ConvNet Ensembles: A Lightweight Approach to Recognize Arabic Handwritten Characters

Mohsine EL Khayati +2
cs.CV 2026-05-18 reviewed

Pixle attack fools Arabic handwriting AI with a 99-100% success rate
Threats to Arabic Handwriting Recognition: Investigating Black-Box Adversarial Attacks on embedded ConvNet models

Mohsine EL Khayati +3
eess.IV 2026-05-18 reviewed

Triplane features adapt to standard codecs for better volumetric compression
CATRF: Codec-Adaptive TriPlane Radiance Fields for Volumetric Content Delivery

Tung-I Chen +3
cs.CV 2026-05-18 reviewed

Instant3D creates 3D assets in seconds
Efficient 3D Content Reconstruction and Generation

Jiahao Li
cs.CV 2026-05-18 reviewed

Dynamic relevance scores pick a pruning regime to shrink omni-modal token counts
OmniSelect: Dynamic Modality-Aware Token Compression for Efficient Omni-modal Large Language Models

Morunliu Yang +8
cs.CV 2026-05-18 reviewed

Template-guided field fuses cues for fast 3D shape matching
SGSoft: Learning Fused Semantic-Geometric Features for 3D Shape Correspondence via Template-Guided Soft Signals

Soyeon Yoon +2
cs.CV 2026-05-18 reviewed

Lateral line patches raise salmon re-ID accuracy across cameras
Patch Ensembles for Robust Salmon Re-Identification with Weak Trajectory Labels

Espen Uri Høgstedt +3
cs.CV 2026-05-18 reviewed

Filtered data beats bigger models in grocery retrieval
What Matters for Grocery Product Retrieval with Open Source Vision Language Models

Emmanuel G. Maminta +1
cs.CV 2026-05-18 reviewed

Dual-stage activation fixes attribute binding in open-vocabulary detection
DSAA: Dual-Stage Attribute Activation for Fine-grained Open Vocabulary Detection

Donghong Jiang +6
cs.CV 2026-05-18 reviewed

Training fixes attention so text alone locates video objects
See What I Mean: Aligning Vision and Language Representations for Video Fine-grained Object Understanding

Boyuan Sun +4
cs.CV 2026-05-18 reviewed

TinySAM 2 cuts SAM 2 memory tokens to 7 percent at 90 percent accuracy
TinySAM 2: Extreme Memory Compression for Efficient Track Anything Model

Zhaoyuan Ding +3
cs.CV 2026-05-18 reviewed

Semantic scoring refines distilled image datasets
SAS: Semantic-aware Sampling for Generative Dataset Distillation

Mingzhuo Li +6
cs.CV 2026-05-18 reviewed

Graph completion turns non-functional 3D models functional
Functionalization via Structure Completion and Motion Rectification

Mingrui Zhao +10
eess.IV 2026-05-18 reviewed

Inter-frame learning cuts bits for LiDAR geometry
Inter-LPCM: Learning-based Inter-Frame Predictive Coding for LiDAR Point Cloud Compression

Chang Sun +5
cs.LG 2026-05-18 reviewed

Per-module scaling lifts low-bit quantization accuracy
MARR: Module-Adaptive Residual Reconstruction for Low-Bit Post-Training Quantization

Le Su +2
cs.CV 2026-05-18 reviewed

Optical encoder cuts gaze tracking latency to 3.4 ms
Low Latency Gaze Tracking via Latent Optical Sensing

Yidan Zheng +5
eess.IV 2026-05-18 reviewed

Event data fused with frames yields clear silhouettes at kilohertz rates
See Silhouettes in Motion with Neuromorphic Vision

Pei Zhang +4
cs.CV 2026-05-18 reviewed

Decoupled attention balances references in remote sensing super-resolution
Learning to Balance: Decoupled Siamese Diffusion Transformer for Reference-Based Remote Sensing Image Super-Resolution

Bin Luo +6