archive

Every paper Pith has read.

9568 papers in cs.CV · page 12

cs.CV 2026-05-19 reviewed

Spatial weighting and dual loss create novel text-to-image objects
Self-Creative Text-to-Object Generation using Semantic-Aware Spatial Weighting

Yue Yu +4
cs.GR 2026-05-19 reviewed

Sparse anchor fields yield editable SVGs at full raster fidelity
AnchorFlow: Editable SVG Reconstruction via Sparse Anchor Point Fields

Mengnan Jiang +4
cs.CV 2026-05-19 reviewed

Evidential head gives reliable uncertainty for 3D pointmaps
Trust It or Not: Evidential Uncertainty for Feed-Forward 3D Reconstruction with Trust3R

Zihao Zhu +4
cs.CV 2026-05-19 reviewed

RL solver reaches 82.9% on CAPTCHA benchmark
CaptchaMind: Training CAPTCHA Solvers via Reinforcement Learning with Explicit Reasoning Supervision

Pengcheng Wang +7
cs.CV 2026-05-19 reviewed

Replace blocks with synthesized operators to cut training costs
Replacement Learning: Training Neural Networks with Fewer Parameters

Yuming Zhang +7
cs.CV 2026-05-19 reviewed

Early core token attention ranks best seeds for text-to-image results
Boosting Text-to-Image Diffusion Models via Core Token Attention-Based Seed Selection

Yunzhe Zhang +2
cs.CV 2026-05-19 reviewed

The paper describes a framework for 3D localization in multimodal large language models…
Towards Camera-Robust 3D Localization: Equation-Anchored Tool-Use for MLLMs

Xueying Jiang +6
cs.CV 2026-05-19 reviewed

Dual prompts help CLIP identify occluded people better
Dual-Prompt CLIP with Hybrid Visual Encoders for Occluded Person Re-Identification

Zhangjian Ji +3
cs.RO 2026-05-19 reviewed

Negative data cuts collisions in driving AI models
SafeAlign-VLA: A Negative-Enhanced Safe Alignment Framework for Risk-Aware Autonomous Driving

Kefei Tian +4
cs.CL 2026-05-19 reviewed

Merging LLMs into VLMs boosts instructions but not math
Investigating Cross-Modal Skill Injection: Scenarios, Methods, and Hyperparameters

Zhiyu Xu +7
cs.CV 2026-05-19 reviewed

Dual-branch model wins photo quality challenge via explicit differences
iDiff: Interpretable Difference-aware Framework for Pairwise Image Quality Assessment

Xinli Yue +5
cs.GR 2026-05-19 reviewed

Single photo becomes real-time physics video of interacting objects
TelePhysics: Physics-Grounded Multi-Object Scene Generation from a Single Image with Real-Time Interaction

Xin Zhang +7
cs.CV 2026-05-19 reviewed

Text-guided edits keep watermarks intact after decoder-loss training
Are Watermarked Images Editable? SafeMark for Watermark-Preserving Text-Guided Image Editing

Xiaodong Wu +5
cs.CV 2026-05-19 reviewed

Subtraction module lifts unsupervised video domain adaptation
Return of Frustratingly Easy Unsupervised Video Domain Adaptation

Pengfei Wei +4
cs.CV 2026-05-19 reviewed

Event pruning trims 80% tokens but raises reasoning accuracy
EventPrune: Cascaded Event-Assisted Token Pruning for Efficient First-Person Dynamic Spatial Reasoning

Pengtao Ma +9
cs.CV 2026-05-19 reviewed

PathCTM cuts pathology patches by 96%
Thinking in Scales: Accelerating Gigapixel Pathology Image Analysis via Adaptive Continuous Reasoning

Jiusong Ge +15
cs.RO 2026-05-19 reviewed

Hybrid platform syncs real CAVs with CARLA-SUMO sims for closed-loop tests
Closed-Loop Hybrid Digital Twin Platform for Connected and Automated Vehicle Validation

Kanglong Quan +6
cs.CV 2026-05-19 reviewed

GUI agents reach only 36% success on media editing tasks
CutVerse: A Compositional GUI Agents Benchmark for Media Post-Production Editing

Haobo Hu +6
cs.CR 2026-05-19 reviewed

Dynamic prompts fuse backdoors with task performance to resist pruning
Exposing Functional Fusion: A New Class of Strategic Backdoor in Dynamic Prompt Architectures

Zeyao Liu +5
cs.CV 2026-05-19 reviewed

Targeted attacks succeed on encoders without knowing the task
Targeted Downstream-Agnostic Attack

Zhuxin Lei +2
cs.LG 2026-05-19 reviewed

CEPO boosts math reasoning to 43.43% at 2B and 60.56% at 4B
CEPO: RLVR Self-Distillation using Contrastive Evidence Policy Optimization

Ahmed Heakl +6
cs.LG 2026-05-19 reviewed

Model fuses layout and netlist to predict cell delay at 0.92% error
FusionCell: Cross-Attentive Fusion of Layout Geometry and Netlist Topology for Standard-Cell Performance Prediction

Haoyi Zhang +4
cs.CV 2026-05-19 reviewed

Prototype-anchored training halves calibration error in place recognition
KappaPlace: Learning Hyperspherical Uncertainty for Visual Place Recognition via Prototype-Anchored Supervision

Maya Yanko +1
cs.CV 2026-05-19 reviewed

Vision agent builds ad-hoc segmentations with working mask
Vision Harnessing Agent for Open Ad-hoc Segmentation

Zilin Wang +1
cs.CV 2026-05-19 reviewed

JUDO outperforms GPT-4o on industrial anomaly QA with normal image references
JUDO: A Juxtaposed Domain-Oriented Multimodal Reasoner for Industrial Anomaly QA

Hyunju Kang +3
cs.CV 2026-05-19 reviewed

Rebalancing attention reduces reference dominance and increases video motion
Rebalancing Reference Frame Dominance to Improve Motion in Image-to-Video Models

Wooseok Jeon +5
cs.CV 2026-05-19 reviewed

Unlearning methods leave class traces in model representations
Can Vision Models Truly Forget? Mirage: Representation-Level Certification of Visual Unlearning

Zhenyu Yu +4
cs.CV 2026-05-19 reviewed

Variance penalty on penultimate neurons cuts medical AI bias
Neuron Incidence Redistribution for Fairness in Medical Image Classification

Abin Shoby +2
cs.CV 2026-05-19 reviewed

Tracking tokens lift LMM performance on 4D video tasks
LMM-Track4D: Eliciting 4D Dynamic Reasoning in LMMs via Trajectory-Grounded Dialogue

Chaoyue Li +3
cs.CV 2026-05-19 reviewed

Material codebook yields consistent physics parameters from video
MatPhys: Learning Material-Aware Physics Parameters for Deformable Object Simulation from Videos

Yang Yang +3
cs.CV 2026-05-19 reviewed

Concept ontology filters noisy negatives to lift chest X-ray zero-shot tasks
Concept-Guided Noisy Negative Suppression for Zero-Shot Classification and Grounding of Chest X-Ray Findings

Chenyu Lian +3
cs.CV 2026-05-19 reviewed

Heat dissipation flow matching outperforms most baselines
Multi-Scale Generative Modeling with Heat Dissipation Flow Matching

Jun Ma +4
cs.CV 2026-05-19 reviewed

Optical pass checks 15 deepfake videos simultaneously
Scalable, Energy-Efficient Optical-Neural Architecture for Multiplexed Deepfake Video Detection

Parnian Ghapandar Kashani +2
cs.CV 2026-05-19 reviewed

Atlas text boosts mammography BI-RADS accuracy
MAM-CLIP: Vision-Language Pretraining on Mammography Atlases for BI-RADS Classification

Halil Ibrahim Gulluk +1
cs.GR 2026-05-19 reviewed

Repositioned anchors keep motion contacts across body shapes
Skinned Motion Retargeting with Spatially Adaptive Interaction Guidance

Soojin Choi +5
eess.IV 2026-05-19 reviewed

Autoregressive codebook tokens sharpen MRI from extreme undersampling
Next-Acceleration-Scale Prediction for Autoregressive MRI Reconstruction

Yilmaz Korkmaz +1
cs.LG 2026-05-19 reviewed

Claim differences as RL rewards balance caption hallucinations and omissions
ClaimDiff-RL: Fine-Grained Caption Reinforcement Learning through Visual Claim Comparison

Tianle Li +9
cs.CV 2026-05-19 reviewed

Integral feedback reduces hallucinations in CT medical reports
Regulating Anatomy-Aware Rewards via Trajectory-Integral Feedback for Volumetric Computed Tomography Analysis

Tianwei Lin +9
cs.CV 2026-05-19 reviewed

Two-stage training adds semantics to latent visual reasoning
Semantic-Enriched Latent Visual Reasoning

Tianrun Xu +10
cs.CV 2026-05-19 reviewed

HERA lifts CD-FSS accuracy over 4 mIoU points with tiny updates
Selective, Regularized, and Calibrated: Harnessing Vision Foundation Models for Cross-Domain Few-Shot Semantic Segmentation

Junyuan Ma +4
cs.CV 2026-05-19 reviewed

Event streams improve VLM scene understanding in tough conditions
RE-VLM: Event-Augmented Vision-Language Model for Scene Understanding

Hanqing Liu +5
cs.CV 2026-05-19 reviewed

DynaTok trims 90% of video tokens with 95% accuracy retained
DynaTok: Temporally Adaptive and Positional Bias-Aware Token Compression for Video-LLMs

Minyoung Park +2
cs.CV 2026-05-19 reviewed

Hierarchical rewards raise text accuracy in image generators
TextAlign: Preference Alignment for Text Rendering with Hierarchical Rewards

Mingxuan Cui +8
cs.CV 2026-05-19 reviewed

Image editing replaces video for robot task planning
SWEET: Sparse World Modeling with Image Editing for Embodied Task Execution

Yiren Song +4
cs.CV 2026-05-19 reviewed

Gated CNN detects falls on smartwatches without attention
You Don't Need Attention: Gated Convolutional Modeling for Watch-Based Fall Detection

Sana Alamgeer +4
cs.CV 2026-05-19 reviewed

Metamorphic relations reveal hidden VQA failures missed by accuracy
MetaRA: Metamorphic Robustness Assessment for Multimodal Large Language Model-based Visual Question Answering Systems

Quanxing Xu +6
cs.GR 2026-05-19 reviewed

Matérn noise gives flow matching triangulation-agnostic behavior
Matérn Noise for Triangulation-Agnostic Flow Matching on Meshes

Tianshu Kuai +3