archive
Every paper Pith has read. Search by title, abstract, or pith.
9568 papers in cs.CV · page 9
-
Reliability map routes experts to cut fusion errors in UAV detection
LER-YOLO: Reliability-Aware Expert Routing for Misaligned RGB-Infrared UAV Detection
-
RoPeSLR cuts DiT FLOPs 10x at 90% sparsity
RoPeSLR: 3D RoPE-driven Sparse-LowRank Attention for Efficient Diffusion Transformers
-
Patch attention cuts vessel breaks in OCTA scans
Gaze into the Details: Locality-Sensitive Enhancement for OCTA Retinal Vessel Segmentation
-
Paired clean videos train model to recognize actions in fog
Seeing Through Fog: Towards Fog-Invariant Action Recognition
-
Training supervision lifts portrait alignment
Pareto-Enhanced Portrait Generation: Vision-Aligned Text Supervision for Alignment, Realism, and Aesthetics
-
Pipeline triples accuracy for Indigenous image captions
Retrieval-Augmented Long-Context Translation for Cultural Image Captioning: Gators submission for AmericasNLP 2026 shared task
-
Autoregressive diffusion cuts video restoration latency to seconds
Accelerating Video Inverse Problem Solvers with Autoregressive Diffusion Models
-
Animate-inanimate split structures vision MoE experts stably
Beyond Routing: Characterising Expert Tuning and Representation in Vision Mixture-of-Experts
-
Vision model separates content from style to assure landing safety
Mechanistic Interpretability for Learning Assurance of a Vision-Based Landing System
-
Failure notes lift diagnostic AI accuracy up to 7%
MedExpMem: Adapting Experience Memory for Differential Diagnosis
-
HeadKV saves memory by budgeting KV cache per attention head
Head-Aware Key-Value Compression for Efficient Autoregressive Image Generation
-
Direct sign-to-sign model beats text cascade on accuracy and speed
Direct Translation between Sign Languages
-
Preference-aligned VLM improves content rating descriptor detection
QwenSafe: Multimodal Content Rating Description Identification via Preference-Aligned VLMs
-
New dataset annotates 73 hours of Colombian bird sounds for AI
A strongly annotated passive acoustic dataset for tropical bird monitoring
-
Dataset annotates 168 tropical bird species in 73 hours of audio
A strongly annotated passive acoustic dataset for tropical bird monitoring
-
Language turns video into simulatable rigid-body configs
$\Delta$ynamics: Language-Based Representation for Inferring Rigid-Body Dynamics From Videos
-
Joint unmixing and localization boosts hyperspectral tracking
End-to-End Unmixing with Material Prompts for Hyperspectral Object Tracking
-
Weighted clusters plus pruning give flexible speed-accuracy control in VPR
Faster or Stronger: Towards Flexible Visual Place Recognition via Weighted Aggregation and Token Pruning
-
Camera distance drives most vision model errors
MAPS: A Synthetic Dataset for Probing Vision Models in a Controlled 3D Scene Space
-
Robotic planners say yes to most impossible commands
The Yes-Man Syndrome: Benchmarking Abstention in Embodied Robotic Agents
-
Uncertainty guides fixes for disconnected vessels in scans
Uncertainty-Guided Conservative Propagation for Structured Inference in Vessel Segmentation
-
Stabilization methods handle joint shifts in continual segmentation
Continual Segmentation under Joint Nonstationarity
-
Dual-stream network classifies breast ultrasounds at 96.58% accuracy
HADS-Net: A Hybrid Attention-Augmented Dual-Stream Network with Physics-Informed Augmentation for Breast Ultrasound Image Classification
-
AI models lag behind text-only on 3D brain MRI benchmark
NeuroQA: A Large-Scale Image-Grounded Benchmark for 3D Brain MRI Understanding
-
Dataset pairs building models with shade maps for urban heat studies
ShadeBench: A Benchmark Dataset for Building Shade Simulation in Sustainable Society
-
Min-gate fuses diffusion models to catch all four OOD shifts
Tippett-minimum Fusion of Representation-space Diffusion Models for Multi-Encoder Out-of-Distribution Detection
-
fMRI embeddings align across brains via unsupervised rotations
Platonic Representations in the Human Brain: Unsupervised Recovery of Universal Geometry
-
Active selection reaches 100% accuracy with 20 verified images
A Human-in-the-Loop Framework for Efficient Prompt Selection in Microscopy Vision-Language Models
-
A single predictor transfers oracle hyperparameter labels from variational denoisers to…
Oracle Supervision Transfers for Hyperparameter Prediction in Model-Based Image Denoising
-
Tree of anchors bounds drift in long video generation
Goodbye Drift: Anchored Tree Sampling for Long-Horizon Video-to-Video Generation
-
Projection equivariance lifts CBCT-to-CT PSNR by 7 dB
EPC-3D-Diff: Equivariant Physics Consistent Conditional 3D Latent Diffusion for CBCT to CT Synthesis
-
Models hallucinate in 62-82% of chest X-ray reads
HalluCXR: Benchmarking and Mitigating Hallucinations in Medical Vision-Language Models for Chest Radiograph Interpretation
-
Polyp sizing models rely on exam cues
Understanding Model Behavior in Monocular Polyp Sizing
-
Neural bones animate realistic garments at 300+ FPS
HyperBones: Realtime Bone-driven Neural Garment Simulation with Hypernetwork Conditioning
-
Deep learning segments COVID lesions in CT with high accuracy
Pixel Wised Lesion Prediction on COVID-19 CT Imagery: A Comparative Analysis of Automated Image Segmentation Architectures
-
Coupled region growing and ML hits 97-98 percent vessel accuracy
ELEMENT: Multi-Modal Retinal Vessel Segmentation Based on a Coupled Region Growing and Machine Learning Approach
-
VLMs rearrange visible objects at 53-97% but fail occlusion at 6-45%
Do Vision-Language Models Understand 3D Scenes or Just Catalogue Objects?
-
ResNet and VGG hit 95-98 percent accuracy on COVID lung scans
A Comprehensive Comparison of Deep Learning Architectures for COVID-19 Classification on CT & X-ray Imagery
-
The paper introduces a Lighting Convolutional-Attention adapter module that processes RGB…
Lighting-aware Unified Model for Instance Segmentation
-
This paper tests episodic sampling to build class-balanced batches for CT body…
Disentangling Sampling from Training Budget in Class-Imbalanced CT Body Composition Segmentation
-
Bigger 3D models trained on 50M driving scenes top Waymo leaderboard
STELLAR: Scaling 3D Perception Large Models for Autonomous Driving
-
Camera trajectories forecast actions better than language
How You Move Tells What You'll Do: Trajectory-Conditioned Egocentric Prediction
-
Meta-RL extracts rules to segment concepts at any reasoning level
ConceptSeg-R1: Segment Any Concept via Meta-Reinforcement Learning
-
Human videos scale humanoid loco-manipulation without custom rewards
SUGAR: A Scalable Human-Video-Driven Generalizable Humanoid Loco-Manipulation Learning Framework
-
Distortion in latent space guides better sampling for missing modalities
Latent Space Guided Scenario Sampling for Multimodal Segmentation Under Missing Modalities
-
HAPS filters training pairs to boost virtual staining models
HAPS: Rethinking Image Similarity for Virtual Staining
-
Parallel video tools raise long-video benchmarks 7.9%
ParaVT: Taming the Tool Prior Paradox for Parallel Tool Use in Agentic Video Reinforcement Learning
-
Parallel tool calls raise long-video scores by 7.9 percent
ParaVT: Taming the Tool Prior Paradox for Parallel Tool Use in Agentic Video Reinforcement Learning
-
Foundation models trail supervised ViTs in human interpretability
Capability $\neq$ Interpretability: Human Interpretability of Vision Foundation Models