pith.

archive

Every paper Pith has read.

9568 papers in cs.CV · page 9

  1. cs.CV 2026-05-20 reviewed
    Reliability map routes experts to cut fusion errors in UAV detection

    LER-YOLO: Reliability-Aware Expert Routing for Misaligned RGB-Infrared UAV Detection

    Liming Hou +8

  2. cs.CV 2026-05-20 reviewed
    RoPeSLR cuts DiT FLOPs 10x at 90% sparsity

    RoPeSLR: 3D RoPE-driven Sparse-LowRank Attention for Efficient Diffusion Transformers

    Yuxi Liu +5

  3. cs.CV 2026-05-20 reviewed
    Patch attention cuts vessel breaks in OCTA scans

    Gaze into the Details: Locality-Sensitive Enhancement for OCTA Retinal Vessel Segmentation

    Tuopusen Huang +2

  4. cs.CV 2026-05-20 reviewed
    Paired clean videos train model to recognize actions in fog

    Seeing Through Fog: Towards Fog-Invariant Action Recognition

    Enqi Liu +4

  5. cs.CV 2026-05-20 reviewed
    Training supervision lifts portrait alignment

    Pareto-Enhanced Portrait Generation: Vision-Aligned Text Supervision for Alignment, Realism, and Aesthetics

    Yunlong Wang +5

  6. cs.CL 2026-05-20 reviewed
    Pipeline triples accuracy for Indigenous image captions

    Retrieval-Augmented Long-Context Translation for Cultural Image Captioning: Gators submission for AmericasNLP 2026 shared task

    Aashish Dhawan +4

  7. cs.CV 2026-05-20 reviewed
    Autoregressive diffusion cuts video restoration latency to seconds

    Accelerating Video Inverse Problem Solvers with Autoregressive Diffusion Models

    Taesung Kwon +3

  8. cs.CV 2026-05-20 reviewed
    Animate-inanimate split structures vision MoE experts stably

    Beyond Routing: Characterising Expert Tuning and Representation in Vision Mixture-of-Experts

    Gene Tangtartharakul +1

  9. cs.LG 2026-05-20 reviewed
    Vision model separates content from style to assure landing safety

    Mechanistic Interpretability for Learning Assurance of a Vision-Based Landing System

    Romeo Valentin +3

  10. cs.CV 2026-05-20 reviewed

  11. cs.LG 2026-05-20 reviewed
    Failure notes lift diagnostic AI accuracy up to 7%

    MedExpMem: Adapting Experience Memory for Differential Diagnosis

    Qianhan Feng +6

  12. cs.CV 2026-05-20 reviewed
    HeadKV saves memory by budgeting KV cache per attention head

    Head-Aware Key-Value Compression for Efficient Autoregressive Image Generation

    Guotao Liang +3

  13. cs.CL 2026-05-20 reviewed
    Direct sign-to-sign model beats text cascade on accuracy and speed

    Direct Translation between Sign Languages

    Zetian Wu +5

  14. cs.CV 2026-05-20 reviewed
    Preference-aligned VLM improves content rating descriptor detection

    QwenSafe: Multimodal Content Rating Description Identification via Preference-Aligned VLMs

    Dishanika Denipitiyage +2

  15. cs.SD 2026-05-20 reviewed
    New dataset annotates 73 hours of Colombian bird sounds for AI

    A strongly annotated passive acoustic dataset for tropical bird monitoring

    Daniela Ruiz +13

  16. cs.SD 2026-05-20 reviewed
    Dataset annotates 168 tropical bird species in 73 hours of audio

    A strongly annotated passive acoustic dataset for tropical bird monitoring

    Daniela Ruiz +13

  17. cs.CV 2026-05-20 reviewed
    Language turns video into simulatable rigid-body configs

    Δynamics: Language-Based Representation for Inferring Rigid-Body Dynamics From Videos

    Chia-Hsiang Kao +7

  18. cs.CV 2026-05-20 reviewed
    Joint unmixing and localization boosts hyperspectral tracking

    End-to-End Unmixing with Material Prompts for Hyperspectral Object Tracking

    Xu Han +7

  19. cs.CV 2026-05-19 reviewed
    Weighted clusters plus pruning give flexible speed-accuracy control in VPR

    Faster or Stronger: Towards Flexible Visual Place Recognition via Weighted Aggregation and Token Pruning

    Zichao Zeng +6

  20. cs.CV 2026-05-19 reviewed
    Camera distance drives most vision model errors

    MAPS: A Synthetic Dataset for Probing Vision Models in a Controlled 3D Scene Space

    Santiago Galella +5

  21. cs.RO 2026-05-19 reviewed
    Robotic planners say yes to most impossible commands

    The Yes-Man Syndrome: Benchmarking Abstention in Embodied Robotic Agents

    Doguhan Yeke +5

  22. cs.CV 2026-05-19 reviewed
    Uncertainty guides fixes for disconnected vessels in scans

    Uncertainty-Guided Conservative Propagation for Structured Inference in Vessel Segmentation

    Huan Huang +2

  23. cs.CV 2026-05-19 reviewed
    Stabilization methods handle joint shifts in continual segmentation

    Continual Segmentation under Joint Nonstationarity

    Prashant Pandey +3

  24. cs.CV 2026-05-19 reviewed
    Dual-stream network classifies breast ultrasounds at 96.58% accuracy

    HADS-Net: A Hybrid Attention-Augmented Dual-Stream Network with Physics-Informed Augmentation for Breast Ultrasound Image Classification

    Chinedu Emmanuel Mbonu +3

  25. cs.CV 2026-05-19 reviewed
    AI models lag behind text-only baselines on 3D brain MRI benchmark

    NeuroQA: A Large-Scale Image-Grounded Benchmark for 3D Brain MRI Understanding

    Mohammad H. Abbasi +14

    5 Piths
  26. cs.CV 2026-05-19 reviewed
    Dataset pairs building models with shade maps for urban heat studies

    ShadeBench: A Benchmark Dataset for Building Shade Simulation in Sustainable Society

    Longchao Da +5

  27. cs.LG 2026-05-19 reviewed
    Min-gate fuses diffusion models to catch all four OOD shifts

    Tippett-minimum Fusion of Representation-space Diffusion Models for Multi-Encoder Out-of-Distribution Detection

    Neelkamal Bhuyan

  28. q-bio.NC 2026-05-19 reviewed
    fMRI embeddings align across brains via unsupervised rotations

    Platonic Representations in the Human Brain: Unsupervised Recovery of Universal Geometry

    Pablo Marcos-Manchón +2

  29. cs.CV 2026-05-19 reviewed
    Active selection reaches 100% accuracy with 20 verified images

    A Human-in-the-Loop Framework for Efficient Prompt Selection in Microscopy Vision-Language Models

    Abhiram Kandiyana +4

  30. cs.CV 2026-05-19 reviewed
    A single predictor transfers oracle hyperparameter labels from variational denoisers to…

    Oracle Supervision Transfers for Hyperparameter Prediction in Model-Based Image Denoising

    Jianmin Liao +2

  31. cs.CV 2026-05-19 reviewed
    Tree of anchors bounds drift in long video generation

    Goodbye Drift: Anchored Tree Sampling for Long-Horizon Video-to-Video Generation

    Matthew Bendel +4

  32. cs.CV 2026-05-19 reviewed
    Projection equivariance lifts CBCT-to-CT PSNR by 7 dB

    EPC-3D-Diff: Equivariant Physics Consistent Conditional 3D Latent Diffusion for CBCT to CT Synthesis

    Alzahra Altalib +5

  33. cs.CV 2026-05-19 reviewed
    Models hallucinate in 62-82% of chest X-ray reads

    HalluCXR: Benchmarking and Mitigating Hallucinations in Medical Vision-Language Models for Chest Radiograph Interpretation

    Haoyu Wang +1

  34. cs.CV 2026-05-19 reviewed
    Polyp sizing models rely on exam cues

    Understanding Model Behavior in Monocular Polyp Sizing

    Xinqi Xiong +5

  35. cs.GR 2026-05-19 reviewed
    Neural bones animate realistic garments at 300+ FPS

    HyperBones: Realtime Bone-driven Neural Garment Simulation with Hypernetwork Conditioning

    Astitva Srivastava +11

  36. cs.CV 2026-05-19 reviewed
    Deep learning segments COVID lesions in CT with high accuracy

    Pixel Wised Lesion Prediction on COVID-19 CT Imagery: A Comparative Analysis of Automated Image Segmentation Architectures

    Sarmad Khan +3

  37. cs.CV 2026-05-19 reviewed
    Coupled region growing and ML hits 97-98 percent vessel accuracy

    ELEMENT: Multi-Modal Retinal Vessel Segmentation Based on a Coupled Region Growing and Machine Learning Approach

    Erick O. Rodrigues +2

  38. cs.CV 2026-05-19 reviewed
    VLMs rearrange visible objects at 53-97% but drop to 6-45% under occlusion

    Do Vision-Language Models Understand 3D Scenes or Just Catalogue Objects?

    Animesh Maheshwari +2

  39. cs.CV 2026-05-19 reviewed
    ResNet and VGG hit 95-98 percent accuracy on COVID lung scans

    A Comprehensive Comparison of Deep Learning Architectures for COVID-19 Classification on CT & X-ray Imagery

    Sarmad Khan +3

  40. cs.CV 2026-05-19 reviewed
    The paper introduces a Lighting Convolutional-Attention adapter module that processes RGB…

    Lighting-aware Unified Model for Instance Segmentation

    Qisai Liu +6

  41. eess.IV 2026-05-19 reviewed
    This paper tests episodic sampling to build class-balanced batches for CT body…

    Disentangling Sampling from Training Budget in Class-Imbalanced CT Body Composition Segmentation

    Iason Skylitsis +2

  42. cs.CV 2026-05-19 reviewed
    Bigger 3D models trained on 50M driving scenes top Waymo leaderboard

    STELLAR: Scaling 3D Perception Large Models for Autonomous Driving

    Yingwei Li +15

  43. cs.CV 2026-05-19 reviewed
    Camera trajectories forecast actions better than language

    How You Move Tells What You'll Do: Trajectory-Conditioned Egocentric Prediction

    Sejoon Jun +3

  44. cs.CV 2026-05-19 reviewed
    Meta-RL extracts rules to segment concepts at any reasoning level

    ConceptSeg-R1: Segment Any Concept via Meta-Reinforcement Learning

    Yuan Zhao +12

  45. cs.RO 2026-05-19 reviewed
    Human videos scale humanoid loco-manipulation without custom rewards

    SUGAR: A Scalable Human-Video-Driven Generalizable Humanoid Loco-Manipulation Learning Framework

    Tianshu Wu +7

  46. cs.CV 2026-05-19 reviewed
    Distortion in latent space guides better sampling for missing modalities

    Latent Space Guided Scenario Sampling for Multimodal Segmentation Under Missing Modalities

    Irem Ulku +2

  47. cs.CV 2026-05-19 reviewed
    HAPS filters training pairs to boost virtual staining models

    HAPS: Rethinking Image Similarity for Virtual Staining

    Fedor Gubanov +8

  48. cs.CV 2026-05-19 reviewed
    Parallel video tools raise long-video benchmarks 7.9%

    ParaVT: Taming the Tool Prior Paradox for Parallel Tool Use in Agentic Video Reinforcement Learning

    Zuhao Yang +9

  49. cs.CV 2026-05-19 reviewed
    Parallel tool calls raise long-video scores by 7.9 percent

    ParaVT: Taming the Tool Prior Paradox for Parallel Tool Use in Agentic Video Reinforcement Learning

    Zuhao Yang +9

  50. cs.CV 2026-05-19 reviewed
    Foundation models trail supervised ViTs in human interpretability

    Capability ≠ Interpretability: Human Interpretability of Vision Foundation Models

    Julien Colin +3