pith.

archive

Every paper Pith has read. Search by title, abstract, or pith.

9568 papers in cs.CV · page 10

  1. cs.GR 2026-05-19 reviewed
    Neural fields guide free Gaussians to capture layered clothing

    PiG-Avatar: Hierarchical Neural-Field-Guided Gaussian Avatars

    Julian Kaltheuner +4

  2. cs.GR 2026-05-19 reviewed
    Free Gaussians in neural space model avatars with complex clothing

    PiG-Avatar: Hierarchical Neural-Field-Guided Gaussian Avatars

    Julian Kaltheuner +4

  3. cs.CV 2026-05-19 reviewed
    LoRA upgrade turns text-to-image flows bidirectional

    FullFlow: Upgrading Text-to-Image Flow Matching Models for Bidirectional Vision-Language Generation

    Eric Tillmann Bill +3

  4. cs.CV 2026-05-19 reviewed
    Benchmark enables reliable testing of multi-shot audio-video models

    MSAVBench: Towards Comprehensive and Reliable Evaluation of Multi-Shot Audio-Video Generation

    Yujie Wei +22

  5. cs.CL 2026-05-19 reviewed
    Staged perception training boosts VLM accuracy with shorter reasoning

    From Seeing to Thinking: Decoupling Perception and Reasoning Improves Post-Training of Vision-Language Models

    Juncheng Wu +8

  6. cs.CV 2026-05-19 reviewed
    AUDITS benchmark tests detectors on 530K manipulated images

    Multi-axis Analysis of Image Manipulation Localization

    Keanu Nichols +5

  7. cs.CV 2026-05-19 reviewed
    New test reveals VLMs ignore camera motion in spatial tasks

    CaMo: Camera Motion Grounded Evaluation and Training for Vision-Language Models

    Hsiang-Wei Huang +5

  8. cs.CV 2026-05-19 reviewed
    Prototype layer matches ResNet accuracy on composite X-ray defects

    Interpretable Computer Vision for Defect Detection in X-ray Tomography of Aerospace SiC/SiC Composites

    Antonio Peña Corredor +4

  9. cs.CV 2026-05-19 reviewed
    Counterfactual tests expose failures in LVLM attribution for chest X-rays

    Rethinking Visual Attribution for Chest X-ray Reasoning in Large Vision Language Models

    Guangzhi Xiong +4

  10. cs.CV 2026-05-19 reviewed
    Billion-scale 3D Gaussians train on one 24 GB GPU

    TideGS: Scalable Training of Over One Billion 3D Gaussian Splatting Primitives via Out-of-Core Optimization

    Chonghao Zhong +6

  11. cs.CV 2026-05-19 reviewed
    Dataset lets AI models generate native 100MP images

    PixVerve: Advancing Native UHR Image Generation to 100MP with a Large-Scale High-Quality Dataset

    Haojun Chen +13

  12. cs.CV 2026-05-19 reviewed
    Natural-language concepts replace tokens for multi-target segmentation

    SetCon: Towards Open-Ended Referring Segmentation via Set-Level Concept Prediction

    Zhixiong Zhang +8

  13. cs.CV 2026-05-19 reviewed
    One model handles any-to-any translation across five remote sensing modalities

    MetaEarth-MM: Unified Multimodal Remote Sensing Image Generation with Scene-centered Joint Modeling

    Zhiping Yu +6

  14. cs.CV 2026-05-19 reviewed
    First-frame spatial prompts raise cross-scene trajectory accuracy

    Spatially Prompted Visual Trajectory Prediction for Egocentric Manipulation

    Yifan Li +3

  15. cs.CV 2026-05-19 reviewed
    VLM-guided DPO lifts driving model human alignment by 12%

    VL-DPO: Vision-Language-Guided Finetuning for Preference-Aligned Autonomous Driving

    Zhefan Xu +5

  16. cs.CV 2026-05-19 reviewed
    Adaptive Manifold Guidance conserves probability during strong guidance

    Probability-Conserving Flow Guidance

    Parsa Esmati +4

  17. cs.CV 2026-05-19 reviewed
    Pixel classification hits 95.48% accuracy on angiogram vessels

    X-Ray cardiac angiographic vessel segmentation based on pixel classification using machine learning and region growing

    E O Rodrigues +8

  18. cs.CV 2026-05-19 reviewed
    Small tables bind new visual concepts to word triggers

    Tiny-Engram: Trigger-Indexed Concept Tables for Generative Vision

    Runyuan Cai +3

  19. cs.CV 2026-05-19 reviewed
    Pix2pix network segments heart fat on CT scans with 99% accuracy

    Cardiac fat segmentation using computed tomography and an image-to-image conditional generative adversarial neural network

    Guilherme Santos da Silva +3

  20. cs.CV 2026-05-19 reviewed
    SDM improves adversarial attack performance and efficiency by reconstructing the…

    SDM: A Powerful Tool for Evaluating Model Robustness

    Xinlei Liu +5

  21. cs.CV 2026-05-19 reviewed
    Second opacity per Gaussian cleans up object masks

    OP2GS: Object-Aware 3D Gaussian Splatting with Dual-Opacity Primitives

    Guiyu Liu +4

  22. cs.CV 2026-05-19 reviewed
    Pruning 90% non-text tokens cuts omni-LLM cost by 9x

    Stage-adaptive Token Selection for Efficient Omni-modal LLMs

    Zijie Xin +6

  23. cs.CV 2026-05-19 reviewed
    Nash equilibrium scores filter unstable multimodal reasoning steps

    A Nash Equilibrium Framework For Training-Free Multimodal Step Verification

    Rohit Sinha +5

  24. eess.IV 2026-05-19 reviewed
    Frequency priors guide short-video quality scores

    FGSVQA: Frequency-Guided Short-form Video Quality Assessment

    Xinyi Wang +3

  25. eess.IV 2026-05-19 reviewed
    CryoNet maps debris-covered glaciers at 90 percent IoU

    CryoNet: A Deep Learning Framework for Multi-Modal Debris-Covered Glacier Mapping. A Case Study of the Poiqu Basin, Central Himalaya

    Farzaneh Barzegar +3

  26. cs.CV 2026-05-19 reviewed
    Anime-trained VLM turns sparse sketches into aligned video outputs

    CogOmniControl: Reasoning-Driven Controllable Video Generation via Creative Intent Cognition

    Hongji Yang +6

  27. cs.RO 2026-05-19 reviewed
    Four photodiodes replace cameras for robot odometry

    Minimalist Visual Inertial Odometry

    Francesco Pasti +3

  28. cs.RO 2026-05-19 reviewed
    Visual encoder spatial detail fix unlocks precise robot tasks

    Beyond Binary Success: A Diagnostic Meta-Evaluation Framework for Fine-Grained Manipulation

    He-Yang Xu +7

  29. cs.CV 2026-05-19 reviewed
    Illumination priors guide selective recovery in dark photos

    InterLight: Leveraging Intrinsic Illumination Priors for Low-Light Image Enhancement

    Ziqi Wang +5

  30. cs.CV 2026-05-19 reviewed
    Video transcript grounding reward lifts planning accuracy by 7-16 points

    RECIPE: Procedural Planning via Grounding in Instructional Video

    Luigi Seminara +2

  31. cs.CV 2026-05-19 reviewed
    Fusing lifted panoramas yields long-range navigable 3D worlds from text

    SphericalDreamer: Generating Navigable Immersive 3D Worlds with Panorama Fusion

    Antoine Schnepf +3

  32. cs.CV 2026-05-19 reviewed
    World-ego split lifts long-horizon hybrid robot modeling

    World-Ego Modeling for Long-Horizon Evolution in Hybrid Embodied Tasks

    Zuyao Lin +5

  33. cs.CV 2026-05-19 reviewed
    Refined gradient attention rollout identifies surviving semantic regions to guide…

    Towards Fine-Grained Robustness: Attention-Guided Test-Time Prompt Tuning for Vision-Language Models

    Jia-Wei Hai +2

  34. cs.CV 2026-05-19 reviewed
    VLMs and agents miss over half the score on wild road damage

    WildRoadBench: A Wild Aerial Road-Damage Grounding Benchmark for Vision-Language Models and Autonomous Agents

    Bingnan Liu +9

  35. cs.CV 2026-05-19 reviewed
    Future emotion prediction raises multimodal recognition accuracy

    AffectVerse: Emotional World Models for Multimodal Affective Computing

    Bo Zhao +5

  36. cs.CV 2026-05-19 reviewed
    One pass turns sparse aerial photos into full 3D city models

    Feed-Forward Gaussian Splatting from Sparse Aerial Views

    Dongli Wu +6

  37. cs.CV 2026-05-19 reviewed
    Model fuses lidar and plot data for lower-bias forest biomass maps

    StruMPL: Multi-task Dense Regression under Disjoint Partial Supervision and MNAR Labels

    Reza M. Asiyabi +4

  38. cs.CV 2026-05-19 reviewed
    SplitQ keeps 93.5% accuracy at 3-bit VLM quantization

    Breaking Modality Heterogeneity in Low-Bit Quantization for Large Vision-Language Models

    Yi Zhong +4

  39. cs.CV 2026-05-19 reviewed
    Diversity in memory buffers improves TTA under tight constraints

    GoTTA be Diverse: Rethinking Memory Policies for Test-Time Adaptation

    Shyma Alhuwaider +4

  40. cs.GR 2026-05-19 reviewed
    3D Gaussians replace grids for continuous color mapping

    GLUT: 3D Gaussian Lookup Table for Continuous Color Transformation

    Danna Xue +3

  41. cs.CV 2026-05-19 reviewed
    U-Net feature energy cuts Janus rate in text-to-3D

    Structural Energy Guidance for View-Consistent Text-to-3D Generation

    Qing Zhang +4

  42. cs.CV 2026-05-19 reviewed
    Persona prompts lift construction safety checks by 12 percent

    Passive Construction Site Safety Monitoring via Persona-Scaffolded Adversarial Chain-of-Thought VLM Verification

    Ananth Sriram +2

  43. cs.CV 2026-05-19 reviewed
    New decoder head raises wound segmentation Dice to 81.9%

    WoundFormer: Multi-Scale Spatial Feature Fusion for Multi-Class Wound Tissue Segmentation

    Muhammad Ashad Kabir +1

  44. cs.CV 2026-05-19 reviewed
    Layout priors raise markdown F1 from 0.37 to 0.92 on OOD docs

    Structured Layout Priors for Robust Out-of-Distribution Visual Document Understanding

    Peter El Hachem +4

  45. cs.CV 2026-05-19 reviewed
    Score-based guidance fixes viewpoint estimation in diffusion models

    Landscape-Awareness for Geometric View Diffusion Model

    Yan-Ting Chen +3

  46. cs.CV 2026-05-19 reviewed
    VLMs lag on gaze following and social attention benchmarks

    Eyes on VLM: Benchmarking Gaze Following and Social Gaze Prediction in Vision Language Models

    Hengfei Wang +3

  47. cs.CV 2026-05-19 reviewed
    VLMs trail visual models on gaze following and social attention

    Eyes on VLM: Benchmarking Gaze Following and Social Gaze Prediction in Vision Language Models

    Hengfei Wang +3

  48. cs.CV 2026-05-19 reviewed
    Zero-shot image models fall short on concept faithfulness for XAI

    A Framework for Evaluating Zero-Shot Image Generation in Concept-based Explainability

    Giacomo Astolfi +4

  49. cs.CV 2026-05-19 reviewed
    Dense benchmark exposes open VLMs' gaps on subtle human actions

    FineBench: Benchmarking and Enhancing Vision-Language Models for Fine-grained Human Activity Understanding

    Gueter Josmy Faure +4

  50. cs.CV 2026-05-19 reviewed
    Open VLMs struggle with fine details in human video actions

    FineBench: Benchmarking and Enhancing Vision-Language Models for Fine-grained Human Activity Understanding

    Gueter Josmy Faure +4