pith.

archive

Every paper Pith has read. Search by title, abstract, or pith.

9568 papers in cs.CV · page 13

  1. cs.CV 2026-05-19 reviewed
    Optimal transport merges 3DGS primitives down to 10 percent

    MMGS: 10$\times$ Compressed 3DGS through Optimal Transport Aggregation based on Multi-view Ranking

    Beizhen Zhao +4

  2. cs.CV 2026-05-19 reviewed
    Shared subspaces cut parameters 87 percent in continual VLM learning

    iGSP: Implicit Gradient Subspace Projection for Efficient Continual Learning of Vision-Language Models

    Xuezhi Cui +10

  3. cs.CV 2026-05-19 reviewed
    Dense synthetic images boost segmentation accuracy

    What Makes Synthetic Data Effective in Image Segmentation

    Jinjin Zhang +4

  4. cs.CV 2026-05-19 reviewed
    Brain network experts enable competitive fMRI semantic decoding

    FPED: A Functional-Network Prior-Guided Mixture-of-Experts Framework for Interpretable Brain Decoding

    Yudan Ren +4

  5. cs.AI 2026-05-19 reviewed
    Quadtrees cut GUI agent visual tokens by 30 percent

    AQuaUI: Visual Token Reduction for GUI Agents with Adaptive Quadtrees

    Yuankai Li +5

  6. cs.CV 2026-05-19 reviewed
    Flow-map endpoint velocity replaces fake-score network

    Distribution Matching Distillation without Fake Score Network

    Youngjoong Kim +2

  7. cs.CV 2026-05-19 reviewed
    LLM templates expand NAS to discover better architectures

    Structuring Open-Ended NAS: Semi-Automated Design Knowledge Structuring with LLMs for Efficient Neural Architecture Search

    Yuiko Sakuma +6

  8. cs.CV 2026-05-19 reviewed
    Post-training lifts video models' physical consistency

    PhyWorld: Physics-Faithful World Model for Video Generation

    Pu Zhao +12

  9. cs.CV 2026-05-19 reviewed
    Method reduces age bias in medical image classification by decorrelating difficulty

    Robust Mitigation of Age-Dependent Confounding Effects via Sample-Difficulty Decorrelation

    Nikhil Cherian Kurian +4

  10. cs.CV 2026-05-19 reviewed
    HAVEN benchmark aligns video and text across three levels

    HAVEN: Hierarchically Aligned Multimodal Benchmark for Unified Video Understanding

    Mengqi Shi +1

  11. cs.CV 2026-05-19 reviewed
    PCA rotation aligns key channels for accurate VLM pruning

    Rotation-Aligned Key Channel Pruning for Efficient Vision-Language Model Inference

    Beomseok Kang +4

  12. cs.LG 2026-05-19 reviewed
    Regularizer cuts demographic gaps in medical image AI

    Worst-Group Equalized Odds Regularization for Multi-Attribute Fair Medical Image Classification

    Nikhil Cherian Kurian +8

  13. cs.CV 2026-05-19 reviewed
    Smartphone video measures forest trees with ~2 cm accuracy

    Smartphone-based Circular Plot Sampling for Forest Inventory

    Su Sun +4

  14. cs.CV 2026-05-19 reviewed
    Quasi-concavity enforces convex shapes in segmentation networks

    D-Convexity: A Unified Differentiable Convex Shape Prior via Quasi-Concavity for Data-driven Image Segmentation

    Shengzhe Chen +1

  15. cs.CV 2026-05-19 reviewed
    Quantized model cuts brain tumor AI size by 6x with same accuracy

    Quantized Machine Learning Models for Medical Imaging in Low-Resource Healthcare Settings

    Sumanth Meenan Kanneti +1

  16. cs.CV 2026-05-18 reviewed
    Layer-wise compression on image stats yields human-like visual features

    Efficient coding along the visual hierarchy

    Ananya Passi +2

  17. cs.CV 2026-05-18 reviewed
    Freezing image models yields competitive video performance

    Towards Data-Efficient Video Pre-training with Frozen Image Foundation Models

    Svetlana Orlova +2

  18. cs.CV 2026-05-18 reviewed
    SSL pretraining helps models know when to skip DR predictions

    Knowing When Not to Predict: Self Supervised Learning and Abstention for Safer DR Screening

    Muskaan Chopra +3

  19. cs.LG 2026-05-18 reviewed
    VLMs need tight data alignment and miss weak signals in egocentric video

    EgoBabyVLM: Benchmarking Cross-Modal Learning from Naturalistic Egocentric Video Data

    Dongyan Lin +21

  20. cs.CV 2026-05-18 reviewed
    Diffusion model turns uniform organ maps into realistic PET scans

    Generation of Heterogeneous PET Images from Uniform Organ Activity Maps Using a Pretrained Domain-Adapted Diffusion Model

    Suya Li +4

  21. cs.CV 2026-05-18 reviewed
    FAGER metric leads in factual checks for AI image generators

    FAGER: Factually Grounded Evaluation and Refinement of Text-to-Image Models

    Youngsun Lim +3

  22. cs.CV 2026-05-18 reviewed
    CRAFT pipeline leads MAGMaR video QA at 0.739 average

    CRAFT: Critic-Refined Adaptive Key-Frame Targeting for Multimodal Video Question Answering

    Mahesh Bhosale +5

  23. cs.CV 2026-05-18 reviewed
    Multi-horizon training captures longer solar forecast dependencies

    Learning Long-Term Temporal Dependencies in Photovoltaic Power Output Prediction Through Multi-Horizon Forecasting

    Sumit Laha +2

  24. cs.CV 2026-05-18 reviewed
    LiFT lifts 2D generators to coherent 3D medical volumes

    LiFT: Lifted Inter-slice Feature Trajectories for 3D Image Generation from 2D Generators

    Xinhe Zhang +5

  25. cs.RO 2026-05-18 reviewed
    RL fine-tuning aligns traffic simulations with real data

    RLFTSim: Realistic and Controllable Multi-Agent Traffic Simulation via Reinforcement Learning Fine-Tuning

    Ehsan Ahmadi +7

  26. cs.CV 2026-05-18 reviewed
    One photo produces a mask that defeats facial recognition on any image

    Personalized Face Privacy Protection From a Single Image

    Zachary Yahn +7

  27. cs.CV 2026-05-18 reviewed
    New benchmark tests medical AI models on real-world image shifts

    MedFM-Robust: Benchmarking Robustness of Medical Foundation Models

    Xiangxiang Cui +4

  28. cs.CV 2026-05-18 reviewed
    Benchmark tests medical AI models on real-world variations

    MedFM-Robust: Benchmarking Robustness of Medical Foundation Models

    Xiangxiang Cui +4

  29. cs.CV 2026-05-18 reviewed
    Foundation models fail to spot unseen iris attacks and spectral changes

    A Systematic Failure Analysis of Vision Foundation Models for Open Set Iris Presentation Attack Detection

    Rahul Anand +4

  30. cs.CV 2026-05-18 reviewed
    75 real urban walks released with head poses and gaze for trajectory models

    EgoTraj: Real-World Egocentric Human Trajectory Dataset for Multimodal Prediction

    Ahmad Yehia +6

  31. cs.CV 2026-05-18 reviewed
    MLLMs often miss artifacts in AI videos

    Artifact-Bench: Evaluating MLLMs on Detecting and Assessing the Artifacts of AI-Generated Videos

    Yuqi Tang +23

  32. cs.CV 2026-05-18 reviewed
    Self-supervised backbones boost artwork classification

    Harnessing Self-Supervised Features for Art Classification

    Federico Melis +4

  33. cs.CV 2026-05-18 reviewed
    LLM gains part-level and time-step control over human motion

    MotionMERGE: A Multi-granular Framework for Human Motion Editing, Reasoning, Generation, and Explanation

    Bizhu Wu +7

  34. cs.CV 2026-05-18 reviewed
    COLMAP metrics match humans 4x better on 3D view consistency

    Can These Views Be One Scene? Evaluating Multiview 3D Consistency when 3D Foundation Models Hallucinate

    Soumava Paul +2

  35. cs.SD 2026-05-18 reviewed
    Direct waveform audio matches latent methods on benchmarks

    WavFlow: Audio Generation in Waveform Space

    Feiyan Zhou +8

  36. cs.CV 2026-05-18 reviewed
    VLM agent turns vague requests into video edit plans

    Aurora: Unified Video Editing with a Tool-Using Agent

    Yongsheng Yu +6

  37. cs.CV 2026-05-18 reviewed
    Active exploration outperforms passive in spatial intelligence tasks

    ESI-Bench: Towards Embodied Spatial Intelligence that Closes the Perception-Action Loop

    Yining Hong +7

  38. cs.CV 2026-05-18 reviewed
    Self-distillation from crops boosts MLLM detail recognition

    Vision-OPD: Learning to See Fine Details for Multimodal LLMs via On-Policy Self-Distillation

    Qianhao Yuan +6

  39. cs.CV 2026-05-18 reviewed
    NVFP4 and balanced SP enable 2x faster long video training

    LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation

    Yukang Chen +15

  40. cs.CV 2026-05-18 reviewed
    Diffusion models generate faster by growing resolution during denoising

    Spectral Progressive Diffusion for Efficient Image and Video Generation

    Howard Xiao +3

  41. cs.CV 2026-05-18 reviewed
    Diffusion models speed up by growing resolution during denoising

    Spectral Progressive Diffusion for Efficient Image and Video Generation

    Howard Xiao +3

  42. cs.CV 2026-05-18 reviewed
    Single photo gains full PBR lighting control via shared intrinsic maps

    PIXLRelight: Controllable Relighting via Intrinsic Conditioning

    Miguel Farinha +1

  43. cs.CV 2026-05-18 reviewed
    Dual-view selection lifts ego-exo memory accuracy to 58.2 percent

    EgoExoMem: Cross-View Memory Reasoning over Synchronized Egocentric and Exocentric Videos

    Ruiping Liu +9

  44. cs.CV 2026-05-18 reviewed
    Entity ID tracking stops character drift in AI videos

    Advancing Narrative Long Video Generation via Training-Free Identity-Aware Memory

    Jinzhuo Liu +7

  45. cs.RO 2026-05-18 reviewed
    Robots evolve navigation rules from their own successes and failures

    Robo-Cortex: A Self-Evolving Embodied Agent via Dual-Grain Cognitive Memory and Autonomous Knowledge Induction

    Nga Teng Chan +11

  46. cs.CV 2026-05-18 reviewed
    Online steering halves unsafe content in diffusion models

    SafeDiffusion-R1: Online Reward Steering for Safe Diffusion Post-Training

    Komal Kumar +5

  47. cs.CV 2026-05-18 reviewed
    Segmentation proxy aligns multimodal understanding and generation

    Semantic Generative Tuning for Unified Multimodal Models

    Songsong Yu +3

  48. cs.CV 2026-05-18 reviewed
    Training augmentations alone match FGIR accuracy without crops

    A Large-Scale Study on the Accuracy vs Cost Trade-offs of Training and Evaluation Settings in Fine-Grained Image Recognition

    Edwin Arkel Rios +7

  49. cs.CV 2026-05-18 reviewed
    3D concept scaffold fixes prompt ambiguity in avatar retrieval

    CMAG: Concept-Scaffolded Retrieval for Marketplace Avatar Generation

    Rajeev Goel +5

  50. cs.CV 2026-05-18 reviewed
    Lance beats prior open models at image and video generation

    Lance: Unified Multimodal Modeling by Multi-Task Synergy

    Fengyi Fu +12