pith.

archive

Every paper Pith has read. Search by title, abstract, or pith.

9568 papers in cs.CV · page 14

  1. cs.CV 2026-05-18 reviewed
    Lance beats prior open models at image and video generation

    Lance: Unified Multimodal Modeling by Multi-Task Synergy

    Fengyi Fu +12

  2. cs.CV 2026-05-18 reviewed
    Fused Earth embeddings beat best single model in four of six tasks

    Better Together: Evaluating the Complementarity of Earth Embedding Models

    Thijs L van der Plas +5

  3. cs.CV 2026-05-18 reviewed
    Learned controller improves long-horizon GUI agents via selective memory

    MementoGUI: Learning Agentic Multimodal Memory Control for Long-Horizon GUI Agents

    Ziyun Zeng +5

  4. cs.CV 2026-05-18 reviewed
    Geometric primitives recover object joints from casual videos

    Articulation in Prime: Primitive-Based Articulated Object Understanding from a Single Casual Video

    Arslan Artykov +3

  5. cs.CV 2026-05-18 reviewed
    Latent reasoning improves models without appearing at inference

    Leveraging Latent Visual Reasoning in Silence

    Dongyao Zhu +9

  6. cs.CV 2026-05-18 reviewed
    Dual controller reuses plans to cut game agent costs 55%

    SPIKE: An Adaptive Dual Controller Framework for Cost-Efficient Long-Horizon Game Agents

    Wencan Jiang +8

  7. cs.CV 2026-05-18 reviewed
    Cross-view data and explicit alignment advance MLLM spatial reasoning

    CrossView Suite: Harnessing Cross-view Spatial Intelligence of MLLMs with Dataset, Model and Benchmark

    Wei Wang +6

  8. cs.RO 2026-05-18 reviewed
    ManiSoft benchmark tests vision-language control on soft robotic arms

    ManiSoft: Towards Vision-Language Manipulation for Soft Continuum Robotics

    Ziyu Wei +4

  9. cs.CV 2026-05-18 reviewed
    Sign-aware aggregation sustains unlearning across sequential VLM requests

    CATA: Continual Machine Unlearning via Conflict-Averse Task Arithmetic

    Shen Lin +5

  10. cs.CV 2026-05-18 reviewed
    Forward bridging of style proxies stabilizes continual adaptation

    Dance Across Shifts: Forward-Facilitation Continual Test-Time Adaptation through Dynamic Style Bridging

    Zhilin Zhu +5

  11. cs.CV 2026-05-18 reviewed
    Token limits force VLMs to learn active perception

    Starve to Perceive: Taming Lazy Perception in VLMs with Constrained Visual Bandwidth

    Yuhuan Wu +4

  12. cs.CV 2026-05-18 reviewed
    Natural language lets video models control multiple entities at once

    Incantation: Natural Language as the Action Interface for Multi-Entity Video World Models

    Shangwen Zhu +13

  13. cs.CV 2026-05-18 reviewed
    Decoupling tokens fixes spatial bias in novel view synthesis

    Resolving Representation Ambiguity in Feedforward Novel View Synthesis Transformer via Semantic-Spatial Decoupling

    Yihang Wu +6

  14. cs.CV 2026-05-18 reviewed
    Benchmark measures when models should speak in video streams

    OmniPro: A Comprehensive Benchmark for Omni-Proactive Streaming Video Understanding

    Ruixiang Zhao +6

  15. cs.CV 2026-05-18 reviewed
    Quality signals steer flow matching to fix occluded hands in video

    StableHand: Quality-Aware Flow Matching for World-Space Dual-Hand Motion Estimation from Egocentric Video

    Huajian Zeng +5

  16. cs.CV 2026-05-18 reviewed
    Low-rank attention enables hyperspectral models to handle sensor shifts

    LESSViT: Robust Hyperspectral Representation Learning under Spectral Configuration Shift

    Haozhe Si +4

  17. cs.CV 2026-05-18 reviewed
    Color features alone classify cancer at up to 89% accuracy

    Beyond Morphology: Quantifying the Diagnostic Power of Color Features in Cancer Classification

    Farnaz Kheiri +2

  18. cs.CV 2026-05-18 reviewed
    Weak supervision enables better radar scene flow than LiDAR methods

    Weakly Supervised Cross-Modal Learning for 4D Radar Scene Flow Estimation

    Jingyun Fu +2

  19. cs.CV 2026-05-18 reviewed
    2D images and odometry beat LiDAR for radar scene flow

    Weakly Supervised Cross-Modal Learning for 4D Radar Scene Flow Estimation

    Jingyun Fu +2

  20. cs.CV 2026-05-18 reviewed
    Self-distilled MIM leads medical segmentation transfer

    Benchmarking transferability of SSL pretraining to same and different modality segmentation tasks

    Jue Jiang +1

  21. cs.CV 2026-05-18 reviewed
    First end-to-end model jointly edits audio and video from text

    InstructAV2AV: Instruction-Guided Audio-Video Joint Editing

    Haojie Zheng +4

  22. cs.CV 2026-05-18 reviewed
    Speech supervision improves MRI vocal tract segmentation at test time

    Speech-Guided Multimodal Learning for Vocal Tract Segmentation in Real-Time MRI

    Daiqi Liu +13

  23. cs.CV 2026-05-18 reviewed
    Recurrent reasoning adapts CLIP with 6K parameters

    PERL: Parameter Efficient Reasoning in CLIP Latent Space

    Simone Carnemolla +4

  24. cs.CV 2026-05-18 reviewed
    Agent turns top-down room images into executable Blender code

    Code-as-Room: Generating 3D Rooms from Top-Down View Images via Agentic Code Synthesis

    Yixuan Yang +7

  25. cs.CV 2026-05-18 reviewed
    NeRF extensions fix illumination and pose issues for spacecraft models

    NeRF-based Spacecraft Reconstruction from Monocular Imagery Under Illumination Variability and Pose Uncertainty

    Antoine Legrand +2

  26. cs.CV 2026-05-18 reviewed
    Per-image tweaks let NeRF reconstruct spacecraft despite lighting shifts and pose errors

    NeRF-based Spacecraft Reconstruction from Monocular Imagery Under Illumination Variability and Pose Uncertainty

    Antoine Legrand +2

  27. cs.CV 2026-05-18 reviewed
    Accuracy unchanged when latent visual tokens replaced by dummies

    What's Holding Back Latent Visual Reasoning?

    André G. Viveiros +3

  28. cs.CV 2026-05-18 reviewed
    1,309-page dataset targets handwritten music recognition

    A Dataset for the Recognition of Historical and Handwritten Music Scores in Western Notation

    Pau Torras +9

  29. cs.IR 2026-05-18 reviewed
    Text guidance focuses full images for cropped-query e-commerce search

    TIGER-FG: Text-Guided Implicit Fine-Grained Grounding for E-commerce Retrieval

    Xinyu Sun +7

  30. cs.CV 2026-05-18 reviewed
    Multi-robot MLLM lifts spatial reasoning accuracy by 7 percent

    Seeing Together: Multi-Robot Cooperative Egocentric Spatial Reasoning with Multimodal Large Language Models

    Kunyu Peng +11

  31. cs.CV 2026-05-18 reviewed
    Geometry-aware coresets lift VLM accuracy in pathology without training

    Geometry-Aware Uncertainty Coresets for Robust Visual In-Context Learning in Histopathology

    Franciskus Xaverius Erick +2

  32. cs.CV 2026-05-18 reviewed
    Infrastructure dataset shows foundation models fall short on defects

    Cracks in the Foundation: A Civil Infrastructure Dataset to Challenge Vision Foundation Models

    Nicola Farronato +8

    4 Piths
  33. cs.CV 2026-05-18 reviewed
    AIS data alone builds graph for global ship arrival forecasts

    Historical Knowledge Graphs for Global Maritime Estimated Time of Arrival

    Neofytos Dimitriou

  34. cs.CG 2026-05-18 reviewed
    Cross-ratios unify across grades in n-dimensional PGA

    Generalize cross-ratios in n-dimensional Plane-Based Geometric Algebra

    Enzo Harquin (LIGM) +4

  35. cs.CV 2026-05-18 reviewed
    Agent planner raises physical accuracy in video models

    NEWTON: Agentic Planning for Physically Grounded Video Generation

    Yuxiang Feng +9

  36. cs.CV 2026-05-18 reviewed
    Frozen vision model serves as generalist image tokenizer

    Vision Foundation Models as Generalist Tokenizers for Image Generation

    Anlin Zheng +7

  37. cs.CV 2026-05-18 reviewed
    Reward makes video generators obey scene geometry

    GeoFlow: Enforcing Implicit Geometric Consistency in Video Generation

    Jan Ackermann +5

  38. cs.CV 2026-05-18 reviewed
    Learned bias in visual attention boosts multimodal models by 3 points

    RAVE: Re-Allocating Visual Attention in Large Multimodal Models

    Xi Leng +6

  39. cs.CV 2026-05-18 reviewed
    Parameter-free attention matches CSRNet accuracy without extra parameters

    Optimising CSRNet with parameter-free attention mechanisms for crowd counting in public transport

    Aida Rostamza +3

  40. cs.CV 2026-05-18 reviewed
    KV selection per frame and head speeds video diffusion 1.48x

    Focused Forcing: Content-Aware Per-Frame KV Selection for Efficient Autoregressive Video Diffusion

    Peiliang Cai +10

  41. cs.CV 2026-05-18 reviewed
    Skew Gaussians cut artifacts in real-time 3D scene views

    3D Skew Gaussian Splatting with Any Camera Trajectory Visualization Engine

    Beizhen Zhao +4

  42. cs.CV 2026-05-18 reviewed
    Deep ensembles calibrate uncertainty better than cross-validation in segmentation

    Lost in the Folds: When Cross-Validation Is Not a Deep Ensemble for Uncertainty Estimation

    Tristan Kirscher (ICube) +9

  43. cs.CV 2026-05-18 reviewed
    Deep ensembles calibrate uncertainty better than cross-validation folds

    Lost in the Folds: When Cross-Validation Is Not a Deep Ensemble for Uncertainty Estimation

    Tristan Kirscher (ICube) +9

  44. cs.CV 2026-05-18 reviewed
    Separate ViT encoding plus cross-attention improves VP background matting

    CineMatte: Background Matting for Virtual Production and Beyond

    Yuanjian He +3

  45. cs.CV 2026-05-18 reviewed
    RAE v2 reaches SOTA gFID 1.06 in 80 epochs on ImageNet

    Improved Baselines with Representation Autoencoders

    Jaskirat Singh +5

  46. cs.CV 2026-05-18 reviewed
    Wasserstein criterion boosts accuracy of small medical image QA models

    Wasserstein Equilibrium Decoding for Reliable Medical Visual Question Answering

    Luca Hagen +4

  47. cs.LG 2026-05-18 reviewed
    Port-Hamiltonian routing shrinks latent space by 4-8% in world models

    PH-Dreamer: A Physics-Driven World Model via Port-Hamiltonian Generative Dynamics

    Xueyu Luan +1

  48. cs.CV 2026-05-18 reviewed
    Single-pass Hamming loss yields collision-resistant fine-grained hashes

    Collision-Resistant Single-Pass Method for Unsupervised Fine-Grained Image Hashing

    Anh-Kiet Duong +2

  49. cs.CV 2026-05-18 reviewed
    Information Bottleneck Adapter (IB-Adapter) targets robust VLA models without extra data

    StableVLA: Towards Robust Vision-Language-Action Models without Extra Data

    Yiyang Fu +9

  50. cs.CV 2026-05-18 reviewed
    Semantic compression unlocks exact-likelihood image generation by flows

    SRC-Flow: Compact Semantic Representations Enable Normalizing Flows for Image Generation

    Longtao Jiang +6