pith.

archive

Every paper Pith has read.

9568 papers in cs.CV · page 1

  1. cs.CV 2026-05-22 reviewed
    Geometric reward aligns camera paths in generated videos

    Geo-Align: Video Generation Alignment via Metric Geometry Reward

    Zizun Li +4

  2. cs.CV 2026-05-22 reviewed
    Pixel diffusion turns 512x512 latents into 2048x2048 images in 210 ms

    PiD: Fast and High-Resolution Latent Decoding with Pixel Diffusion

    Yifan Lu +6

  3. cs.CV 2026-05-22 reviewed
    Dedicated image editor lifts multimodal reasoning by 5 points

    ETCHR: Editing To Clarify and Harness Reasoning

    Beichen Zhang +5

  4. cs.CV 2026-05-22 reviewed
    Causal tests show many brain localizations are false positives

    From Activation to Causality: Discovery of Causal Visual Representations in the Human Brain

    Yuval Golbari +7

  5. cs.CV 2026-05-22 reviewed
    Token selection speeds up geometry transformers by over 85 percent

    Good Token Hunting: A Hitchhiker's Guide to Token Selection for Visual Geometry Transformers

    Shuhong Zheng +5

  6. cs.CV 2026-05-22 reviewed
    Dual-stream system inserts objects into videos harmoniously

    Smart-Insertion-V: Photorealistic Video Insertion via a Closed-Loop Feedback Dual-Stream Framework

    Xiao Cao +9

  7. cs.CV 2026-05-22 reviewed
    HorizonStream keeps 3D reconstruction stable past 10,000 frames

    HorizonStream: Long-Horizon Attention for Streaming 3D Reconstruction

    Chong Cheng +11

  8. cs.CV 2026-05-22 reviewed
    Projection conditioning lifts generative priors to scene reconstruction

    GenRecon: Bridging Generative Priors for Multi-View 3D Scene Reconstruction

    Katharina Schmid +4

  9. cs.CV 2026-05-22 reviewed
    Geometric overlays on images lift MLLM spatial scores by 20%

    PGT: Procedurally Generated Tasks for improving visual grounding in MLLMs

    Rim Assouel +3

  10. cs.CV 2026-05-22 reviewed
    Self-supervised priors raise physical fidelity in video generators

    LaMo: Self-Supervised Latent Motion Priors for Physical Realism in Video Generation

    Bo Jiang +5

  11. cs.CV 2026-05-22 reviewed
    Entmax attention lifts ViT segmentation mIoU by up to 6 points

    Vision Transformers Need Better Token Interaction

    Linxiang Su

  12. cs.LG 2026-05-22 reviewed
    Foundation models support zero-shot causal image reasoning

    Leveraging Foundation Models for Causal Generative Modeling

    Aneesh Komanduri +1

  13. cs.CV 2026-05-22 reviewed
    Dynamics model learns particle motion from real videos alone

    Learning a Particle Dynamics Model with Real-world Videos

    Chanho Kim +2

  14. cs.CV 2026-05-22 reviewed
    Pretraining on decomposition maps cuts labeled data needs for Mueller polarimetry

    MuellerPT: Decomposition Driven Pretraining for Dense Learning in Mueller Polarimetry

    Adam Tlemsani +6

  15. cs.CV 2026-05-22 reviewed
    LLM splits video queries into tool calls merged by boolean logic

    Decomposing Queries into Tool Calls for Long-Video Keyframe Retrieval

    Michal Shlapentokh-Rothman +3

  16. cs.CV 2026-05-22 reviewed
    Vision models match humans best at balanced generative-discriminative mix

    Not Too Generative, Not Too Discriminative: The Human Alignment Sweet Spot

    Jorge Chang Ortega +3

  17. cs.LG 2026-05-22 reviewed
    Debiased mining converts OOD detection to Monte-Carlo sampling

    Debiased Negative Mining Improves Out-of-distribution Detection with Pre-trained Vision-Language Models

    Bo Peng +3

  18. cs.CV 2026-05-22 reviewed
    Transformer predicts saliency from event camera streams

    Exploring deep learning for Event-Based Saliency Prediction with a Transformer-based model

    Romaric Mazna +2

  19. cs.CV 2026-05-22 reviewed
    ML framework grades emeralds at 98 percent accuracy

    Machine learning applied to emerald gemstone grading: framework proposal and creation of a public dataset

    FB Pena +4

  20. cs.CV 2026-05-22 reviewed
    cGAN counts eucalyptus logs at 92.3 percent accuracy

    A Novel Approach for the Counting of Wood Logs Using cGANs and Image Processing Techniques

    João VC Mazzochin +6

  21. cs.CV 2026-05-22 reviewed
    Agent beats baselines at text-guided 3D photo search

    PhotoFlow: Agentic 3D Virtual Photography Missions

    Jiarui Guo +7

  22. cs.CV 2026-05-22 reviewed
    Stabilized SegFormer reaches 0.4572 mIoU on original DMS split

    Revitalizing Dense Material Segmentation: Stabilized Vision Transformers and the Generalization Paradox

    Allan Kazakov +3

  23. cs.CV 2026-05-22 reviewed
    Video models fail physics consistency under viewpoint shifts

    CRONOS: Benchmarking Counterfactual Physical Consistency in Video Models

    León Begiristain +2

  24. cs.CV 2026-05-22 reviewed
    RiGS models multi-scale motions with three Gaussian types

    RiGS: Rigid-aware 4D Gaussian Splatting from a Single Monocular Video

    Chenyu Wu +3

  25. cs.CV 2026-05-22 reviewed
    Coupling narrow models cuts 30% FLOPs from wide vision training

    Recursive Block-Diagonal Coupling for Resource-Efficient Training of Vision Models

    Maxim Henry +3

  26. cs.CV 2026-05-22 reviewed
    Adaptive search fixes blind spots in high-res image perception for LLMs

    CVSearch: Empowering Multimodal LLMs with Cognitive Visual Search for High-Resolution Image Perception

    Liupeng Li +6

  27. cs.CV 2026-05-22 reviewed
    3D hand motions predict open-surgery skill with r=0.78

    ExpOS: Explainable Open-Surgery Skills Assessment Using 3D Hand Reconstruction

    Roi Papo +2

  28. cs.CV 2026-05-22 reviewed
    Final diagnosis scores hide flawed medical workups in AI

    DDX-TRACE: A Benchmark for Medical Diagnostic Trajectories in VLMs

    Jiazhen Pan +9

  29. cs.CV 2026-05-22 reviewed
    Entity patches in memory fix consistency in multi-shot videos

    EM-Vid: Training-Free Entity-Centric Memory for Efficient and Consistent Multi-Shot Video Generation

    Jente Vandersanden +4

  30. cs.CV 2026-05-22 reviewed
    Semantic banks let 3D splatting handle night glow scenes

    GlowGS: Generative Semantic Feature Learning for 3D Gaussian Splatting in Nighttime Glow Scenes

    Beibei Lin +3

  31. cs.LG 2026-05-22 reviewed
    Meta-learning yields model performance scores on unlabeled data

    Learning to Evaluate: Cost-Effective Model Evaluation on Unlabeled Data with Meta-Learning

    Trinh Pham +4

  32. cs.CV 2026-05-22 reviewed
    Support map shows some regions supply stronger LiDAR-camera cues

    Calibration-Informative Region Selection for Online LiDAR--Camera Calibration in Agricultural Environments

    Rajitha de Silva +1

  33. cs.CV 2026-05-22 reviewed
    PathNavigate scans slides for surprises before matching the question

    PathNavigate: A Training-Free Pathology Agent with Surprise-Guided Scan and Shared Slide Memory for Whole-Slide Image VQA

    Chunze Yang +12

  34. cs.CV 2026-05-22 reviewed
    Tri-module augmentation lifts 3D avatar quality from short videos

    Generator-Refiner-Examiner: A Tri-Module Data Augmentation Framework for 3D Human Avatar Learning from Monocular Videos

    Gangjian Zhang +5

  35. cs.CV 2026-05-22 reviewed
    PixIE raises low-light PSNR by up to 15% using DINO prompts

    PixIE: Prompted Pixel-Space Low-Light Image Enhancement

    Ruirui Lin +3

  36. cs.CV 2026-05-22 reviewed
    Hand motions guide stable object tracking in RGB video

    ComPose: When to Trust Hands for Object Pose Tracking

    Jisu Shin +7

  37. cs.LG 2026-05-22 reviewed
    New sampler cuts RL training time for flow models by up to 53%

    Precise: SDE-Consistent Stochastic Sampling for RL Post-Training of Flow-Matching Models

    Jade Zou +9

  38. cs.CV 2026-05-22 reviewed
    120K triplets enable instruction editing at 4K+ resolution

    VINS-120K: Ultra High-Resolution Image Editing with A Large-Scale Dataset

    Zhizhou Chen +8

  39. cs.GR 2026-05-22 reviewed
    Sketches control long video generation via independent shots

    DrawVideo: Generating Long Video from Storyboard Keyframe Sketches

    Chuanzhi Xu +9

  40. cs.CV 2026-05-22 reviewed
    MDS-DETR gains +2.8 mAP over Deformable-DETR with 5% extra training

    MDS-DETR: DETR with Masked Duplicate Suppressor

    Chanho Lee +3

  41. cs.CV 2026-05-22 reviewed
    Bootstrapped GRTO unifies RL and tool training for segmentation

    B-GRTO: Bootstrapped Group Relative Tool Optimization for Referring Segmentation

    Mario Markov +5

  42. cs.CV 2026-05-22 reviewed
    MDM distills vision-language datasets into compact synthetic sets

    Multimodal Distribution Matching for Vision-Language Dataset Distillation

    Jongoh Jeong +3

  43. cs.CV 2026-05-22 reviewed
    One model forecasts yields for many crops by learning their weather responses

    PhenoYieldNet: Learning Crop-Aware Phenological Responses for Multi-Crop Yield Prediction

    Yu Luo +6

  44. cs.CV 2026-05-22 reviewed
    DINOv3 beats ImageNet pretraining after finetuning on RGB inspection but loses on X-ray

    Rethinking Transfer Learning for Industrial Inspection: DINOv3 vs. ImageNet Pretraining Across RGB and X-ray Tasks

    Mehdi Gharbage +3

  45. cs.CV 2026-05-22 reviewed
    One-Forcing scores 83.76 on VBench for one-step video

    One-Forcing: Towards Stable One-Step Autoregressive Video Generation

    Jiaqi Feng +3

  46. cs.CV 2026-05-22 reviewed
    32x compression and linear attention enable fast image restoration

    Efficient One-Step Diffusion Restoration Model with Compact Token Compression and Linear Attention

    Bingtian Qiao +5

  47. cs.LG 2026-05-22 reviewed
    VAE decoder learns to respect non-commutative latent order

    Commutator-Induced Uncertainty in VAEs

    Tahereh Dehdarirad +3

  48. cs.CV 2026-05-22 reviewed
    Dynamic sparse attention delivers 2.1x video generation speedup

    DFSAttn: Dynamic Fine-grained Sparse Attention for Efficient Video Generation

    Jie Hu +3

  49. cs.CV 2026-05-22 reviewed
    Semantic scores trigger early stops in video motion search

    FAST-ME: Foundation-aware Adaptive Stopping for Motion Estimation for Efficient IoT Video Analysis

    Kakia Panagidi +1

  50. cs.LG 2026-05-22 reviewed
    Sample-wise attacks fool TTA while keeping label counts normal

    Sample-wise Targeted Adversarial Attacks on Test-time Adaptation

    Phuc Duc Nguyen +1