pith.

archive

Every paper Pith has read. Search by title, abstract, or pith.

9568 papers in cs.CV · page 8

  1. cs.CV 2026-05-20 reviewed
    3D distillation speeds wheat spike volume estimation by 100x

    3D Reconstruction and Knowledge Distillation to Improve Multi-View Image Models to Explore Spike Volume Estimation in Wheat

    Olivia Zumsteg +6

  2. cs.LG 2026-05-20 reviewed
    Oscillatory network scales to ImageNet with high efficiency

    Winfree Oscillatory Neural Network

    Jiawen Dai +1

  3. cs.CV 2026-05-20 reviewed
    RISE makes self-evolving VLMs gain steadily without new labels

    RISE: Reliable Improvement in Self-Evolving Vision-Language Models

    Chaoran Xu +5

  4. cs.CV 2026-05-20 reviewed
    Tweedie matching across overlaps extends short video models to long sequences

    FlowLong: Inference-time Long Video Generation via Manifold-constrained Tweedie Matching

    Jangho Park +3

  5. cs.CV 2026-05-20 reviewed
    Hybrid routes inputs to concept or neural branch for accuracy gains

    SynCB: A Synergy Concept-Based Model with Dynamic Routing Between Concepts and Complementary Neural Branches

    Tores Julie +5

  6. cs.CV 2026-05-20 reviewed
    Frozen video model plus probe wins kitchen action challenge

    JFAA: Technical Report for the EPIC-KITCHENS-100 Action Anticipation Challenge at EgoVis 2026

    Qiaohui Chu +6

  7. cs.CV 2026-05-20 reviewed
    VISTA wins Ego4D STA challenge by fusing frozen video features into detector

    VISTA: Technical Report for the Ego4D Short-Term Object Interaction Anticipation at EgoVis 2026

    Qiaohui Chu +6

  8. cs.CV 2026-05-20 reviewed
    MLLM arbitration with ensemble reaches 70.49% on 306 fruits

    FruitEnsemble: MLLM-Guided Arbitration for Heterogeneous ensemble in Fine-Grained Fruit Recognition

    Enhui Yu +4

  9. cs.CV 2026-05-20 reviewed
    Two-level experts reduce redundancy in multimodal cancer survival models

    HDMoE: A Hierarchical Decoupling-Fusion Mixture-of-Experts Framework for Multimodal Cancer Survival Prediction

    Huayi Wang +7

  10. cs.CV 2026-05-20 reviewed
    Map anchors egocentric pose to eliminate drift

    Map-Mono-Ego: Map-Grounded Global Human Pose Estimation from Monocular Egocentric Video

    Hiroyuki Deguchi +5

  11. cs.MA 2026-05-20 reviewed
    Self-elicited reasoning and critic revision improve sarcasm detection

    ProCrit: Self-Elicited Multi-Perspective Reasoning with Critic-Guided Revision for Multimodal Sarcasm Detection

    Yingjia Xu +5

  12. cs.CV 2026-05-20 reviewed
    Polynomial alternatives match activation-based vision models

    Activation-Free Backbones for Image Recognition: Polynomial Alternatives within MetaFormer-Style Vision Models

    Jeffrey Wang +2

  13. cs.CV 2026-05-20 reviewed
    224K short videos collected by labels support semantic benchmarks

    USV: Towards Understanding the User-generated Short-form Videos

    Haoyue Cheng +5

  14. cs.CV 2026-05-20 reviewed
    New benchmark shows VLMs lag trained humans on building layouts

    ArchSIBench: Benchmarking the Architectural Spatial Intelligence of Vision-Language Models

    Qirui Shen +7

  15. cs.CV 2026-05-20 reviewed
    Two-stage model turns panoramic X-rays into accurate 3D dental volumes

    HyDAR-Pano3D: A Hybrid Disentangled Anatomical Recovery Framework for Panoramic-to-3D Reconstruction

    Yaoyao Yue +4

  16. cs.CV 2026-05-20 reviewed
    Witness cues turn missing 3D relations into usable training signals

    RelWitness: Open-Vocabulary 3D Scene Graph Generation with Visual-Geometric Relation Witnesses

    Minh Anh Nguyen +4

  17. cs.CV 2026-05-20 reviewed
    Visual-geometric cues recover missing 3D relations from incomplete labels

    RelWitness: Open-Vocabulary 3D Scene Graph Generation with Visual-Geometric Relation Witnesses

    Minh Anh Nguyen +4

  18. cs.CV 2026-05-20 reviewed
    TERDNet beats prior models at spotting scene changes

    TERDNet: Transformer Encoder-Recurrent Decoder Network for Scene Change Detection

    Jiae Yoon +1

  19. cs.CV 2026-05-20 reviewed
    Patch alignment spots changes in free-motion videos

    VSCD: Video-based Scene Change Detection in Unaligned Scenes

    Jiae Yoon +1

  20. cs.CV 2026-05-20 reviewed
    Single network pass reconstructs images with 2D Gaussians in 160-300 ms

    AIR: Amortized Image Reconstruction Framework for Self-Supervised Feed-Forward 2D Gaussian Splatting

    Zhaojie Zeng +3

  21. cs.CV 2026-05-20 reviewed
    Reranking OSGNet candidates with MLLM wins Ego4D challenge

    OSGNet with MLLM Reranking @ Ego4D Episodic Memory Challenge 2026

    Yisen Feng +7

  22. cs.CV 2026-05-20 reviewed
    Self-similarity alignment fixes high-res diffusion conflicts

    Spatial Gram Alignment for Ultra-High-Resolution Image Synthesis

    Jinjin Zhang +2

  23. cs.CV 2026-05-20 reviewed
    Canny map first keeps logos and text intact in subject edits

    Decomposing Subject-Driven Image Generation via Intermediate Structural Prediction

    Hanzhong Guo +1

  24. cs.CV 2026-05-20 reviewed
    OlmoEarth models cut training GPU hours by 1.7x

    OlmoEarth v1.1: A more efficient family of OlmoEarth models

    Gabriel Tseng +9

  25. cs.CV 2026-05-20 reviewed
    Connector degrades structural semantics in video editing

    What Semantics Survive the Connector? Diagnosing VLM-to-DiT Alignment in Video Editing

    Hangyu Lin +6

  26. cs.CV 2026-05-20 reviewed
    AI detectors flag fakes well but cannot identify the source model

    Findings of the Counter Turing Test: AI-Generated Image Detection

    Rajarshi Roy +18

  27. cs.CV 2026-05-20 reviewed
    Detectors flag AI images reliably but fail to name their model

    Findings of the Counter Turing Test: AI-Generated Image Detection

    Rajarshi Roy +18

  28. cs.LG 2026-05-20 reviewed
    Intermediate alignment cuts physics residuals by 66% in diffusion models

    Learning to Think in Physics: Breaking Shortcut Learning in Scientific Diffusion via Representation Alignment

    Haozhe Jia +8

  29. cs.CV 2026-05-20 reviewed
    Attention alignment yields accurate attributes in visual stories

    AttriStory: Fine-grained Attribute Realization for Visual Storytelling with Diffusion Models

    Manogna Sreenivas +2

  30. cs.CV 2026-05-20 reviewed
    Visual token masking flags hallucinations in medical VQA answers

    VIHD: Visual Intervention-based Hallucination Detection for Medical Visual Question Answering

    Jiayi Chen +5

  31. cs.CV 2026-05-20 reviewed
    Diffusion from points creates masks for infrared target detection

    Diffuse to Detect: Bi-Level Sample Rebalancing with Pseudo-Label Diffusion for Point-Supervised Infrared Small-Target Detection

    Zhu Liu +4

  32. cs.CV 2026-05-20 reviewed
    Lightweight U-Net segments spines in CT scans on basic hardware

    SpineContextResUNet: A Computationally Efficient Residual UNet for Spine CT Segmentation

    K S Nithurshen +1

  33. cs.AI 2026-05-20 reviewed
    New guidance resolves gradient conflicts in flow models

    Conflict-Aware Additive Guidance for Flow Models under Compositional Rewards

    Xuehui Yu +4

  34. cs.CV 2026-05-20 reviewed
    Constraint engine turns AI drawings into verifiable geometry reasoning

    Draw2Think: Harnessing Geometry Reasoning through Constraint Engine Interaction

    Juncheng Hu +3

  35. cs.CV 2026-05-20 reviewed
    Scale-decoupled alignment improves remote sensing incremental detection

    STAR-IOD: Scale-decoupled Topology Alignment with Pseudo-label Refinement for Remote Sensing Incremental Object Detection

    Yaoteng Zhang +3

  36. cs.CV 2026-05-20 reviewed
    Language priors fix long-tail bias in 3D point cloud clustering

    Resolving Long-Tail Ambiguity in Unsupervised 3D Point Cloud Segmentation with Language Priors

    Siqi Wei +6

  37. cs.CV 2026-05-20 reviewed
    Open-source iris algorithms pass first official IREX evaluation

    Lowering the Barrier to IREX Participation: Open-Source Algorithms, Toolkit, and Benchmarking for Iris Recognition

    Siamul Karim Khan +2

  38. cs.CV 2026-05-20 reviewed
    Method generates editable 3D surfaces from hand sketches

    Sketch2MinSurf: Vision-Language Guided Generation of Editable Minimal Surfaces from Hand-Drawn Sketches

    Wenda Wang +6

  39. cs.CV 2026-05-20 reviewed
    Attention reweighting suppresses spurious features before CNN pooling

    Deep Attention Reweighting: Post-Hoc Attention-Based Feature Aggregation in CNNs for Disentangling Core and Spurious Features under Spurious Correlations

    Kin Whye Chew +1

  40. cs.CV 2026-05-20 reviewed
    Designer ratings dataset lifts AI graphic scorer to 0.611 agreement

    TASTE: A Designer-Annotated Multi-Dimensional Preference Dataset for AI-Generated Graphic Design

    Haonan Zhu +4

  41. cs.CV 2026-05-20 reviewed
    Early high-frequency injection reduces OOD score overlap

    Early High-Frequency Injection for Geometry-Sensitive OOD Detection

    Chuanjie Cheng +5

  42. cs.CV 2026-05-20 reviewed
    Virtual outliers reshape geometry to handle noisy labels

    GAMR: Geometric-Aware Manifold Regularization with Virtual Outlier Synthesis for Learning with Noisy Labels

    Ningkang Peng +6

  43. cs.CV 2026-05-20 reviewed
    Decoupling reliabilities lifts noisy-label accuracy

    Holistic Reliability Propagation: Decoupling Annotation and Prediction for Robust Noisy-Label

    Jingyang Mao +2

  44. cs.NE 2026-05-20 reviewed
    ReRAM macro reaches 419 TOPS/W for edge neural inference

    E-ReCON: An Energy- and Resource-Efficient Precision-Configurable Sparse nvCIM Macro for Conventional and Spiking Neural Edge Inference

    Ankit Kumar Tenwar +2

  45. cs.CV 2026-05-20 reviewed
    SAVER selectively activates vision to boost F1 and cut latency in multimodal IE

    SAVER: Selective As-Needed Vision Evidence for Multimodal Information Extraction

    Miaobo Hu +7

  46. cs.CV 2026-05-20 reviewed
    DAR cuts DiT training iterations by 8.75x while improving FID by 2.11

    Rethinking Cross-Layer Information Routing in Diffusion Transformers

    Chao Xu +11

  47. cs.CV 2026-05-20 reviewed
    Agent framework hits top zero-shot scores for industrial defect detection

    IndusAgent: Reinforcing Open-Vocabulary Industrial Anomaly Detection with Agentic Tools

    Rongbin Tan +12

  48. cs.CV 2026-05-20 reviewed
    IMU-warped event frames lift action recognition in dark and shaky scenes

    DarkShake-DVS: Event-based Human Action Recognition under Low-light and Shaking Camera Conditions

    Jiaqi Chen +2

  49. cs.CV 2026-05-20 reviewed
    VISTAQA benchmark shows models answer but rarely ground correctly

    VISTAQA: Benchmarking Joint Visual Question Answering and Pixel-Level Evidence

    Mozhgan Nasr Azadani +7

  50. cs.CV 2026-05-20 reviewed
    GSA-YOLO hits 189 FPS while cutting compute for X-ray scans

    GSA-YOLO: A High-Efficiency Framework via Structured Sparsity and Adaptive Knowledge Distillation for Real-Time X-ray Security Inspection

    Jiahao Kong