archive

Every paper Pith has read. Search by title, abstract, or pith.

9568 papers in cs.CV · page 7

  1. cs.CV 2026-05-20 reviewed
    Meta-actions set new SOTA on Waymo driving challenge

    DriveMA: Rethinking Language Interfaces in Driving VLAs with One-Step Meta-Actions

    Weicheng Zheng +4

  2. cs.CV 2026-05-20 reviewed
    One-step meta-actions set new Waymo driving records

    DriveMA: Rethinking Language Interfaces in Driving VLAs with One-Step Meta-Actions

    Weicheng Zheng +4

  3. cs.CV 2026-05-20 reviewed
    105M open image-text pairs train competitive text-to-image model

    MONET: A Massive, Open, Non-redundant and Enriched Text-to-image dataset

    Benjamin Aubin +6

  4. cs.CV 2026-05-20 reviewed
    CNNs suit small land-use data

    Vision Transformers and Convolutional Neural Networks for Land Use Scene Classification

    Arun D. Kulkarni

  5. cs.CV 2026-05-20 reviewed
    Transition vector refines LLM captions for zero-shot image retrieval

    STiTch: Semantic Transition and Transportation in Collaboration for Training-Free Zero-Shot Composed Image Retrieval

    Miaoge Li +5

  6. eess.IV 2026-05-20 reviewed
    Local tolerance rule reconnects gaps in Frangi vessel maps

    Local-sensitive connectivity filter (ls-cf): A post-processing unsupervised improvement of the frangi, hessian and vesselness filters for multimodal vessel segmentation

    Erick O Rodrigues +7

  7. cs.CV 2026-05-20 reviewed
    Dataset trains AI to locate and reduce SR artifacts

    SR-Ground: Image Quality Grounding for Super-Resolved Content

    Artem Borisov +3

  8. cs.CV 2026-05-20 reviewed
    Region-aware VAE completes full heart motion cycle from single frame

    RePCM: Region-Specific and Phenotype-Adaptive Bi-Ventricular Cardiac Motion Synthesis

    Xuan Yang +5

  9. cs.CV 2026-05-20 reviewed
    Peak calibration lifts AI image detector accuracy 12% on new test

    PGC: Peak-Guided Calibration for Generalizable AI-Generated Image Detection

    Xiaoyu Zhou +5

  10. cs.CV 2026-05-20 reviewed
    Co-evolving decoder with policy fixes quality drop in discrete T2I

    RankE: End-to-End Post-Training for Discrete Text-to-Image Generation with Decoder Co-Evolution

    Siyong Jian +7

  11. cs.CV 2026-05-20 reviewed
    NaviEdit separates edit steps from noise scale for better results

    Semantic Granularity Navigation in Image Editing

    Liangsi Lu +3

  12. cs.CV 2026-05-20 reviewed
    SAM3 turns rough maps into sharp bacteria explanations

    SAM-Sode: Towards Faithful Explanations for Tiny Bacteria Detection

    Wanying Tan +9

  13. cs.CL 2026-05-20 reviewed
    Manga109 revised to correct 29,000 dialogue annotations

    Manga109-v2026: Revisiting Manga109 Annotations for Modern Manga Understanding

    Jeonghun Baek +4

  14. cs.CV 2026-05-20 reviewed
    Fully ternary ViT reaches 82.43% accuracy at 6 MB

    FTerViT: Fully Ternary Vision Transformer

    Szymon Ruciński +5

  15. cs.CV 2026-05-20 reviewed
    Weierstrass function supplies 2D patch encodings for vision transformers

    Weierstrass Positional Encoding for Vision Transformers

    Zhihang Xin +3

  16. cs.CV 2026-05-20 reviewed
    YOLOv11 detects military targets in synthetic thermal and night drone images

    Comparative Analysis of Military Detection Using Drone Imagery Across Multiple Visual Spectrums

    Sourov Roy Shuvo +5

  17. cs.CV 2026-05-20 reviewed
    Cognitive-physical RL adds foresight to safer driving policies

    Distill to Think, Foresee to Act: Cognitive-Physical Reinforcement Learning for Autonomous Driving

    Yang Wu +5

  18. cs.CV 2026-05-20 reviewed
    CoPhy RL framework reaches SOTA on NAVSIM with BEV foresight

    Distill to Think, Foresee to Act: Cognitive-Physical Reinforcement Learning for Autonomous Driving

    Yang Wu +5

  19. cs.CV 2026-05-20 reviewed
    Streaming model narrates surgery in real time at three workflow levels

    SurgOnAir: Hierarchy-Aware Real-Time Surgical Video Commentary

    Jingyi He +5

  20. cs.CV 2026-05-20 reviewed
    One transformer switches between real-time and full 3D reconstruction

    UniT: Unified Geometry Learning with Group Autoregressive Transformer

    Haotian Wang +6

  21. cs.CV 2026-05-20 reviewed
    Pairwise comparisons improve video quality assessment generalization

    VersusQ: Pairwise Margin Reasoning for Generalizable Video Quality Assessment

    Shibei Meng +6

  22. cs.CV 2026-05-20 reviewed
    Linear utility improves DPO for diffusion and flow image models

    Linear-DPO: Linear Direct Preference Optimization for Diffusion and Flow-Matching Generative Models

    Kesong Li +5

  23. cs.CV 2026-05-20 reviewed
    Router upgrades single-view 3D models to handle any number of views

    ROAR-3D: Routing Arbitrary Views for High-Fidelity 3D Generation

    Hanxiao Sun +7

  24. cs.CV 2026-05-20 reviewed
    Radar tweaks alone match complex camera fusion for 3D detection

    RCGDet3D: Rethinking 4D Radar-Camera Fusion-based 3D Object Detection with Enhanced Radar Feature Encoding

    Weiyi Xiong +1

  25. cs.CV 2026-05-20 reviewed
    Method cuts error in labor-progress angle from ultrasound

    R2AoP: Reliable and Robust Angle of Progression Estimation from Intrapartum Ultrasound

    Yuanhan Wang +9

  26. cs.CV 2026-05-20 reviewed
    3.2M synthetic pairs advance open scene text editing

    TextSculptor: Training and Benchmarking Scene Text Editing

    Yiheng Lin +14

  27. cs.CV 2026-05-20 reviewed
    New model clears banding from phone screen videos

    VDFP: Video Deflickering with Flicker-banding Priors

    Zhiyi Zhou +4

  28. cs.CV 2026-05-20 reviewed
    VDFP removes banding from phone screen videos

    VDFP: Video Deflickering with Flicker-banding Priors

    Zhiyi Zhou +4

  29. cs.CV 2026-05-20 reviewed
    New transformer fuses hyperspectral imagery with other EO sensors

    SpectralEarth-FM: Bringing Hyperspectral Imagery into Multimodal Earth Observation Pretraining

    Nassim Ait Ali Braham +5

  30. cs.CV 2026-05-20 reviewed
    Quantization method enables efficient ARVD video generation

    Q-ARVD: Quantizing Autoregressive Video Diffusion Models

    Siao Tang +4

  31. cs.CV 2026-05-20 reviewed
    0.5B driving model matches 7B models by adding future visual states

    Grounding Driving VLA via Inverse Kinematics

    Junsung Park +1

  32. cs.CV 2026-05-20 reviewed
    Pairwise data trains multimodal LLMs without full joint alignments

    Multimodal LLMs under Pairwise Modalities

    Yan Li +5

  33. cs.CV 2026-05-20 reviewed
    Dynamic allocation speeds video diffusion 7x near-losslessly

    Dynamic Video Generation: Shaping Video Generation Across Time and Space

    Shikang Zheng +7

  34. cs.CV 2026-05-20 reviewed
    Orthogonal projection fixes spatial-temporal ambiguity in 4D driving scenes

    Towards Physically Consistent 4D Scene Reconstruction for Closed-loop Autonomous Driving Simulation

    Bowyn Tan +7

  35. cs.CV 2026-05-20 reviewed
    Dynamic sinks raise dynamic degree in long video generation

    DySink: Dynamic Frame Sinks for Autoregressive Long Video Generation

    Bo Ye +4

  36. cs.CV 2026-05-20 reviewed
    LiteViLNet reaches 96.36% MaxF with 14M parameters at 164 FPS

    LiteViLNet: Lightweight Vision-LiDAR Fusion Network for Efficient Road Segmentation

    Daojie Peng +4

  37. cs.CR 2026-05-20 reviewed
    Framework turns AI detection metrics into legal evidence thresholds

    Verifiable Provenance and Watermarking for Generative AI: An Evidentiary Framework for International Operational Law and Domestic Courts

    Gustav Olaf Yunus Laitinen-Fredriksson Lundström-Imanov +1

  38. cs.CV 2026-05-20 reviewed
    Body-anchored Gaussians let users reorder clothing layers on 3D avatars

    DAMA: Disentangled Body-Anchored Gaussians for Controllable Multi-Layered Avatars

    Daniel Eskandar +3

  39. cs.CV 2026-05-20 reviewed
    Landsat addition cuts TanDEM-X forest height RMSE by 13.5%

    Hybrid Machine Learning Model for Forest Height Estimation from TanDEM-X and Landsat Data

    Islam Mansour +3

  40. cs.CV 2026-05-20 reviewed
    Contact coupling improves 4D hand-object reconstruction from video

    CHOIR: Contact-aware 4D Hand-Object Interaction Reconstruction

    Hao Xu +4

  41. cs.CV 2026-05-20 reviewed
    Contact signals align hands and objects in monocular 4D videos

    CHOIR: Contact-aware 4D Hand-Object Interaction Reconstruction

    Hao Xu +4

  42. cs.CV 2026-05-20 reviewed
    3D scans integrate rock bolts with fractures for mine assessment

    Towards Integrated Rock Support Visualisation in 3D Point Cloud of Underground Mines

    Dibyayan Patra +4

  43. cs.CV 2026-05-20 reviewed
    VGG16 detects fake images at 91% accuracy

    Comparative Evaluation of Deep Learning Models for Fake Image Detection

    Akhitha Pakala +3

  44. cs.CV 2026-05-20 reviewed
    Layer attention gaps reveal fix for LVLM hallucinations

    Finding the Correct Visual Evidence Without Forgetting: Mitigating Hallucination in LVLMs via Inter-Layer Visual Attention Discrepancy

    Yutong Xie +5

  45. cs.CV 2026-05-20 reviewed
    Multispectral signatures raise small-UAV detection by 6.2 percent

    Towards UAV Detection in the Real World: A New Multispectral Dataset UAVNet-MS and a New Method

    Yihang Luo +15

  46. cs.CV 2026-05-20 reviewed
    Role split improves faithful 4D video editing

    Preserve, Reveal, Expand: Faithful 4D Video Editing with Region-Aware Conditioning

    Zhangchi Hu +7

  47. cs.CV 2026-05-20 reviewed
    Hand drawings add spatial precision to text-based 3D motion generation

    DrawMotion: Generating 3D Human Motions by Freehand Drawing

    Tao Wang +9

  48. cs.CV 2026-05-20 reviewed
    Focus-then-context method trims VLM tokens to 22% with tiny accuracy cost

    Focus-then-Context: Subject-Centric Progressive Visual Token Reduction for Vision-Language Models

    Yulin Zhao +4

  49. cs.CV 2026-05-20 reviewed
    Tiny models master road reasoning from 20-80 graph scenes

    Bridging Structure and Language: Graph-Based Visual Reasoning for Autonomous Road Understanding

    Lena Wild +2

  50. cs.CV 2026-05-20 reviewed
    AI continues paintings by predicting next strokes from canvas history

    PaintCopilot: Modeling Painting as Autonomous Artistic Continuation

    Yunge Wen +2