pith.

archive

Every paper Pith has read. Search by title, abstract, or pith.

9568 papers in cs.CV · page 6

  1. cs.CV 2026-05-21 reviewed
    DoRA raises VLA success rates by 10.4 points over SFT

    CrossVLA: Cross-Paradigm Post-Training and Inference Optimization for Vision-Language-Action Models

    Zhi Liu

  2. cs.CV 2026-05-21 reviewed
    Seizure video dataset yields 0.96 F1 on epileptic classification

    Seizure-Semiology-Suite (S3): A Clinically Multimodal Dataset, Benchmark, and Models for Seizure Semiology Understanding

    Lina Zhang +25

  3. eess.IV 2026-05-20 reviewed
    PET/CT model matches full segmentation accuracy with 10% labels

    An Open Multi-Center Whole-Body FDG PET/CT Foundation Model for Tumor Segmentation

    Xiaofeng Liu +6

  4. eess.IV 2026-05-20 reviewed
    Embeddings support 99% accurate tomato field mapping

    Mapping Tomato Cropping Systems in California Using AlphaEarth Geospatial Embeddings and Deep Learning Analysis

    Mohammadreza Narimani +2

  5. cs.CV 2026-05-20 reviewed
    Context rewrite lifts 3D grounding accuracy by up to 22 points

    MM-Conv: A Multimodal Dataset and Benchmark for Context-Aware Grounding in 3D Dialogue

    Anna Deichler +6

    3 Piths

  6. cs.CV 2026-05-20 reviewed
    Scene graph matching grounds 3D objects from language without training

    SceneGraphGrounder: Zero-Shot 3D Visual Grounding via Structured Scene Graph Matching

    Xuefei Sun +4

  7. cs.CV 2026-05-20 reviewed
    Diffusion model relights full-body videos consistently under new lights

    BodyReLux: Temporally Consistent Full-Body Video Relighting

    Li Ma +6

  8. cs.CV 2026-05-20 reviewed
    4D geometry supervision lifts robot video models to 81% success

    GEM-4D: Geometry-Enhanced Video World Models for Robot Manipulation

    Kaichen Zhou +10

  9. cs.CV 2026-05-20 reviewed
    VLMs give better 3D vehicle dimensions than lidar in occluded cases

    Improving 3D Labeling in Self-Driving by Inferring Vehicle Information using Vision Language Models

    Steven Chen +2

  10. cs.CV 2026-05-20 reviewed
    Lightweight cross-encoder matches LLM judges for caption evaluation

    BEiTScore: Reference-free Image Captioning Evaluation with an Efficient Cross-Encoder Model

    Gonçalo Gomes +2

  11. cs.CV 2026-05-20 reviewed
    Vision-IMU attention fusion cuts hand tracking error by 16%

    AVI-HT: Adaptive Vision-IMU Fusion for 3D Hand Tracking

    Ziyi Kou +6

  12. eess.IV 2026-05-20 reviewed
    HSR methods vary by over 13 dB across degradation types

    HyperBench: Standardizing and Scaling Synthetic Evaluation for Hyperspectral Super-Resolution

    Ritik Shah +1

  13. cs.CV 2026-05-20 reviewed
    AI turns T1 scans into motion-free high-res MRIs

    MRecover: A Conditional Generative Model for Recovering Motion-Corrupted MR images Using AI Generated Contrast

    Jinghang Li +15

  14. cs.LG 2026-05-20 reviewed
    Stochastic policy amortizes diffusion guidance for 5x faster sampling

    Hierarchical Variational Policies for Reward-Guided Diffusion

    Kushagra Pandey +4

  15. cs.CV 2026-05-20 reviewed
    Ultrasound VQA model learns to zoom closer before diagnosing

    Look-Closer-Then-Diagnose: Confidence-Aware Ultrasound VQA via Active Zooming

    Yue Zhou +7

  16. cs.CV 2026-05-20 reviewed
    VLMs retain gains after corrupting thought tokens

    Ablate-to-Validate: Are Vision-Language Models Really Using Continuous Thought Tokens?

    Tianyi Zhang +2

  17. eess.IV 2026-05-20 reviewed
    Three-plane aggregation raises stroke lesion Dice score

    VRXU-net: A Deep Learning Approach for Brain Ischemic Stroke Lesion Detection and Segmentation in T1W MRI

    Sayed Amir Mousavi Mobarakeh

  18. cs.CV 2026-05-20 reviewed
    New benchmark shows LVLMs falter on furniture assembly videos

    Flat-Pack Bench: Evaluating Spatio-Temporal Understanding in Large Vision-Language Models through Furniture Assembly

    Aditya Chetan +7

  19. cs.CV 2026-05-20 reviewed
    Text rendered on masks improves images and halves inference cost

    UniVL: Unified Vision-Language Embedding for Spatially Grounded Contextual Image Generation

    Jiayun Wang +4

  20. cs.CV 2026-05-20 reviewed
    Agents evolve image generation by distilling trajectory differences

    GenEvolve: Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation

    Sixiang Chen +9

  22. cs.CV 2026-05-20 reviewed
    3.8B model rivals larger ones using 19% of the training compute

    Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models

    Dong Chen +20

  23. cs.LG 2026-05-20 reviewed
    Amortized noise sampling cuts diffusion teacher variance 10x

    Variance Reduction for Expectations with Diffusion Teachers

    Jesse Bettencourt +4

  24. cs.LG 2026-05-20 reviewed
    Amortized resampling yields 2-3x compute gains for diffusion teachers

    Variance Reduction for Expectations with Diffusion Teachers

    Jesse Bettencourt +4

  25. cs.CV 2026-05-20 reviewed
    Single editing task lifts understanding

    Uni-Edit: Intelligent Editing Is A General Task For Unified Model Tuning

    Dian Zheng +6

  26. cs.CV 2026-05-20 reviewed
    One editing task improves understanding

    Uni-Edit: Intelligent Editing Is A General Task For Unified Model Tuning

    Dian Zheng +6

  27. cs.CV 2026-05-20 reviewed
    Fixed-point distillation matches multi-step diffusion in one pass

    One-Step Distillation of Discrete Diffusion Image Generators via Fixed-Point Iteration

    Chaoyang Wang +1

  28. cs.CV 2026-05-20 reviewed
    Unified model generates simulation-ready 3D assets across object types

    PhysX-Omni: Unified Simulation-Ready Physical 3D Generation for Rigid, Deformable, and Articulated Objects

    Ziang Cao +7

  29. cs.CV 2026-05-20 reviewed
    WikiVQABench tests VLMs on Wikipedia questions needing external knowledge

    WikiVQABench: A Knowledge-Grounded Visual Question Answering Benchmark from Wikipedia and Wikidata

    Basel Shbita +2

  30. cs.CV 2026-05-20 reviewed
    Latent dynamics model yields coherent full-body avatar animations

    Latent Dynamics for Full Body Avatar Animation

    Shichong Peng +9

  31. cs.CV 2026-05-20 reviewed
    Evidential memory turns frozen 3D generators into streaming systems

    Stream3D: Sequential Multi-View 3D Generation via Evidential Memory

    Kaichen Zhou +5

  32. cs.CV 2026-05-20 reviewed
    Few-step streaming adapts generators for video editing without training

    StreamGVE: Training-Free Video Editing via Few-Step Streaming Video Generation

    Guanlong Jiao +4

  33. cs.CV 2026-05-20 reviewed
    Prototypes and pathways fuse for cancer survival prediction with built-in explanations

    ProtoPathway: Biologically Structured Prototype-Pathway Fusion for Multimodal Cancer Survival Prediction

    Amaya Gallagher-Syed +4

  34. cs.CV 2026-05-20 reviewed
    VLMs miss most time-based glitches in game videos

    TempGlitch: Evaluating Vision-Language Models for Temporal Glitch Detection in Gameplay Videos

    Yakun Yu +6

  35. cs.CV 2026-05-20 reviewed
    Two-frame recurrent method restores turbulence videos efficiently

    ReMATF: Recurrent Motion-Adaptive Multi-scale Turbulence Mitigation for Dynamic Scenes

    Zhiming Liu +2

  36. cs.CV 2026-05-20 reviewed
    New method masters interactive video try-on with hand and action guidance

    iTryOn: Mastering Interactive Video Virtual Try-On with Spatial-Semantic Guidance

    Jun Zheng +8

  37. cs.CV 2026-05-20 reviewed
    Smartphone runs full gait analysis locally without cloud upload

    AIGaitor: Privacy-preserving and cloud-free motion analysis for everyone, using edge computing

    Lauhitya Reddy +2

  38. cs.LG 2026-05-20 reviewed
    Gossip-based critic sharing lifts multi-cell OFDMA sum-rates in 6G

    FedCritic: Serverless Federated Critic Learning-based Resource Allocation for Multi-Cell OFDMA in 6G

    Amin Farajzadeh +1

  39. cs.CV 2026-05-20 reviewed
    Top-n encoder selection lifts blended emotion accuracy

    Ordering Matters: Rank-Aware Selective Fusion for Blended Emotion Recognition

    Junghyun Lee +3

  40. cs.RO 2026-05-20 reviewed
    3D point clouds lift VLA robot success by 10%

    PointACT: Vision-Language-Action Models with Multi-Scale Point-Action Interaction

    Shizhe Chen +2

  41. cs.CV 2026-05-20 reviewed
    Road videos now produce captions with chosen tone

    RoadTones: Tone Controllable Text Generation from Road Event Videos

    Chirag Parikh +2

  42. cs.CV 2026-05-20 reviewed
    One model shifts image restoration from precise to creative

    Disentangling Generation and Regression in Stochastic Interpolants for Controllable Image Restoration

    Yi Liu +5

  43. cs.CV 2026-05-20 reviewed
    Simulation feedback picks best synthetic scenes for driving models

    Closed Loop Dynamic Driving Data Mixture for Real-Synthetic Co-Training

    Hongzhi Ruan +7

  44. cs.CV 2026-05-20 reviewed
    Diffusion model fills Antarctic Landsat gaps without references

    A Non-Reference Diffusion-Based Restoration Framework for Landsat 7 ETM+ SLC-off Imagery in Antarctica

    Leyue Tang +3

  45. cs.CV 2026-05-20 reviewed
    Model fixes occlusion order in overlapping layout-to-image scenes

    OcclusionFormer: Arranging Z-Order for Layout-Grounded Image Generation

    Ziye Li +1

  46. cs.CV 2026-05-20 reviewed
    Hyper-V2X estimates epistemic and aleatoric uncertainty in cooperative BEV segmentation

    Hyper-V2X: Hypernetworks for Estimating Epistemic and Aleatoric Uncertainty in Cooperative Bird's-Eye-View Semantic Segmentation

    Abhishek Dinkar Jagtap +2

  47. cs.CV 2026-05-20 reviewed
    Adaptive fusion gives linear SSMs flexible vision and 3D fusion

    Deformba: Vision State Space Model with Adaptive State Fusion

    Hongyu Ke +6

  48. cs.LG 2026-05-20 reviewed
    Contrasting patients with controls isolates disease subgroups

    Automatic Discovery of Disease Subgroups by Contrasting with Healthy Controls

    Robin Louiset +4

  49. cs.CV 2026-05-20 reviewed
    Reweighting image-negative tokens cuts LVLM hallucinations

    Reducing Object Hallucination in LVLMs via Emphasizing Image-negative Tokens

    Meng Shen +2

  50. cs.CV 2026-05-20 reviewed
    Continuous flow matching generates realistic EEG signals

    Let EEG Models Learn EEG

    Yifan Wang +3