pith.

archive

Every paper Pith has read. Search by title, abstract, or pith.

9568 papers in cs.CV · page 16

  1. cs.CV 2026-05-18 reviewed
    Agent reaches 0.90 WISE score in multi-turn image generation

    Generation Navigator: A State-Aware Agentic Framework for Image Generation

    Jinming Liu +4

  2. cs.CV 2026-05-18 reviewed
    Fewer semantic tokens match full multimodal performance

    A More Word-like Image Tokenization for MLLMs

    Hyun Lee +6

  3. cs.CV 2026-05-18 reviewed
    Adapted FamNet counts washer parts at 1.96 MAE

    Counting Machine Parts

    Benedict Florance Arockiaraj +3

  4. cs.CV 2026-05-18 reviewed
    Raw patches cut language bias in remote sensing vision models

    SkyNative: A Native Multimodal Framework for Remote Sensing Visual Evidence Reasoning

    Xiao Yang +12

  5. cs.AI 2026-05-18 reviewed
    Benchmark shows agents at 79% on game video questions vs 95% oracle

    SVFSearch: A Multimodal Knowledge-Intensive Benchmark for Short-Video Frame Search in the Gaming Vertical Domain

    Lingtao Mao +6

    4 Piths
  6. cs.AI 2026-05-18 reviewed
    Agents reach 79% on game video frames

    SVFSearch: A Multimodal Knowledge-Intensive Benchmark for Short-Video Frame Search in the Gaming Vertical Domain

    Lingtao Mao +6

    4 Piths
  7. cs.CV 2026-05-18 reviewed
    New UAV benchmark slashes 3D reconstruction errors by up to 84%

    UAVFF3D: A Geometry-Aware Benchmark for Feed-Forward UAV 3D Reconstruction

    Xiang Yang +3

  8. cs.CV 2026-05-18 reviewed
    Visual atlases evolve from trajectories to guide VLM agents

    AtlasVA: Self-Evolving Visual Skill Memory for Teacher-Free VLM Agents

    Pan Wang +5

  9. cs.LG 2026-05-18 reviewed
    Transient expert steers MoE updates to cut forgetting

    CP-MoE: Consistency-Preserving Mixture-of-Experts for Continual Learning

    Yang Liu +2

  10. cs.CV 2026-05-18 reviewed
    Streaming video model cuts tokens 95% with cascaded control

    An Efficient Streaming Video Understanding Framework with Agentic Control

    Jinming Liu +9

  11. cs.LG 2026-05-18 reviewed
    One anchor pair identifies domain transfer under Jacobian sparsity

    Domain Transfer Becomes Identifiable via a Single Alignment

    Sagar Shrestha +3

  12. cs.CV 2026-05-18 reviewed
    Decoupled geometry and cache yield consistent house panoramas

    PanoWorld: A Generative Spatial World Model for Consistent Whole-House Panorama Synthesis

    Jinrang Jia +3

  13. cs.CV 2026-05-18 reviewed
    Surgical video QA handles full procedures with temporal consolidation

    SurgLQA: Scalable Long-Horizon Surgical Video Question Answering

    Diandian Guo +4

  14. cs.RO 2026-05-18 reviewed
    Benchmark adds touch, RL training, and real robots to world model tests

    WorldArena 2.0: Extending Embodied World Model Benchmarking on Modality, Functionality and Platform

    Yu Shang +24

  15. cs.CV 2026-05-18 reviewed
    One model translates any sensor features to any other without retraining

    One Model to Translate Them All: Universal Any-to-Any Translation for Heterogeneous Collaborative Perception

    Yang Li +9

  16. cs.CV 2026-05-18 reviewed
    Frequency disentanglement plus geodesic matching lifts few-shot medical segmentation

    Beyond Euclidean Prototypes: Spectral Disentanglement and Geodesic Matching for Few-Shot Medical Image Segmentation

    Penghao Jia +6

  17. cs.MM 2026-05-18 reviewed
    Two-phase sampling matches contradictory audio prompts to video

    CounterFlow: A Two-Phase Inference-Time Sampling for Counterfactual Video Foley Generation

    Gyubin Lee +2

  18. cs.CV 2026-05-18 reviewed
    Mamba model beats SOTA on ECG multi-label scores

    HexagonalWarriorMamba: Superior Threshold-Dependent Multi-label Classification of 12-Lead ECG Cardiac Abnormalities

    Huawei Jiang +8

  19. cs.CV 2026-05-18 reviewed
    Classical SIFT beats learned descriptors on accuracy and speed

    PySIFT: GPU-Resident Deterministic SIFT for Deep Learning Vision Pipelines

    Sivakumar K.S. +2

  20. cs.CV 2026-05-18 reviewed
    Smartphone LiDAR sees hidden objects with motion sampling

    Imaging Hidden Objects with Consumer LiDAR via Motion Induced Sampling

    Siddharth Somasundaram +4

  21. stat.ML 2026-05-18 reviewed
    Girsanov weights enable unbiased resampling for diffusion models

    Simple Approximation and Derivative Free Inference-Time Scaling for Diffusion Models via Sequential Monte Carlo on Path Measures

    Chenyang Wang +4

  22. cs.CV 2026-05-18 reviewed
    Temporal pruning speeds video diffusion while preserving fidelity

    Temporal Aware Pruning for Efficient Diffusion-based Video Generation

    Sheng Li +5

  23. cs.CV 2026-05-18 reviewed
    Temporal smoothing lets pruning speed up video diffusion

    Temporal Aware Pruning for Efficient Diffusion-based Video Generation

    Sheng Li +5

  24. cs.CV 2026-05-18 reviewed
    Warm-up trick lets MeanFlow scale to 80B image models

    Stabilizing, Scaling & Enhancing MeanFlow for Large-scale Diffusion Distillation

    Xiao He +5

  25. cs.CV 2026-05-18 reviewed
    VLMs count by prior instead of image when facts clash

    CounterCount: A Diagnostic Framework for Counting Bias in Vision Language Models

    Reem Alzahrani +5

  26. cs.CV 2026-05-18 reviewed
    Scene understanding training produces human-like fixations in foveated model

    Why We Look Where We Look: Emergent Human-like Fixations of a Foveated Visual Language Model Maximizing Scene Understanding

    Shravan Murlidaran +3

  27. cs.CV 2026-05-18 reviewed
    Fourier shapes achieve 88% IR detector attack success past 25 meters

    Unleashing the Representational Power of Fourier Shapes for Attacking Infrared Object Detection

    Yixing Yong +4

  28. cs.CV 2026-05-18 reviewed
    Reward variance selects learnable prompts for T2I training

    Curriculum Group Policy Optimization: Adaptive Sampling for Unleashing the Potential of Text-to-Image Generation

    Baoteng Li +10

  29. cs.CV 2026-05-18 reviewed
    Post-hoc sphere normalization lifts long-tailed OOD AUROC

    Is Complex Training Necessary for Long-Tailed OOD Detection? A Re-think from Feature Geometry

    Ningkang Peng +2

  30. cs.LG 2026-05-18 reviewed
    High noisy-label accuracy fails to ensure OOD reliability

    When Accuracy Is Not Enough: Uncertainty Collapse between Noisy Label Learning and Out-of-Distribution Detection

    Ningkang Peng +4

  31. cs.CV 2026-05-18 reviewed
    Saliency consistency loss raises defect detection accuracy

    Network Knowledge Prior Guided Learning for Data-Efficient Surface Defect Detection

    Hang-Cheng Dong +3

  32. cs.CV 2026-05-18 reviewed
    LiteLoc slashes localization storage 94% and speeds pose solving 19x

    Efficient Sparse-to-Dense Visual Localization via Compact Gaussian Scene Representation and Accelerated Dense Pose Estimation

    Zizhuo Li +3

  33. cs.CV 2026-05-18 reviewed
    Tree constraints in training produce consistent plant skeletons

    PlantPose: Universal Plant Skeleton Estimation via Tree-constrained Graph Generation

    Xinpeng Liu +3

  34. cs.CV 2026-05-18 reviewed
    Framework makes one physical attack fool multiple AI vision tasks

    Towards Universal Physical Adversarial Attacks via a Joint Multi-Objective and Multi-Model Optimization Framework

    Ziyang Liu +6

  35. cs.CV 2026-05-18 reviewed
    Aligning latent mappings reduces inconsistency in multimodal models

    LatentUMM: Dual Latent Alignment for Unified Multimodal Models

    Yinyi Luo +4

  36. cs.CV 2026-05-18 reviewed
    Pixel diffusion reaches FID 1.60 at 256×256 resolution in 320 epochs

    FrequencyBooster: Full-Frequency Modeling for High-Fidelity Pixel Diffusion

    Lichen Ma +7

  37. cs.CV 2026-05-18 reviewed
    Adapter boosts Vision Transformer image quality assessment with fewer parameters

    Unleashing Vision Transformer Potential In Image Quality Assessment via Global-Local Adaptive Interaction

    Yu Li +5

  38. cs.CV 2026-05-18 reviewed
    Sparsity experts and distillation enable continual adaptation

    MoASE++: Mixture of Activation Sparsity Experts with Domain-Adaptive On-policy Distillation for Continual Test Time Adaptation

    Ronyu Zhang +10

  39. cs.CV 2026-05-18 reviewed
    Uncertainty flow plus point cloud interaction cuts hand pose error

    UST-Hand: An Uncertainty-aware Spatiotemporal Point Cloud Interaction Network for 3D Self-supervised Hand Pose Estimation

    Tianhao Han +7

  40. cs.CV 2026-05-18 reviewed
    Continual learning adapts X-ray models to new domains at 88.66% accuracy

    Domain Incremental Learning for Pandemic-Resilient Chest X-Ray Analysis

    Danu Kim

  41. cs.CV 2026-05-18 reviewed
    Prefix length turns frozen VLM embeddings into a semantic dial

    GraSP-VL: Length as a Semantic Granularity Interface for Vision-Language Representations

    Zesheng Li +2

  42. cs.CV 2026-05-18 reviewed
    Patch-MoE Mamba improves segmentation of polyps and skin lesions

    Patch-MoE Mamba: A Patch-Ordered Mixture-of-Experts State Space Architecture for Medical Image Segmentation

    Diego Adame +9

  43. cs.CV 2026-05-17 reviewed
    STDP rules deliver 78.6% mAP for event cameras on CPU

    Brain-inspired spike-timing plasticity for reliable label-efficient event-camera vision

    Mohamad Yazan Sadoun +2

  44. cs.CV 2026-05-17 reviewed
    1D-2D CNN fusion with attention hits 99-100% on ECG identification

    Attention-Guided Fusion of 1D and 2D CNNs for Robust ECG-Based Biometric Recognition

    Arioua +7

  45. cs.CV 2026-05-17 reviewed
    4D Gaussians let you query driving scenes at any future time

    GEM: Gaussian Evolution Model for Occupancy Forecasting and Motion Planning

    Cheng Chen +2

  46. cs.CV 2026-05-17 reviewed
    Sobel edges match finger knuckles at 17% rate

    A simple approach for biometrics: Finger-knuckle prints recognition based on a Sobel filter and similarity measures

    E. O. Rodrigues +3

  47. cs.CV 2026-05-17 reviewed
    Deep learning cuts pathology slide file sizes by 43-80%

    Deep learning-based compression of giga-resolution whole slide images

    Maren Høibø +4

  48. cs.RO 2026-05-17 reviewed
    Monocular RGB+IMU matches RGB-D accuracy for indoor scene graphs

    Mono-Hydra++: Real-Time Monocular Scene Graph Construction with Multi-Task Learning for 3D Indoor Mapping

    U. V. B. L. Udugama +2

  49. cs.IR 2026-05-17 reviewed
    Three-stage pipeline lifts video RAG retrieval from 0.195 to 0.759 nDCG

    MARQUIS: A Three-Stage Pipeline for Video Retrieval-Augmented Generation

    Debashish Chakraborty +9

  50. cs.CV 2026-05-17 reviewed
    System maps hand contacts to surfaces in operating rooms

    TouchMap-OR: Multi-View 3D Mapping of Hand-Surface Contacts

    Sophokles Ktistakis +3