pith.

archive

Every paper Pith has read. Search by title, abstract, or pith.

9568 papers in cs.CV · page 15

  1. cs.IR 2026-05-18 reviewed
    Prompting methods raise table QA accuracy without training

    Efficient Table QA via TableGrid Navigation and Progressive Inference Prompting

    Amritansh Maurya +3

  2. cs.CV 2026-05-18 reviewed
    Occupancy-opacity split in Gaussians renders both reflection and transmission

    RT-Splatting: Joint Reflection-Transmission Modeling with Gaussian Splatting

    Ji Shi +4

  3. cs.CV 2026-05-18 reviewed
    Shared codebook bridges modalities without full data pairs

    CodeBind: Decoupled Representation Learning for Multimodal Alignment with Unified Compositional Codebook

    Zeyu Chen +2

  4. cs.CV 2026-05-18 reviewed
    GaussianZoom generates detailed 3D zooms from low-res inputs

    GaussianZoom: Progressive Zoom-in Generative 3D Gaussian Splatting with Geometric and Semantic Guidance

    Jiale Shi +5

  5. cs.CV 2026-05-18 reviewed
    Gaps in real face embeddings hold 10M virtual identities

    Non-Colliding Biometric Identities for Digital Entities: Geometry, Capacity, and Million-Scale Virtual Identity Provisioning

    Yuyang Ji +4

  6. cs.CV 2026-05-18 reviewed
    Noise alignment lets video models output endless coherent clips

    Enhancing Train-Free Infinite-Frame Generation for Consistent Long Videos

    X. Feng +8

  7. cs.SD 2026-05-18 reviewed
    Speech audio accelerates MRI reconstruction of vocal tracts

    SIREM: Speech-Informed MRI Reconstruction with Learned Sampling

    Md Hasan +8

  8. cs.CV 2026-05-18 reviewed
    Synthetic egocentric videos raise real interaction model accuracy

    EgoInteract: Synthetic Egocentric Videos Generation for Interaction Understanding and Anticipation

    Rosario Leonardi +6

  9. cs.CV 2026-05-18 reviewed
    Synthetic egocentric videos improve real interaction models

    EgoInteract: Synthetic Egocentric Videos Generation for Interaction Understanding and Anticipation

    Rosario Leonardi +6

  10. cs.CV 2026-05-18 reviewed
    Question routing lifts zero-shot spatial video QA by up to 5%

    SPATIOROUTE: Dynamic Prompt Routing for Zero-Shot Spatial Reasoning

    Pawat Chunhachatrachai +3

  11. cs.RO 2026-05-18 reviewed
    RGB cameras build 3D scene graphs for robots as well as depth sensors

    RGB-only Active 3D Scene Graph Generation for Indoor Mobile Robots

    Giorgia Modi +3

  12. cs.AI 2026-05-18 reviewed
    Sensory-bounded reasoning lifts MLLM accuracy on second-order belief tasks

    Beyond the Cartesian Illusion: Testing Two-Stage Multi-Modal Theory of Mind under Perceptual Bottlenecks

    Yajing Zhou +1

  13. cs.CV 2026-05-18 reviewed
    Best Segmentation Buddies match image pixels to 3D shape parts

    Best Segmentation Buddies for Image-Shape Correspondence

    Itai Lang +3

  14. cs.CV 2026-05-18 reviewed
    View-aware experts lift aerial-ground re-ID mAP by 10 points

    View-Aware Semantic Alignment for Aerial-Ground Person Re-Identification

    Quan Zhang +6

  15. cs.LG 2026-05-18 reviewed
    Heavy-light split cuts diffusion sampling cost by 2-4x

    Dual-Rate Diffusion: Accelerating diffusion models with an interleaved heavy-light network

    Grigory Bartosh +5

  16. cs.RO 2026-05-18 reviewed
    External cameras boost robot scene recall by up to 79%

    Fixed External Cameras as Common Prior Maps for Active 3D Scene Graph Generation

    Giorgia Modi +3

  17. cs.CV 2026-05-18 reviewed
    TokenMask predicts masks directly from token affinities

    Token-Space Mask Prediction for Efficient Vision Transformer Segmentation

    Calvin Galagain +2

  18. cs.CV 2026-05-18 reviewed
    Agentic selector ranks second on four-day multimodal challenge

    MARS: Technical Report for the CASTLE Challenge at EgoVis 2026

    Haoyu Zhang +6

  19. cs.CV 2026-05-18 reviewed
    Soft attention masks enable text spotting without rectification

    Do You Need Text Rectification? Soft Attention Mask Embedding for Rectification-Free Scene Text Spotting

    Antonio Colombo +1

  20. cs.CV 2026-05-18 reviewed
    Consistency reward lifts VLM spatial reasoning

    Self-Evolving Spatial Reasoning in Vision Language Models via Geometric Logic Consistency

    Junming Liu +6

  21. cs.CV 2026-05-18 reviewed
    New module keeps multimodal models true to images during long answers

    Vision Inference Former: Sustaining Visual Consistency in Multimodal Large Language Models

    Xinpeng Dong +5

  22. cs.CV 2026-05-18 reviewed
    Semi-supervised method removes nighttime flares from unlabeled images

    Semi-LAR: Semi-supervised Contrastive Learning with Linear Attention for Removal of Nighttime Flares

    Xiyu Zhu +3

  23. cs.CV 2026-05-18 reviewed
    Joint model fuses reconstruction and causal video generation for driving

    Xiaomi Auto World Model: A Joint World Model Integrating Reconstruction and Generation for Autonomous Driving

    Lijun Zhou +36

  24. cs.CV 2026-05-18 reviewed
    3D generators leave fingerprints that identify their source

    Who Generated This 3D Asset? Learning Source Attribution for Generative 3D Models

    Sihan Ma +2

  25. cs.CV 2026-05-18 reviewed
    Semantic guidance localizes lesions before segmentation and diagnosis

    Rad-VLSM: A Cross-Modal Framework with Semantics-Assisted Prompting for Medical Segmentation and Diagnosis

    Fengyi Zhang +4

  26. cs.CV 2026-05-18 reviewed
    Hybrid tokenizer splits semantics from pixels for better results

    WinTok: A Win-Win Hybrid Tokenizer via Decomposing Visual Understanding and Generation with Transferable Tokens

    Yiwei Guo +6

  27. cs.CL 2026-05-18 reviewed
    Bangla medical questions trip up top AI models

    How Good LLMs Are at Answering Bangla Medical Visual Questions? Dataset and Benchmarking

    Rafid Ahmed +5

  28. cs.AI 2026-05-18 reviewed
    Grounding cuts tokens 18x while matching big models on home tasks

    TaskGround: Structured Executable Task Inference for Full-Scene Household Reasoning

    Zhiyuan Feng +13

  29. cs.CV 2026-05-18 reviewed
    Residual fusion refines hands while stabilizing body in video meshes

    DanceHMR: Hand-Aware Whole-Body Human Mesh Recovery from Monocular Videos

    Wenhao Shen +5

  30. cs.CV 2026-05-18 reviewed
    Fusion recovers detailed hands and stable bodies from single videos

    DanceHMR: Hand-Aware Whole-Body Human Mesh Recovery from Monocular Videos

    Wenhao Shen +5

  31. cs.CV 2026-05-18 reviewed
    Diffusion model generates aligned urban energy maps from roads

    SENSE: Satellite-based ENergy Synthesis for Sustainable Environment

    Kailai Sun +7

  32. cs.CV 2026-05-18 reviewed
    Synthetic data cuts mixed-object counting errors by 20%

    The MixCount Dataset: Bridging the Data Gap for Open-Vocabulary Object Counting

    Corentin Dumery +3

  33. cs.CV 2026-05-18 reviewed
    Lightweight ConvNet ensembles match heavy models on Arabic handwriting

    Embedded ConvNet Ensembles: A Lightweight Approach to Recognize Arabic Handwritten Characters

    Mohsine EL Khayati +2

  34. cs.CV 2026-05-18 reviewed
    Pixle attack fools Arabic handwriting AI at 99-100% success

    Threats to Arabic Handwriting Recognition: Investigating Black-Box Adversarial Attacks on embedded ConvNet models

    Mohsine EL Khayati +3

  35. eess.IV 2026-05-18 reviewed
    Triplane features adapt to standard codecs for better volumetric compression

    CATRF: Codec-Adaptive TriPlane Radiance Fields for Volumetric Content Delivery

    Tung-I Chen +3

  36. cs.CV 2026-05-18 reviewed
    Instant3D creates 3D assets in seconds

    Efficient 3D Content Reconstruction and Generation

    Jiahao Li

  37. cs.CV 2026-05-18 reviewed
    Dynamic relevance scores pick pruning regime to shrink omni-modal tokens

    OmniSelect: Dynamic Modality-Aware Token Compression for Efficient Omni-modal Large Language Models

    Morunliu Yang +8

  38. cs.CV 2026-05-18 reviewed
    Template-guided field fuses cues for fast 3D shape matching

    SGSoft: Learning Fused Semantic-Geometric Features for 3D Shape Correspondence via Template-Guided Soft Signals

    Soyeon Yoon +2

  39. cs.CV 2026-05-18 reviewed
    Lateral line patches raise salmon re-ID accuracy across cameras

    Patch Ensembles for Robust Salmon Re-Identification with Weak Trajectory Labels

    Espen Uri Høgstedt +3

  40. cs.CV 2026-05-18 reviewed
    Filtered data beats bigger models in grocery retrieval

    What Matters for Grocery Product Retrieval with Open Source Vision Language Models

    Emmanuel G. Maminta +1

  41. cs.CV 2026-05-18 reviewed
    Dual-stage activation fixes attribute binding in open-vocabulary detection

    DSAA: Dual-Stage Attribute Activation for Fine-grained Open Vocabulary Detection

    Donghong Jiang +6

  42. cs.CV 2026-05-18 reviewed
    Training fixes attention so text alone locates video objects

    See What I Mean: Aligning Vision and Language Representations for Video Fine-grained Object Understanding

    Boyuan Sun +4

  43. cs.CV 2026-05-18 reviewed
    TinySAM 2 cuts SAM 2 memory tokens to 7 percent at 90 percent accuracy

    TinySAM 2: Extreme Memory Compression for Efficient Track Anything Model

    Zhaoyuan Ding +3

  44. cs.CV 2026-05-18 reviewed
    Semantic scoring refines distilled image datasets

    SAS: Semantic-aware Sampling for Generative Dataset Distillation

    Mingzhuo Li +6

  45. cs.CV 2026-05-18 reviewed
    Graph completion turns non-functional 3D models functional

    Functionalization via Structure Completion and Motion Rectification

    Mingrui Zhao +10

  46. eess.IV 2026-05-18 reviewed
    Inter-frame learning cuts bits for LiDAR geometry

    Inter-LPCM: Learning-based Inter-Frame Predictive Coding for LiDAR Point Cloud Compression

    Chang Sun +5

  47. cs.LG 2026-05-18 reviewed
    Per-module scaling lifts low-bit quantization accuracy

    MARR: Module-Adaptive Residual Reconstruction for Low-Bit Post-Training Quantization

    Le Su +2

  48. cs.CV 2026-05-18 reviewed
    Optical encoder cuts gaze tracking latency to 3.4 ms

    Low Latency Gaze Tracking via Latent Optical Sensing

    Yidan Zheng +5

  49. eess.IV 2026-05-18 reviewed
    Event data fused with frames yields clear silhouettes at kilohertz rates

    See Silhouettes in Motion with Neuromorphic Vision

    Pei Zhang +4

  50. cs.CV 2026-05-18 reviewed
    Decoupled attention balances references in remote sensing super-resolution

    Learning to Balance: Decoupled Siamese Diffusion Transformer for Reference-Based Remote Sensing Image Super-Resolution

    Bin Luo +6