pith.

archive

Every paper Pith has read. Search by title, abstract, or pith.

9568 papers in cs.CV · page 12

  1. cs.CV 2026-05-19 reviewed
    Spatial weighting and dual loss create novel text-to-image objects

    Self-Creative Text-to-Object Generation using Semantic-Aware Spatial Weighting

    Yue Yu +4

  2. cs.GR 2026-05-19 reviewed
    Sparse anchor fields yield editable SVGs at full raster fidelity

    AnchorFlow: Editable SVG Reconstruction via Sparse Anchor Point Fields

    Mengnan Jiang +4

  3. cs.CV 2026-05-19 reviewed
    Evidential head gives reliable uncertainty for 3D pointmaps

    Trust It or Not: Evidential Uncertainty for Feed-Forward 3D Reconstruction with Trust3R

    Zihao Zhu +4

  4. cs.CV 2026-05-19 reviewed
    RL solver reaches 82.9% on CAPTCHA benchmark

    CaptchaMind: Training CAPTCHA Solvers via Reinforcement Learning with Explicit Reasoning Supervision

    Pengcheng Wang +7

  5. cs.CV 2026-05-19 reviewed
    Replace blocks with synthesized operators to cut training costs

    Replacement Learning: Training Neural Networks with Fewer Parameters

    Yuming Zhang +7

  6. cs.CV 2026-05-19 reviewed
    Early core token attention ranks best seeds for text-to-image results

    Boosting Text-to-Image Diffusion Models via Core Token Attention-Based Seed Selection

    Yunzhe Zhang +2

  7. cs.CV 2026-05-19 reviewed
    The paper describes a framework for 3D localization in multimodal large language models…

    Towards Camera-Robust 3D Localization: Equation-Anchored Tool-Use for MLLMs

    Xueying Jiang +6

  8. cs.CV 2026-05-19 reviewed
    Dual prompts help CLIP identify occluded people better

    Dual-Prompt CLIP with Hybrid Visual Encoders for Occluded Person Re-Identification

    Zhangjian Ji +3

  9. cs.RO 2026-05-19 reviewed
    Negative data cuts collisions in driving AI models

    SafeAlign-VLA: A Negative-Enhanced Safe Alignment Framework for Risk-Aware Autonomous Driving

    Kefei Tian +4

  10. cs.CL 2026-05-19 reviewed
    Merging LLMs into VLMs boosts instructions but not math

    Investigating Cross-Modal Skill Injection: Scenarios, Methods, and Hyperparameters

    Zhiyu Xu +7

  11. cs.CV 2026-05-19 reviewed
    Dual-branch model wins photo quality challenge via explicit differences

    iDiff: Interpretable Difference-aware Framework for Pairwise Image Quality Assessment

    Xinli Yue +5

  12. cs.GR 2026-05-19 reviewed
    Single photo becomes real-time physics video of interacting objects

    TelePhysics: Physics-Grounded Multi-Object Scene Generation from a Single Image with Real-Time Interaction

    Xin Zhang +7

  13. cs.CV 2026-05-19 reviewed
    Text-guided edits keep watermarks intact after decoder-loss training

    Are Watermarked Images Editable? SafeMark for Watermark-Preserving Text-Guided Image Editing

    Xiaodong Wu +5

  14. cs.CV 2026-05-19 reviewed
    Subtraction module lifts unsupervised video domain adaptation

    Return of Frustratingly Easy Unsupervised Video Domain Adaptation

    Pengfei Wei +4

  15. cs.CV 2026-05-19 reviewed
    Event pruning trims 80% tokens but raises reasoning accuracy

    EventPrune: Cascaded Event-Assisted Token Pruning for Efficient First-Person Dynamic Spatial Reasoning

    Pengtao Ma +9

  16. cs.CV 2026-05-19 reviewed
    PathCTM cuts pathology patches by 96%

    Thinking in Scales: Accelerating Gigapixel Pathology Image Analysis via Adaptive Continuous Reasoning

    Jiusong Ge +15

  17. cs.RO 2026-05-19 reviewed
    Hybrid platform syncs real CAVs with CARLA-SUMO sims for closed-loop tests

    Closed-Loop Hybrid Digital Twin Platform for Connected and Automated Vehicle Validation

    Kanglong Quan +6

  18. cs.CV 2026-05-19 reviewed
    GUI agents reach only 36% success on media editing tasks

    CutVerse: A Compositional GUI Agents Benchmark for Media Post-Production Editing

    Haobo Hu +6

  19. cs.CR 2026-05-19 reviewed
    Dynamic prompts fuse backdoors with task performance to resist pruning

    Exposing Functional Fusion: A New Class of Strategic Backdoor in Dynamic Prompt Architectures

    Zeyao Liu +5

  20. cs.CV 2026-05-19 reviewed
    Targeted attacks succeed on encoders without knowing the task

    Targeted Downstream-Agnostic Attack

    Zhuxin Lei +2

  21. cs.LG 2026-05-19 reviewed
    CEPO boosts math reasoning to 43.43% at 2B and 60.56% at 4B

    CEPO: RLVR Self-Distillation using Contrastive Evidence Policy Optimization

    Ahmed Heakl +6

  22. cs.LG 2026-05-19 reviewed
    Model fuses layout and netlist to predict cell delay at 0.92% error

    FusionCell: Cross-Attentive Fusion of Layout Geometry and Netlist Topology for Standard-Cell Performance Prediction

    Haoyi Zhang +4

  23. cs.CV 2026-05-19 reviewed
    Prototype-anchored training halves calibration error in place recognition

    KappaPlace: Learning Hyperspherical Uncertainty for Visual Place Recognition via Prototype-Anchored Supervision

    Maya Yanko +1

  24. cs.CV 2026-05-19 reviewed
    Vision agent builds ad-hoc segmentations with working mask

    Vision Harnessing Agent for Open Ad-hoc Segmentation

    Zilin Wang +1

  25. cs.CV 2026-05-19 reviewed
    JUDO outperforms GPT-4o on industrial anomaly QA with normal image references

    JUDO: A Juxtaposed Domain-Oriented Multimodal Reasoner for Industrial Anomaly QA

    Hyunju Kang +3

  26. cs.CV 2026-05-19 reviewed
    Rebalancing attention reduces reference dominance and increases video motion

    Rebalancing Reference Frame Dominance to Improve Motion in Image-to-Video Models

    Wooseok Jeon +5

  27. cs.CV 2026-05-19 reviewed
    Rebalancing attention boosts motion in image-to-video models

    Rebalancing Reference Frame Dominance to Improve Motion in Image-to-Video Models

    Wooseok Jeon +5

  28. cs.CV 2026-05-19 reviewed
    Unlearning methods leave class traces in model representations

    Can Vision Models Truly Forget? Mirage: Representation-Level Certification of Visual Unlearning

    Zhenyu Yu +4

  29. cs.CV 2026-05-19 reviewed
    Variance penalty on penultimate neurons cuts medical AI bias

    Neuron Incidence Redistribution for Fairness in Medical Image Classification

    Abin Shoby +2

  30. cs.CV 2026-05-19 reviewed
    Tracking tokens lift LMM performance on 4D video tasks

    LMM-Track4D: Eliciting 4D Dynamic Reasoning in LMMs via Trajectory-Grounded Dialogue

    Chaoyue Li +3

  31. cs.CV 2026-05-19 reviewed
    Material codebook yields consistent physics parameters from video

    MatPhys: Learning Material-Aware Physics Parameters for Deformable Object Simulation from Videos

    Yang Yang +3

  32. cs.CV 2026-05-19 reviewed
    Concept ontology filters noisy negatives to lift chest X-ray zero-shot tasks

    Concept-Guided Noisy Negative Suppression for Zero-Shot Classification and Grounding of Chest X-Ray Findings

    Chenyu Lian +3

  33. cs.CV 2026-05-19 reviewed
    Heat dissipation flow matching outperforms most baselines

    Multi-Scale Generative Modeling with Heat Dissipation Flow Matching

    Jun Ma +4

  34. cs.CV 2026-05-19 reviewed
    Optical pass checks 15 deepfake videos simultaneously

    Scalable, Energy-Efficient Optical-Neural Architecture for Multiplexed Deepfake Video Detection

    Parnian Ghapandar Kashani +2

  35. cs.CV 2026-05-19 reviewed
    Atlas text boosts mammography BI-RADS accuracy

    MAM-CLIP: Vision-Language Pretraining on Mammography Atlases for BI-RADS Classification

    Halil Ibrahim Gulluk +1

  36. cs.GR 2026-05-19 reviewed
    Repositioned anchors keep motion contacts across body shapes

    Skinned Motion Retargeting with Spatially Adaptive Interaction Guidance

    Soojin Choi +5

  37. eess.IV 2026-05-19 reviewed
    Autoregressive codebook tokens sharpen MRI from extreme undersampling

    Next-Acceleration-Scale Prediction for Autoregressive MRI Reconstruction

    Yilmaz Korkmaz +1

  38. eess.IV 2026-05-19 reviewed
    Autoregressive token prediction sharpens MRI from sparse scans

    Next-Acceleration-Scale Prediction for Autoregressive MRI Reconstruction

    Yilmaz Korkmaz +1

  39. cs.LG 2026-05-19 reviewed
    Claim differences as RL rewards balance caption hallucinations and omissions

    ClaimDiff-RL: Fine-Grained Caption Reinforcement Learning through Visual Claim Comparison

    Tianle Li +9

  40. cs.CV 2026-05-19 reviewed
    Integral feedback reduces hallucinations in CT medical reports

    Regulating Anatomy-Aware Rewards via Trajectory-Integral Feedback for Volumetric Computed Tomography Analysis

    Tianwei Lin +9

  41. cs.CV 2026-05-19 reviewed
    Two-stage training adds semantics to latent visual reasoning

    Semantic-Enriched Latent Visual Reasoning

    Tianrun Xu +10

  42. cs.CV 2026-05-19 reviewed
    HERA lifts CD-FSS accuracy over 4 mIoU points with tiny updates

    Selective, Regularized, and Calibrated: Harnessing Vision Foundation Models for Cross-Domain Few-Shot Semantic Segmentation

    Junyuan Ma +4

  43. cs.CV 2026-05-19 reviewed
    Event streams improve VLM scene understanding in tough conditions

    RE-VLM: Event-Augmented Vision-Language Model for Scene Understanding

    Hanqing Liu +5

  44. cs.CV 2026-05-19 reviewed
    Event streams lift VLM captioning and VQA scores in low light and motion

    RE-VLM: Event-Augmented Vision-Language Model for Scene Understanding

    Hanqing Liu +5

  45. cs.CV 2026-05-19 reviewed
    DynaTok trims 90% of video tokens with 95% accuracy retained

    DynaTok: Temporally Adaptive and Positional Bias-Aware Token Compression for Video-LLMs

    Minyoung Park +2

  46. cs.CV 2026-05-19 reviewed
    Hierarchical rewards raise text accuracy in image generators

    TextAlign: Preference Alignment for Text Rendering with Hierarchical Rewards

    Mingxuan Cui +8

  47. cs.CV 2026-05-19 reviewed
    Image editing replaces video for robot task planning

    SWEET: Sparse World Modeling with Image Editing for Embodied Task Execution

    Yiren Song +4

  48. cs.CV 2026-05-19 reviewed
    Gated CNN detects falls on smartwatches without attention

    You Don't Need Attention: Gated Convolutional Modeling for Watch-Based Fall Detection

    Sana Alamgeer +4

  49. cs.CV 2026-05-19 reviewed
    Metamorphic relations reveal hidden VQA failures missed by accuracy

    MetaRA: Metamorphic Robustness Assessment for Multimodal Large Language Model-based Visual Question Answering Systems

    Quanxing Xu +6

  50. cs.GR 2026-05-19 reviewed
    Matérn noise gives flow matching triangulation-agnostic behavior

    Matérn Noise for Triangulation-Agnostic Flow Matching on Meshes

    Tianshu Kuai +3