archive

Every paper Pith has read. Search by title, abstract, or pith.

378 papers in cs.MM · page 1

  1. cs.MM 2026-05-22 reviewed
    Swarical localizes flying light specks twice as fast

    Swarical: An Integrated Hierarchical Approach to Localizing Flying Light Specks

    Hamed Alimohammadzadeh +1

  2. cs.CV 2026-05-22 reviewed
    Adaptive search fixes blind spots in high-res image perception for LLMs

    CVSearch: Empowering Multimodal LLMs with Cognitive Visual Search for High-Resolution Image Perception

    Liupeng Li +6

  3. cs.GR 2026-05-22 reviewed
    Sketches control long video generation via independent shots

    DrawVideo: Generating Long Video from Storyboard Keyframe Sketches

    Chuanzhi Xu +9

  4. cs.CV 2026-05-22 reviewed
    Semantic scores trigger early stops in video motion search

    FAST-ME: Foundation-aware Adaptive Stopping for Motion Estimation for Efficient IoT Video Analysis

    Kakia Panagidi +1

  5. cs.SD 2026-05-22 reviewed
    Multi-stream prompts cut deepfake errors in mixed audio

    MixFake: Benchmarking and Enhancing Audio Deepfake Detection in Diverse Real-world Mixed Audio

    Qingcao Li +5

  6. cs.SD 2026-05-21 reviewed
    Diffusion models match discrete models for live music

    Live Music Diffusion Models: Efficient Fine-Tuning and Post-Training of Interactive Diffusion Music Generators

    Zachary Novack +10

  7. cs.CV 2026-05-21 reviewed
    Sparse autoencoder links reasoning steps to image masks

    SegCompass: Exploring Interpretable Alignment with Sparse Autoencoders for Enhanced Reasoning Segmentation

    Zhenyu Lu +6

  8. cs.CV 2026-05-21 reviewed
    Unified model handles many fashion search types at once

    FashionLens: Toward Versatile Fashion Image Retrieval via Task-Adaptive Learning

    Haokun Wen +5

  9. cs.CV 2026-05-21 reviewed
    MLLM planner in ViT space guides DiT to SOTA video generation and edits

    Bernini: Latent Semantic Planning for Video Diffusion

    Bernini Team: Chenchen Liu +10

  10. cs.CV 2026-05-21 reviewed
    Multi-grained compression lifts long video QA accuracy

    MuKV: Multi-Grained KV Cache Compression for Long Streaming Video Question-Answering

    Junbin Xiao +4

  11. cs.CR 2026-05-21 reviewed
    Proxy reorders API keys to embed traceable watermarks

    PEMark: Watermarking API Responses Based on Proxy Gateways and Position Encoding

    Yifei Zhou +4

  12. cs.MM 2026-05-20 reviewed
    Review groups LLM multimodal emotion studies into three directions

    Multimodal Emotion Recognition with Large Language Models

    Hongrui Zhang +6

  13. cs.CR 2026-05-20 reviewed
    Framework turns AI detection metrics into legal evidence thresholds

    Verifiable Provenance and Watermarking for Generative AI: An Evidentiary Framework for International Operational Law and Domestic Courts

    Gustav Olaf Yunus Laitinen-Fredriksson Lundström-Imanov +1

  14. cs.MM 2026-05-19 reviewed
    AI turns I-Ching coin casts into meaning-driven music

    Music of Changing Lines: Toward a Culturally Situated Approach to the I-Ching

    Ling Qi +2

  15. cs.LG 2026-05-19 reviewed
    Mixture of experts catches text-camouflaged fraud in graphs

    CAMERA: Adapting to Semantic Camouflage in Unsupervised Text-Attributed Graph Fraud Detection

    Junjun Pan +5

  16. eess.IV 2026-05-19 reviewed
    VVC techniques speed up partitioning but adapt to each VTM update

    Partition Tree Search Acceleration for VVC: Survey and Evaluation with VTM Evolution

    M.E.A. Kherchouche +4

  17. eess.IV 2026-05-19 reviewed
    Set shaping cuts steganography KL divergence by 25 percent

    Set Shaping Theory as a Complementary Payload-Shaping Layer for Steganography

    Aida Koch +3

  18. cs.SD 2026-05-19 reviewed
    Scaled simulations cut speech recognition errors over 30 percent

    Mega-ASR: Towards In-the-wild^2 Speech Recognition via Scaling up Real-world Acoustic Simulation

    Zhifei Xie +6

  19. eess.IV 2026-05-19 reviewed
    TADA adapts steganalysis to unknown JPEG pipelines

    Tackle CSM in JPEG Steganalysis with Data Adaptation

    Rony Abecidan (CRIStAL) +5

  20. eess.IV 2026-05-19 reviewed
    Semantic system cuts wireless video bandwidth by up to 75%

    Perception-Aware Video Semantic Communication

    Yinhuan Huang +1

  21. cs.CV 2026-05-19 reviewed
    Post-training lifts video models' physical consistency

    PhyWorld: Physics-Faithful World Model for Video Generation

    Pu Zhao +12

  22. cs.CV 2026-05-18 reviewed
    Self-supervised backbones boost artwork classification

    Harnessing Self-Supervised Features for Art Classification

    Federico Melis +4

  23. cs.MM 2026-05-18 reviewed
    Open-web context needed to forecast micro-video virality

    Will It Go Viral? Grounding Micro-Video Popularity Prediction on the Open Web

    Ryang Heo +1

  24. eess.IV 2026-05-18 reviewed
    Unpredictable motion destabilizes compressed video more than high motion

    Evaluating the Effect of Compression on Video Temporal Consistency Using Objective Quality Metrics

    Peter Zsoldos

  25. eess.IV 2026-05-18 reviewed
    Triplane features adapt to standard codecs for better volumetric compression

    CATRF: Codec-Adaptive TriPlane Radiance Fields for Volumetric Content Delivery

    Tung-I Chen +3

  26. cs.IR 2026-05-18 reviewed
    Dynamic modulation replaces static IDs in multimodal recommendations

    Modality-Aware Identity Construction and Counterfactual Structure Learning for ID-Free Multimodal Recommendation

    Hongjian Ma +4

  27. eess.IV 2026-05-18 reviewed
    Inter-frame learning cuts bits for LiDAR geometry

    Inter-LPCM: Learning-based Inter-Frame Predictive Coding for LiDAR Point Cloud Compression

    Chang Sun +5

  28. cs.MM 2026-05-18 reviewed
    Two-phase sampling matches contradictory audio prompts to video

    CounterFlow: A Two-Phase Inference-Time Sampling for Counterfactual Video Foley Generation

    Gyubin Lee +2

  29. cs.CV 2026-05-17 reviewed
    Framework binds faces and voices for consistent audio-video generation

    Omni-Customizer: End-to-End MultiModal Customization for Joint Audio-Video Generation

    Yuheng Chen +6

  30. cs.CV 2026-05-17 reviewed
    EchoSR runs lightweight super-resolution twice as fast with better quality

    EchoSR: Efficient Context Harnessing for Lightweight Image Super-Resolution

    Hanli Zhao +5

  31. cs.SD 2026-05-17 reviewed
    Optimal transport matches note distributions for piano transcription

    A Distribution Matching Approach to Neural Piano Transcription with Optimal Transport

    Weixing Wei +3

  32. cs.IR 2026-05-17 reviewed
    Dual model generates fashion images with text explanations

    Dual-Diffusional Generative Fashion Recommendation

    Mingzhe Yu +3

  33. cs.GR 2026-05-16 reviewed
    Single compressed atlas drives better immersive video

    A Single Atlas is All You Need: Decoder-Side Gaussian Splatting for Immersive Video

    Dawid Mieloch +1

  34. cs.GR 2026-05-16 reviewed
    Multi-agent loop lifts brand video yield to 89%

    Genflow Ad Studio: A Compound AI Architecture for Brand-Aligned, Self-Correcting Video Generation

    Debanshu Das +3

  35. eess.IV 2026-05-16 reviewed
    Legacy GPUs power real-time 8K60 for connected vehicles

    Sustainable Real-Time 8K60 HEVC Encoding for V2X: Repurposing Legacy NVENC Hardware at the Vehicular Edge

    Kasidis Arunruangsirilert +1

  36. cs.CR 2026-05-15 reviewed
    Logistic-map encryption plus Huffman compression handles large videos in one step

    A Method for Securely Transmitting Large Video Files Using Chaotic Compression and Encryption

    Shiladitya Bhattacharjee +4

  37. eess.IV 2026-05-15 reviewed
    AV2 cuts video bitrates nearly 30 percent vs AV1

    Video Quality Evaluation Methodology and Result of AV2 Compression Performance

    Zhijun Lei +4

  38. eess.IV 2026-05-15 reviewed
    Live streams switch resolutions on the fly to save 9% bitrate

    Dynamic resolution switching for live streaming

    Xin Xiong +4

  39. cs.CV 2026-05-14 reviewed
    t-FCW graphs classify point clouds in 7 seconds on GPU

    A Unified Non-Parametric and Interpretable Point Cloud Analysis via t-FCW Graph Representation

    Haijian Lai +6

  40. cs.GR 2026-05-14 reviewed
    Audio and text tuning enables motion edits in video models

    Sound Sparks Motion: Audio and Text Tuning for Video Editing

    AmirHossein Naghi Razlighi +4

  41. cs.SD 2026-05-14 reviewed
    SpeakerLLM turns speaker verification into natural-language reasoning

    SpeakerLLM: A Speaker-Specialized Audio-LLM for Speaker Understanding and Verification Reasoning

    KiHyun Nam +4

  42. cs.CV 2026-05-14 reviewed
    Two-stage model fuses radar and satellite for sharper rain forecasts

    VMU-Diff: A Coarse-to-fine Multi-source Data Fusion Framework for Precipitation Nowcasting

    Chunlei Shi +8

  43. cs.CV 2026-05-14 reviewed
    RC metrics align object removal scores with human perception

    PROVE: A Perceptual RemOVal cohErence Benchmark for Visual Media

    Fuhao Li +8

  44. cs.MM 2026-05-14 reviewed
    Multi-agent system resolves multimedia claims into editable reports

    Contestable Multi-Agent Debate with Arena-based Argumentative Computation for Multimedia Verification

    Truong Thanh Hung Nguyen +5

  45. cs.CV 2026-05-14 reviewed
    Delta Forcing curbs drift in interactive video generation

    Delta Forcing: Trust Region Steering for Interactive Autoregressive Video Generation

    Yuheng Wu +6

  46. cs.CV 2026-05-14 reviewed
    Trust region limits teacher bias in autoregressive video generation

    Delta Forcing: Trust Region Steering for Interactive Autoregressive Video Generation

    Yuheng Wu +6

  47. cs.CV 2026-05-14 reviewed
    Delta Forcing steers video generators to stay consistent after events

    Delta Forcing: Trust Region Steering for Interactive Autoregressive Video Generation

    Yuheng Wu +6

  48. cs.CV 2026-05-13 reviewed
    Few channels control entire DiT image generation

    Few Channels Draw The Whole Picture: Revealing Massive Activations in Diffusion Transformers

    Evelyn Turri +4

  49. cs.CV 2026-05-13 reviewed
    Backbone knowledge alone fools frozen deepfake detectors

    Backbone is All You Need: Assessing Vulnerabilities of Frozen Foundation Models in Synthetic Image Forensics

    Chiara Musso +3

  50. cs.MA 2026-05-12 reviewed
    Synthetic dataset benchmarks AI for swim coaching

    Synthesizing the Expert: A Validated Multimodal Dataset for Trustworthy AI-Assisted Swimming Coaching

    Ahmad Al-Kabbany +1