pith. sign in

Wavlm: Large-scale self-supervised pre- training for full stack speech processing.IEEE Journal of Selected Topics in Signal Processing, 16(6):1505–1518

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

citation-role summary

method 1

citation-polarity summary

years

2026 4

verdicts

UNVERDICTED 4

roles

method 1

polarities

use method 1

representative citing papers

The Indra Representation Hypothesis for Multimodal Alignment

cs.CV · 2026-04-06 · unverdicted · novelty 7.0

Unimodal model representations converge to a relational structure captured by the Indra representation via V-enriched Yoneda embedding, which is unique and structure-preserving and improves cross-model and cross-modal robustness when instantiated with angular distance.

citing papers explorer

Showing 4 of 4 citing papers.

  • The Indra Representation Hypothesis for Multimodal Alignment cs.CV · 2026-04-06 · unverdicted · none · ref 6

    Unimodal model representations converge to a relational structure captured by the Indra representation via V-enriched Yoneda embedding, which is unique and structure-preserving and improves cross-model and cross-modal robustness when instantiated with angular distance.

  • SemaVoice: Semantic-Aware Continuous Autoregressive Speech Synthesis eess.AS · 2026-05-16 · unverdicted · none · ref 51

    SemaVoice adds SFM-guided alignment to refine continuous speech representations in autoregressive TTS, reporting 1.71% English WER on Seed-TTS and competitiveness with open-source SOTA.

  • OmniVoice: Towards Omnilingual Zero-Shot Text-to-Speech with Diffusion Language Models cs.CL · 2026-04-01 · unverdicted · none · ref 53

    OmniVoice introduces a diffusion language model-style non-autoregressive TTS system that directly maps text to multi-codebook acoustic tokens, scaling zero-shot synthesis to over 600 languages with SOTA results on multilingual benchmarks using 581k hours of open data.

  • Omni-Customizer: End-to-End MultiModal Customization for Joint Audio-Video Generation cs.CV · 2026-05-17 · unverdicted · none · ref 4

    Omni-Customizer proposes an end-to-end framework using Omni-Context Fusion, Masked TTS Cross-Attention, Semantic-Anchored Multimodal RoPE, and specialized training curricula to achieve precise multimodal identity binding in joint audio-video generation.