preprint arXiv:1904.05862

10 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

SpurAudio: A Benchmark for Studying Shortcut Learning in Few-Shot Audio Classification

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

SpurAudio benchmark shows state-of-the-art few-shot audio classifiers suffer large performance drops when background correlations are disrupted, even in large pretrained models.

The Indra Representation Hypothesis for Multimodal Alignment

cs.CV · 2026-04-06 · unverdicted · novelty 7.0

Unimodal model representations converge to a relational structure captured by the Indra representation via V-enriched Yoneda embedding, which is unique and structure-preserving and improves cross-model and cross-modal robustness when instantiated with angular distance.

A SUPERB-Style Benchmark of Self-Supervised Speech Models for Audio Deepfake Detection

eess.AS · 2026-03-02 · unverdicted · novelty 7.0

Spoof-SUPERB benchmark shows large-scale discriminative SSL models such as XLS-R, UniSpeech-SAT, and WavLM Large outperform others in audio deepfake detection and maintain robustness under acoustic degradations.

Evaluating Speech Articulation Synthesis with Articulatory Phoneme Recognition

cs.CL · 2026-05-20 · unverdicted · novelty 6.0

The paper introduces phoneme recognition using articulatory features as a proxy metric for evaluating articulatory speech synthesis quality from phonetic sequences.

Multimodal Diffusion Transformer with Memory Bank for Scalable Long-Duration Talking Video Generation

cs.CV · 2024-11-24 · unverdicted · novelty 6.0

LetsTalk combines a multimodal diffusion transformer, noise-regularized memory bank, deep compression autoencoder, and symbiotic/direct fusion schemes to achieve state-of-the-art quality and efficiency in long-duration talking video generation.

Vision Transformers Need Registers

cs.CV · 2023-09-28 · unverdicted · novelty 6.0

Adding register tokens to Vision Transformers eliminates high-norm background artifacts and raises state-of-the-art performance on dense visual prediction tasks.

HighSync: High-Quality Lip Synchronization via Latent Diffusion Models

cs.CV · 2026-05-16 · unverdicted · novelty 5.0

HighSync is a diffusion-based lip synchronization system that operates natively at 512x512 resolution by eliminating data leakage to enforce genuine audio dependence and reports state-of-the-art results on quality and sync metrics.

Learning to Attend to Depression-Related Patterns: An Adaptive Cross-Modal Gating Network for Depression Detection

cs.SD · 2026-04-11 · unverdicted · novelty 4.0

An adaptive cross-modal gating network improves depression detection from speech by selectively weighting sparse relevant segments across acoustic and textual modalities.

EGI: A Multimodal Emotional AI Framework for Enhancing Scrum Master Real-time Self-Awareness

cs.AI · 2026-05-17 · unverdicted · novelty 3.0

EGI integrates four existing AI components for real-time multimodal emotion monitoring and feedback in simulated agile meetings, reporting 10% WER and improved self-awareness for Scrum Masters.

Meta-Learning and Meta-Reinforcement Learning -- Tracing the Path towards DeepMind's Adaptive Agent

cs.AI · 2026-02-23 · unverdicted · novelty 2.0

A survey provides a task-based formalization of meta-learning and meta-RL while chronicling algorithms that lead to DeepMind's Adaptive Agent.

citing papers explorer

SpurAudio: A Benchmark for Studying Shortcut Learning in Few-Shot Audio Classification · cs.CV · 2026-05-13 · unverdicted · none · ref 52
The Indra Representation Hypothesis for Multimodal Alignment · cs.CV · 2026-04-06 · unverdicted · none · ref 66
A SUPERB-Style Benchmark of Self-Supervised Speech Models for Audio Deepfake Detection · eess.AS · 2026-03-02 · unverdicted · none · ref 32
Evaluating Speech Articulation Synthesis with Articulatory Phoneme Recognition · cs.CL · 2026-05-20 · unverdicted · none · ref 19
Multimodal Diffusion Transformer with Memory Bank for Scalable Long-Duration Talking Video Generation · cs.CV · 2024-11-24 · unverdicted · none · ref 54
Vision Transformers Need Registers · cs.CV · 2023-09-28 · unverdicted · none · ref 74
HighSync: High-Quality Lip Synchronization via Latent Diffusion Models · cs.CV · 2026-05-16 · unverdicted · none · ref 16
Learning to Attend to Depression-Related Patterns: An Adaptive Cross-Modal Gating Network for Depression Detection · cs.SD · 2026-04-11 · unverdicted · none · ref 12
EGI: A Multimodal Emotional AI Framework for Enhancing Scrum Master Real-time Self-Awareness · cs.AI · 2026-05-17 · unverdicted · none · ref 28
Meta-Learning and Meta-Reinforcement Learning -- Tracing the Path towards DeepMind's Adaptive Agent · cs.AI · 2026-02-23 · unverdicted · none · ref 140
