Neural discrete representation learning

Aaron Van Den Oord, Oriol Vinyals, et al · 2017

7 Pith papers cite this work. Polarity classification is still indexing.

7 Pith papers citing it

browse 7 citing papers

citation-role summary

background 1 method 1

citation-polarity summary

background 1 use method 1

representative citing papers

Score-based Membership Inference on Diffusion Models

cs.LG · 2025-09-29 · unverdicted · novelty 7.0

Presents SimA, a score-based single-query membership inference attack for diffusion models and LDMs that uses denoiser output norm to reveal training set proximity and outperforms multi-query baselines on eight datasets.

AudioMoG: Guiding Audio Generation with Mixture-of-Guidance

cs.SD · 2025-09-28 · unverdicted · novelty 7.0

AudioMoG is a mixture-of-guidance sampling technique that combines CFG and AG signals to outperform single-guidance baselines in text-to-audio generation at equivalent speed.

Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding

cs.CV · 2022-05-23 · accept · novelty 7.0

Imagen achieves state-of-the-art photorealistic text-to-image generation by scaling a text-only pretrained T5 language model within a diffusion framework, reaching FID 7.27 on COCO without training on it.

Incomplete Multi-View Multi-Label Classification via Shared Codebook and Fused-Teacher Self-Distillation

cs.CV · 2026-04-05 · unverdicted · novelty 6.0

A shared codebook with cross-view reconstruction plus fused-teacher self-distillation improves classification accuracy on incomplete multi-view multi-label data.

StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs

cs.CL · 2025-09-26 · unverdicted · novelty 6.0

StableToken introduces a multi-branch architecture with bit-wise voting to create noise-robust semantic speech tokens, achieving lower Unit Edit Distance and better SpeechLLM robustness than prior single-path tokenizers.

Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

cs.CV · 2022-06-22 · unverdicted · novelty 6.0

Scaling an autoregressive Transformer to 20B parameters for text-to-image generation using image token sequences achieves new SOTA zero-shot FID of 7.23 and fine-tuned FID of 3.22 on MS-COCO.

XR-1: Towards Versatile Vision-Language-Action Models via Learning Unified Vision-Motion Representations

cs.RO · 2025-11-04 · unverdicted · novelty 5.0

XR-1 introduces Unified Vision-Motion Codes learned by dual-branch VQ-VAE and applies them in a three-stage training pipeline to outperform prior VLA models on 120+ real-world manipulation tasks across six robot embodiments.

citing papers explorer

Showing 7 of 7 citing papers.

Score-based Membership Inference on Diffusion Models cs.LG · 2025-09-29 · unverdicted · none · ref 44
Presents SimA, a score-based single-query membership inference attack for diffusion models and LDMs that uses denoiser output norm to reveal training set proximity and outperforms multi-query baselines on eight datasets.
AudioMoG: Guiding Audio Generation with Mixture-of-Guidance cs.SD · 2025-09-28 · unverdicted · none · ref 67
AudioMoG is a mixture-of-guidance sampling technique that combines CFG and AG signals to outperform single-guidance baselines in text-to-audio generation at equivalent speed.
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding cs.CV · 2022-05-23 · accept · none · ref 71
Imagen achieves state-of-the-art photorealistic text-to-image generation by scaling a text-only pretrained T5 language model within a diffusion framework, reaching FID 7.27 on COCO without training on it.
Incomplete Multi-View Multi-Label Classification via Shared Codebook and Fused-Teacher Self-Distillation cs.CV · 2026-04-05 · unverdicted · none · ref 20
A shared codebook with cross-view reconstruction plus fused-teacher self-distillation improves classification accuracy on incomplete multi-view multi-label data.
StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs cs.CL · 2025-09-26 · unverdicted · none · ref 69
StableToken introduces a multi-branch architecture with bit-wise voting to create noise-robust semantic speech tokens, achieving lower Unit Edit Distance and better SpeechLLM robustness than prior single-path tokenizers.
Scaling Autoregressive Models for Content-Rich Text-to-Image Generation cs.CV · 2022-06-22 · unverdicted · none · ref 6
Scaling an autoregressive Transformer to 20B parameters for text-to-image generation using image token sequences achieves new SOTA zero-shot FID of 7.23 and fine-tuned FID of 3.22 on MS-COCO.
XR-1: Towards Versatile Vision-Language-Action Models via Learning Unified Vision-Motion Representations cs.RO · 2025-11-04 · unverdicted · none · ref 84
XR-1 introduces Unified Vision-Motion Codes learned by dual-branch VQ-VAE and applies them in a three-stage training pipeline to outperform prior VLA models on 120+ real-world manipulation tasks across six robot embodiments.

Neural discrete representation learning

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer