
Neural discrete representation learning. Advances in Neural Information Processing Systems, 30

Aaron van den Oord, Oriol Vinyals, et al. · 2017

Tool reference. 83% of classified Pith citations use this work as a method, library, or software dependency, not as a substantive claim.

18 Pith papers citing it



citation-role summary

method 5 · background 1

citation-polarity summary

use method 5 · background 1

representative citing papers

AffectCodec: Emotion-Preserving Neural Speech Codec with Block-Diagonal Residual FSQ

cs.SD · 2026-05-22 · unverdicted · novelty 7.0

AffectCodec applies block-diagonal projections in residual FSQ to explicitly allocate bits to emotion and acoustic subspaces, combined with emotion conditioning, yielding better emotion preservation at low bitrates with competitive acoustic quality.

Understanding Multimodal Failure in Action-Chunking Behavioral Cloning

cs.LG · 2026-05-21 · unverdicted · novelty 7.0

The paper identifies distinct failure mechanisms: excessive posterior-prior regularization erases mode information in latent policies, while smooth base-to-action maps limit mode coverage in generative policies.

Divide et Calibra: Multiclass Local Calibration via Vector Quantization

cs.LG · 2026-05-20 · unverdicted · novelty 7.0

Vector quantization induces a structured partition of the representation space for composing heterogeneous multiclass calibration maps via shared codeword-dependent Dirichlet factors.

Token by Token, Compromised: Backdoor Vulnerabilities in Unified Autoregressive Models

cs.CR · 2026-05-19 · conditional · novelty 7.0

ToBAC is the first backdoor attack on unified autoregressive models, using data or model poisoning to make triggers elicit cross-modal malicious behavior in text and image generation.

RotVLA: Rotational Latent Action for Vision-Language-Action Model

cs.RO · 2026-05-13 · unverdicted · novelty 7.0

RotVLA models latent actions as continuous SO(n) rotations with triplet-frame supervision and flow-matching to reach 98.2% success on LIBERO and 89.6%/88.5% on RoboTwin2.0 using a 1.7B-parameter model.

Spatial Gram Alignment for Ultra-High-Resolution Image Synthesis

cs.CV · 2026-05-20 · unverdicted · novelty 6.0

Spatial Gram Alignment aligns internal self-similarities of LDM features with foundation priors to reconcile global structure and fine details in ultra-high-resolution text-to-image synthesis.

CPC-VAR: Continual Personalized and Compositional Generation in Visual Autoregressive Models

cs.CV · 2026-05-19 · unverdicted · novelty 6.0

CPC-VAR adds Gradient-based Concept Neuron Selection for continual single-concept learning and a context-aware multi-branch composition strategy to reduce forgetting and entanglement in VAR-based personalized image generation.

InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation

cs.CV · 2026-05-14 · conditional · novelty 6.0

InsightTok improves text and face fidelity in discrete image tokenization via content-aware perceptual losses, with gains transferring to autoregressive generation.

ALAM: Algebraically Consistent Latent Action Model for Vision-Language-Action Models

cs.RO · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

ALAM introduces algebraic consistency regularization on latent action transitions from videos, raising VLA success rates from 47.9% to 85.0% on MetaWorld MT50 and 94.1% to 98.1% on LIBERO.

Yeti: A compact protein structure tokenizer for reconstruction and multi-modal generation

q-bio.BM · 2026-05-11 · unverdicted · novelty 6.0

Yeti is a compact tokenizer for protein structures that delivers strong codebook use, token diversity, and reconstruction while enabling from-scratch multimodal generation of plausible sequences and structures with 10x fewer parameters than ESM3.

SDFlow: Similarity-Driven Flow Matching for Time Series Generation

cs.AI · 2026-05-07 · unverdicted · novelty 6.0 · 2 refs

SDFlow learns a global transport map via similarity-driven flow matching in VQ latent space, using low-rank manifold decomposition and a categorical posterior to handle discreteness, yielding SOTA long-horizon performance and inference speedups.

CapsID: Soft-Routed Variable-Length Semantic IDs for Generative Recommendation

cs.IR · 2026-05-06 · unverdicted · novelty 6.0

CapsID uses probabilistic capsule routing and confidence-based termination to generate variable-length semantic IDs, improving recall by 9.6% over strong baselines with half the latency of dual-representation systems.

FaithfulFaces: Pose-Faithful Facial Identity Preservation for Text-to-Video Generation

cs.CV · 2026-05-06 · unverdicted · novelty 6.0

FaithfulFaces introduces a pose-faithful identity aligner with a shared dictionary and invariance constraint to maintain facial identity in text-to-video generation under large pose changes and occlusions.

Self-Supervised Bootstrapping of Action-Predictive Embodied Reasoning

cs.RO · 2026-02-09 · unverdicted · novelty 6.0

R&B-EnCoRe uses self-supervised importance-weighted variational inference to distill action-predictive reasoning datasets that improve VLA performance on manipulation, navigation, and driving tasks without external verifiers.

Negative Binomial Variational Autoencoders for Overdispersed Latent Modeling

cs.LG · 2025-08-07 · unverdicted · novelty 6.0

NegBio-VAE introduces negative binomial latents with dispersion to handle overdispersion in discrete VAE models, yielding better reconstruction, generation, and downstream representations than Poisson VAE baselines.

IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models

cs.CV · 2023-08-13 · unverdicted · novelty 6.0

IP-Adapter adds effective image prompting to text-to-image diffusion models using a lightweight decoupled cross-attention adapter that works alongside text prompts and other controls.

A Survey on Vision-Language-Action Models: An Action Tokenization Perspective

cs.RO · 2025-07-02 · unverdicted · novelty 5.0

The survey frames VLA models as pipelines that generate progressively grounded action tokens and classifies those tokens into eight types to guide future development.

Atom-level Protein Representation Learning Improves Protein Structure Prediction

q-bio.BM · 2026-05-21 · 2 refs

citing papers explorer


AffectCodec: Emotion-Preserving Neural Speech Codec with Block-Diagonal Residual FSQ cs.SD · 2026-05-22 · unverdicted · none · ref 22
Understanding Multimodal Failure in Action-Chunking Behavioral Cloning cs.LG · 2026-05-21 · unverdicted · none · ref 30
Divide et Calibra: Multiclass Local Calibration via Vector Quantization cs.LG · 2026-05-20 · unverdicted · none · ref 19
Token by Token, Compromised: Backdoor Vulnerabilities in Unified Autoregressive Models cs.CR · 2026-05-19 · conditional · none · ref 69
RotVLA: Rotational Latent Action for Vision-Language-Action Model cs.RO · 2026-05-13 · unverdicted · none · ref 9
Spatial Gram Alignment for Ultra-High-Resolution Image Synthesis cs.CV · 2026-05-20 · unverdicted · none · ref 22
CPC-VAR: Continual Personalized and Compositional Generation in Visual Autoregressive Models cs.CV · 2026-05-19 · unverdicted · none · ref 39
InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation cs.CV · 2026-05-14 · conditional · none · ref 43
ALAM: Algebraically Consistent Latent Action Model for Vision-Language-Action Models cs.RO · 2026-05-11 · unverdicted · none · ref 47 · 2 links
Yeti: A compact protein structure tokenizer for reconstruction and multi-modal generation q-bio.BM · 2026-05-11 · unverdicted · none · ref 12
SDFlow: Similarity-Driven Flow Matching for Time Series Generation cs.AI · 2026-05-07 · unverdicted · none · ref 25 · 2 links
CapsID: Soft-Routed Variable-Length Semantic IDs for Generative Recommendation cs.IR · 2026-05-06 · unverdicted · none · ref 27
FaithfulFaces: Pose-Faithful Facial Identity Preservation for Text-to-Video Generation cs.CV · 2026-05-06 · unverdicted · none · ref 31
Self-Supervised Bootstrapping of Action-Predictive Embodied Reasoning cs.RO · 2026-02-09 · unverdicted · none · ref 104
Negative Binomial Variational Autoencoders for Overdispersed Latent Modeling cs.LG · 2025-08-07 · unverdicted · none · ref 44
IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models cs.CV · 2023-08-13 · unverdicted · none · ref 17
A Survey on Vision-Language-Action Models: An Action Tokenization Perspective cs.RO · 2025-07-02 · unverdicted · none · ref 237
Atom-level Protein Representation Learning Improves Protein Structure Prediction q-bio.BM · 2026-05-21 · unreviewed · ref 13 · 2 links
