Transactions on Machine Learning Research
9 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: method (1)
citation-polarity summary: use method (1)
years: 2026 (9)
citing papers explorer
- See What Matters: Differentiable Grid Sample Pruning for Generalizable Vision-Language-Action Model
  GridS is a plug-and-play differentiable module for geometry-aware visual token resampling in VLA models that achieves under 10% token retention and 76% FLOPs reduction with no success-rate loss.
- Taming the Entropy Cliff: Variable Codebook Size Quantization for Autoregressive Visual Generation
  Variable codebook sizes that increase along the sequence in visual tokenizers reduce generation FID scores significantly for autoregressive models on ImageNet.
- Label-Efficient Dataset Pruning via Semi-Supervised Pseudo-Labeling
  SemiPrune uses a small labeled subset and semi-supervised pseudo-labeling to enable supervised dataset pruning methods, achieving state-of-the-art results on domain-specific, image-corrupted, and long-tailed datasets.
- Ada-Diffuser: Latent-Aware Adaptive Diffusion for Decision-Making
  Ada-Diffuser is a causal diffusion model that jointly learns observed interaction structure and underlying latent dynamics from minimal observations for adaptive planning and policy learning.
- Composition of Memory Experts for Diffusion World Models
  A compositional diffusion world model integrates three specialized memory experts via contrastive product-of-experts to improve temporal consistency, past recall, and navigation while scaling to long contexts without quadratic costs.
- Rethinking Temporal Consistency in Video Object-Centric Learning: From Prediction to Correspondence
  Grounded Correspondence maintains temporal consistency via deterministic bipartite matching on frozen backbone features instead of learned predictors, achieving competitive results on MOVi and YouTube-VIS with zero learnable temporal parameters.
- Information theoretic underpinning of self-supervised learning by clustering
  SSL clustering is derived as KL-divergence optimization where a teacher-distribution constraint normalizes via inverse cluster priors and simplifies to batch centering by Jensen's inequality.
- Unlocking Compositional Generalization in Continual Few-Shot Learning
  A decoupling strategy optimizes object slots for holistic class identity during training and composes them at inference to achieve better generalization to unseen concepts in continual few-shot settings.
- APEX: Assumption-free Projection-based Embedding eXamination Metric for Image Quality Assessment
  APEX is an assumption-free image quality metric using Sliced Wasserstein Distance on CLIP and DINOv2 embeddings that claims superior robustness to degradations and cross-dataset stability.