arXiv preprint arXiv:2505.12477
4 Pith papers cite this work.
citing papers explorer
- ToLL: Topological Layout Learning with Asymmetric Cross-View Structural Distillation for 3D Scene Graph Generation Pretraining
  ToLL pretrains 3D scene graph generators via anchor-conditioned topological layout recovery and asymmetric structural distillation to learn predicate constraints rather than geometric interpolation shortcuts.
- Information bottleneck for learning the phase space of dynamics from high-dimensional experimental data
  DySIB recovers a two-dimensional representation matching the phase space of a physical pendulum from high-dimensional video data by maximizing predictive mutual information in latent space.
- LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics
  LeJEPA derives an optimal isotropic Gaussian target for embeddings and enforces it via sketched regularization to deliver scalable, heuristics-free self-supervised pretraining with 79% ImageNet linear accuracy on ViT-H/14.
- Stylistic-STORM (ST-STORM): Perceiving the Semantic Nature of Appearance
  ST-STORM introduces a dual-branch SSL framework that disentangles semantic content from stylistic appearance using gated latent streams, JEPA for content invariance, and adversarial constraints for style capture.