Understanding dimensional collapse in contrastive self-supervised learning
14 Pith papers cite this work.
citation-role summary
roles: background (1)
citation-polarity summary
polarities: background (1)
representative citing papers
UR-JEPA applies uniform rectifiability regularization, via a smoothed Carleson square function, to JEPA training, producing embeddings whose PCA spectrum drops by 4-5 orders of magnitude at dimension 20-25 and whose variance across seeds is lower than with Gaussian regularization on Inet10, Galaxy10, and EuroSAT.
Contrastive Message Passing lets GNNs apply similarity-preserving transforms to positive edges and dissimilarity-inducing transforms to negative edges via soft positive semidefinite constraints on the weights, yielding gains in low-label, high-homophily regimes.
Empirical tests of 16 architectures on 153 subjects show camera rPPG signals contain no recoverable subject-specific pulse morphology, with all models exhibiting template collapse.
ShapeY is a benchmark dataset and nearest-neighbor protocol that measures shape-based recognition in vision models, revealing that even state-of-the-art networks fail to generalize consistently across 3D viewpoints and non-shape appearance changes.
Latent diffusion models exhibit geometric decoupling where curvature in out-of-distribution generation is misallocated to unstable semantic boundaries instead of image details, identifying geometric hotspots as the structural cause of editing instability.
UniCon unifies contrastive alignment across encoders and alignment types using kernels to enable exact closed-form updates instead of stochastic optimization.
SpecTran applies a spectral-aware transformer adapter with learnable position encoding to aggregate informative components across the full spectrum of LLM embeddings, yielding 9.17% average gains on sequential recommendation tasks.
AcuLa aligns audio models with medical language models via contrastive and self-supervised objectives on LLM-generated clinical reports, raising mean AUROC from 0.68 to 0.79 across 18 cardio-respiratory tasks.
LeJEPA derives an optimal isotropic Gaussian target for embeddings and enforces it via sketched regularization, delivering scalable, heuristics-free self-supervised pretraining with 79% ImageNet linear-probe accuracy on ViT-H/14.
AGE applies adaptive masking via a learnable sampler in Transformer-based SSL to align graph and text embeddings, yielding higher accuracy on four GraphQA benchmarks for non-parametric GraphRAG.
IdEst estimates the intrinsic dimension of SSL representations via dim_MST and reports a strong correlation with linear-probe accuracy across datasets and objectives.
HQ-JEPA combines JEPA-style predictive self-supervision with cross-modal alignment and a SWAP-test-based quantum fidelity loss for learning representations from paired remote sensing imagery, reporting competitive results on GeoBench tasks.
Diffusion models suffer representation degradation at high noise levels because of a recoverability mismatch; ERD mitigates this through dynamic reallocation of optimization, accelerating convergence across backbones.
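Dimensional collapse, the subject of the cited work, is commonly diagnosed by inspecting the eigenvalue (or singular value) spectrum of the embedding covariance matrix on a log scale: a collapsed representation shows a tail of eigenvalues many orders of magnitude below the largest one, which is the kind of "PCA spectral drop" the UR-JEPA summary reports. A minimal sketch, assuming embeddings are available as a NumPy array of shape (n_samples, dim); the helpers `embedding_spectrum` and `spectral_drop` and the synthetic rank-20 data are hypothetical illustrations, not code from the cited paper or any citing paper.

```python
import numpy as np


def embedding_spectrum(z: np.ndarray) -> np.ndarray:
    """Eigenvalues of the sample covariance of embeddings z (n_samples, dim), largest first."""
    z = z - z.mean(axis=0, keepdims=True)   # center every embedding dimension
    cov = z.T @ z / (z.shape[0] - 1)        # (dim, dim) sample covariance
    eig = np.linalg.eigvalsh(cov)           # symmetric matrix: real eigenvalues, ascending
    return np.clip(eig[::-1], 0.0, None)    # descending; clip round-off negatives to zero


def spectral_drop(eig: np.ndarray, k: int, eps: float = 1e-12) -> float:
    """Orders of magnitude (log10) between the largest and the k-th largest eigenvalue, k >= 1."""
    return float(np.log10((eig[0] + eps) / (eig[k - 1] + eps)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical 64-d embeddings that only use a 20-d subspace, plus tiny noise.
    z = rng.normal(size=(2000, 20)) @ rng.normal(size=(20, 64))
    z += 1e-3 * rng.normal(size=(2000, 64))
    eig = embedding_spectrum(z)
    print("log10 spectrum:", np.round(np.log10(eig + 1e-12), 1))
    # Expect a small drop inside the used subspace and a large one just past it.
    print("drop at k=20:", spectral_drop(eig, 20))
    print("drop at k=25:", spectral_drop(eig, 25))
```

Plotting `np.log10(eig)` against the eigenvalue index gives the usual collapse diagnostic; the `eps` term only guards against taking `log10(0)` for eigenvalues that are exactly zero.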
citing papers explorer
-
MuSViT: A Foundation Vision Model for Sheet Music Representation
MuSViT is the first foundation vision model for sheet music, pre-trained on 9.7M IMSLP pages, that outperforms general encoders on recognition, detection, and classification tasks while encoding symbolic structure in its embeddings.
-
UR-JEPA: Uniform Rectifiability as a Regularizer for Joint-Embedding Predictive Architectures
UR-JEPA applies uniform rectifiability regularization, via a smoothed Carleson square function, to JEPA training, producing embeddings whose PCA spectrum drops by 4-5 orders of magnitude at dimension 20-25 and whose variance across seeds is lower than with Gaussian regularization on Inet10, Galaxy10, and EuroSAT.
-
Learning over Positive and Negative Edges with Contrastive Message Passing
Contrastive Message Passing lets GNNs apply similarity-preserving transforms to positive edges and dissimilarity-inducing transforms to negative edges via soft positive semidefinite constraints on the weights, yielding gains in low-label, high-homophily regimes.
-
Template Collapse and Information-Theoretic Limits in Camera rPPG Pulse Morphology Restoration
Empirical tests of 16 architectures on 153 subjects show camera rPPG signals contain no recoverable subject-specific pulse morphology, with all models exhibiting template collapse.
-
ShapeY: A Principled Framework for Measuring Shape Recognition Capacity via Nearest-Neighbor Matching
ShapeY is a benchmark dataset and nearest-neighbor protocol that measures shape-based recognition in vision models, revealing that even state-of-the-art networks fail to generalize consistently across 3D viewpoints and non-shape appearance changes.
-
Geometric Decoupling: Diagnosing the Structural Instability of Latent
Latent diffusion models exhibit geometric decoupling where curvature in out-of-distribution generation is misallocated to unstable semantic boundaries instead of image details, identifying geometric hotspots as the structural cause of editing instability.
-
UniCon: Unified Framework for Efficient Contrastive Alignment via Kernels
UniCon unifies contrastive alignment across encoders and alignment types using kernels to enable exact closed-form updates instead of stochastic optimization.
-
SpecTran: Spectral-Aware Transformer-based Adapter for LLM-Enhanced Sequential Recommendation
SpecTran applies a spectral-aware transformer adapter with learnable position encoding to aggregate informative components across the full spectrum of LLM embeddings, yielding 9.17% average gains on sequential recommendation tasks.
-
Language Models as Semantic Teachers: Post-Training Alignment for Medical Audio Understanding
AcuLa aligns audio models with medical language models via contrastive and self-supervised objectives on LLM-generated clinical reports, raising mean AUROC from 0.68 to 0.79 across 18 cardio-respiratory tasks.
-
AGE: Adaptive-masking for Graph Embedding in Graph Retrieval-Augmented Generation
AGE applies adaptive masking via a learnable sampler in Transformer-based SSL to align graph and text embeddings, yielding higher accuracy on four GraphQA benchmarks for non-parametric GraphRAG.
-
IdEst: Assessing Self-Supervised Learning Representations via Intrinsic Dimension
IdEst estimates the intrinsic dimension of SSL representations via dim_MST and reports a strong correlation with linear-probe accuracy across datasets and objectives.
-
HQ-JEPA: Hybrid Quantum Joint-Embedding Predictive Architecture for Cross-Modal Remote Sensing Representation Learning
HQ-JEPA combines JEPA-style predictive self-supervision with cross-modal alignment and a SWAP-test-based quantum fidelity loss for learning representations from paired remote sensing imagery, reporting competitive results on GeoBench tasks.
-
Elucidating Representation Degradation Problem in Diffusion Model Training
Diffusion models suffer representation degradation at high noise levels because of a recoverability mismatch; ERD mitigates this through dynamic reallocation of optimization, accelerating convergence across backbones.
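The IdEst summary names its estimator only as dim_MST. Assuming it follows the classic minimum-spanning-tree scaling argument (for n points sampled from a d-dimensional set, the total MST edge length L(n) grows roughly like n^((d-1)/d)), a minimal sketch fits the slope s of log L(n) against log n and returns d = 1/(1 - s). The function names, subset sizes, and test data below are hypothetical; IdEst's actual estimator may differ.

```python
import numpy as np


def mst_length(x: np.ndarray) -> float:
    """Total Euclidean edge length of the minimum spanning tree of points x (n, dim).

    Dense Prim's algorithm: O(n^2) time, O(n) extra memory.
    """
    n = x.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    best = np.full(n, np.inf)  # cheapest known edge from each point into the tree
    best[0] = 0.0              # start the tree at point 0
    total = 0.0
    for _ in range(n):
        # Pick the cheapest point not yet in the tree and add its edge.
        i = int(np.argmin(np.where(in_tree, np.inf, best)))
        total += best[i]
        in_tree[i] = True
        # Relax edge costs using distances from the newly added point.
        best = np.minimum(best, np.linalg.norm(x - x[i], axis=1))
    return float(total)


def mst_intrinsic_dimension(x: np.ndarray, sizes=(100, 200, 400, 800), seed: int = 0) -> float:
    """Estimate d from the growth of MST length over random subsets of increasing size."""
    rng = np.random.default_rng(seed)
    log_n, log_len = [], []
    for m in sizes:
        idx = rng.choice(x.shape[0], size=m, replace=False)
        log_n.append(np.log(m))
        log_len.append(np.log(mst_length(x[idx])))
    slope = np.polyfit(log_n, log_len, 1)[0]  # s in L(n) ~ n^s
    return float(1.0 / (1.0 - slope))         # s = (d - 1) / d  =>  d = 1 / (1 - s)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Hypothetical data: a 2-d square and a 1-d segment, both linearly embedded in 32-d.
    plane = rng.uniform(size=(1000, 2)) @ rng.normal(size=(2, 32))
    line = rng.uniform(size=(1000, 1)) @ rng.normal(size=(1, 32))
    print("plane:", mst_intrinsic_dimension(plane))
    print("line:", mst_intrinsic_dimension(line))
```

Boundary effects bias the estimate slightly at small n, so the plane typically comes out near, not exactly at, 2. For much larger point sets, a k-nearest-neighbor graph or a library MST routine would replace the quadratic loop.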