RICA replaces ICA's global generative model with local Riemannian geometry, introducing a disentanglement tensor based on the Hessian of the log-likelihood and Ricci curvature to measure pointwise disentanglement, which recovers sources across manifolds in controlled tests.
hub Mixed citations
author Dong, W
Mixed citation behavior. Most common role is background (53%).
hub tools
citation-role summary
citation-polarity summary
representative citing papers
A new corpus of 108 mixed string-numeric tables shows that advanced tabular learners with basic string embeddings perform well on most real-world data, while large LLM encoders help on free-text heavy tables.
Normalizing flows are constructed by learning the velocity of a stochastic interpolant via a quadratic loss derived from its probability current, yielding an efficient ODE-based alternative to diffusion models.
HyperDn is a configuration-conditioned predictor that transfers oracle supervision across denoising paradigms to achieve near-oracle hyperparameter prediction with few or zero target labels.
SDM is a new staged gradient attack that reconstructs the adversarial objective around probability differences and reports stronger performance than prior methods like APGD.
RAT reformulates regularized natural policy gradients as vanilla gradients with a transformed advantage, computed efficiently via randomized block Kaczmarz iterations on on-policy data.
Formalizes Reasoning Portability (RP) and proposes RDB-CL to modulate per-sample KL regularization in RLVR for MLLM continual learning, achieving +12.0% Last accuracy over vanilla RLVR baseline by preserving reusable reasoning on high-RP samples.
PluRule is a new multimodal multilingual benchmark showing that state-of-the-art vision-language models perform only marginally better than a trivial baseline at detecting specific rule violations in pluralistic online communities.
LLQR+SAM pairs a slow learned geometry preconditioner with fast SAM perturbations to amplify escape from locally sharp 'potholes' while stabilizing flat basins, producing consistent gains over SAM and LLQR alone.
Human face perception aligns with neural networks trained on inverse-generative and naturalistic discriminative tasks, as these best predict human dissimilarity judgments on controversial and random face pairs.
MMM-Bench supplies 5,990 multi-modal documents from 12 commercial domains annotated along a 5-level taxonomy to test document classification under realistic business conditions.
Urban-ImageNet is a 2-million-image multi-modal dataset with HUSIC 10-class taxonomy enabling benchmarks for urban scene classification, cross-modal retrieval, and instance segmentation.
HypCBM reformulates concept activations as geometric containment in hyperbolic space to produce sparse, hierarchy-aware signals that match Euclidean models trained on 20 times more data.
A modified DCGAN with an auxiliary discriminator using the membrane factor generates stable, previously unseen funicular shells optimized for pure compression in three dimensions.
A diffusion-based pipeline creates a 27M-annotation dataset of object placements that outperforms human annotations and baselines on image editing tasks, then distills it into a fast model.
LOGGIA is a delay-aware graph neural routing algorithm using pre-training and RL that outperforms shortest-path and other neural methods in realistic network simulations.
XR Blocks supplies an LLM-optimized Reality Model and Vibe Coding XR workflow that converts high-level prompts into working physics-aware XR applications with high one-shot success.
Matched benchmarking reveals FID misleads in few-step regimes under CFG, prompting CLIP-scaled and PickScore-scaled FID and IS variants for better semantic evaluation of one-step image generators.
DREAM introduces Masking Warmup and Semantically Aligned Decoding to let a single encoder handle both contrastive alignment and masked generation, yielding gains over CLIP and FLUID on understanding and generation benchmarks.
MobileMold provides 4941 smartphone microscopy images and shows deep learning models reach 99.5% accuracy on mold detection and food classification tasks.
LoRA gradient descent converges to a stationary point at rate O(1/log T).
F2D2 jointly distills sampling and likelihood computation in flow-based models by adding a divergence head to a few-step flow map, achieving accurate log-likelihoods at 2-10 NFEs while preserving sample quality.
Multi-Level Optimal Transport (MOT) jointly infers soft layer couplings and neuron transport plans to produce global alignment scores and structured hierarchical correspondences between networks of varying depths.
Inf-SSM constrains the infinite-horizon evolution of SSMs via Grassmannian geometry and an efficient O(n^2) Sylvester solver to enable exemplar-free continual learning with reduced forgetting.
citing papers explorer
-
Zero-shot World Models Are Developmentally Efficient Learners
A zero-shot visual world model trained on one child's experience achieves broad competence on physical understanding benchmarks while matching developmental behavioral patterns.
-
StomaD2: An All-in-One System for Intelligent Stomatal Phenotype Analysis via Diffusion-Based Restoration Detection Network
StomaD2 integrates diffusion-based image restoration with a specialized rotated detection network to achieve high-accuracy stomatal phenotyping across more than 130 plant species.