super hub Mixed citations
author Dong, W
Mixed citation behavior. Most common role is background (53%).
claims ledger
- dataset ratio as baseline first@τ divided by ours first@τ. A speedup ratio greater than 1.0× means ours reaches the same target earlier with fewer epochs or steps. For higher-is-better metrics (Top-1, AP50), first@τ is the first epoch with metric at or above τ. For lower-is-better metrics (FID), first@τ is the first step at or below τ. Gate and Hyperparameter Selection. For ImageNet classification [7], we use τ = 65 for ResNet-50 [14] and τ = 50 for ViT-S/16 [8]. For CIFAR early-stage classification [26], we use fix
- dataset $\phi(c_{\text{child}}, c_{\text{parent}}) < \eta_{\text{text}}(\lVert \tilde{c}_{\text{parent}} \rVert) \cdot \omega(\tilde{c}_{\text{parent}})$. (8) This allows users to prune entire branches of spurious concepts with a single interaction, substantially reducing the number of interventions required to correct a prediction. 4 Experiments. We evaluate HypCBM across three domains: CIFAR-100 [20] for general object classification, SUN397 [51] for (hierarchical) scene understanding, and ImageNet [6] to assess scalability to real-world complexity. Additional results on CUB-200 [50] are provided
- background URL https://www.datanami.com/2020/07/06/data-prep-still-dominates-data-scientists-time-survey-finds/. [8] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248-255, 2009. doi: 10.1109/CVPR.2009.5206848. [9] Robert Dorfman. A formula for the gini coefficient. The Review of Economics and Statistics, 61(1):146-49, 1979. URL https://EconPapers.repec.org
- dataset The inverse-rendering model, invRend-BFM, was trained to infer the BFM generative parameters of a 2D face image, including identity-related shape and texture latents as well as expression, pose, light direction, and light intensity. The object-categorization model, objCat-ImageNet, was trained to classify natural images into ImageNet object categories [46]. Details of the training objective, architectural modifications, and training dataset for each model are provided in Methods 4.1. For cop
- background by shared tasks, common data, and open leaderboards, was the engine behind transformative progress in NLP and computer vision, where benchmarks like GLUE [35], SuperGLUE [36], and ImageNet [8] created a culture of head-to-head comparison on identical inputs. PDE solving has no analog. Existing benchmarks each occupy narrow regimes: [Figure 1: MC² pipeline. A low-budget Monte Carlo WoS estimate is corrected in a single forward pass by a learned operator, yielding an improved solution for the PDE.]
- background Instance Segmentation. Cityscapes [6], ADE20K [42], LVIS [12], and Mapillary Vistas [28] cover outdoor driving and general scenes but apply no domain-specific vocabulary tailored to commercial spaces: the escalators, retail shelves, display cases, hotel beds, and food presentations that define the majority of Urban-ImageNet's images. Scaling Behaviour. ImageNet [7] established scale as a performance driver; GPT-3 [4] and scaling laws [18] showed predictable growth; LAION-5B [35] demonstrated bill
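The first@τ speedup ratio described in the first ledger entry can be sketched as follows. This is a minimal illustration under stated assumptions, not the citing paper's code: the function names `first_at_tau` and `speedup_ratio`, the 1-based epoch/step indexing, and the sample metric curves are all hypothetical.

```python
from typing import Optional, Sequence


def first_at_tau(values: Sequence[float], tau: float,
                 higher_is_better: bool = True) -> Optional[int]:
    """Return the 1-based epoch/step at which the metric first reaches tau.

    For higher-is-better metrics (e.g. Top-1, AP50) "reaches" means >= tau;
    for lower-is-better metrics (e.g. FID) it means <= tau.
    Returns None if the target is never reached.
    """
    for index, value in enumerate(values, start=1):
        reached = value >= tau if higher_is_better else value <= tau
        if reached:
            return index
    return None


def speedup_ratio(baseline: Sequence[float], ours: Sequence[float], tau: float,
                  higher_is_better: bool = True) -> Optional[float]:
    """Baseline first@tau divided by ours first@tau (None if either never reaches tau)."""
    base_first = first_at_tau(baseline, tau, higher_is_better)
    ours_first = first_at_tau(ours, tau, higher_is_better)
    if base_first is None or ours_first is None:
        return None
    return base_first / ours_first


# Hypothetical accuracy curves (one value per epoch), tau = 65.
baseline_acc = [40.0, 55.0, 60.0, 66.0, 70.0]
ours_acc = [50.0, 66.0, 70.0]
acc_speedup = speedup_ratio(baseline_acc, ours_acc, tau=65.0)

# Hypothetical FID curves (one value per logged step), tau = 10, lower is better.
baseline_fid = [30.0, 20.0, 12.0, 9.0]
ours_fid = [25.0, 9.0]
fid_speedup = speedup_ratio(baseline_fid, ours_fid, tau=10.0, higher_is_better=False)
```

A ratio above 1.0 means the second curve reaches the target earlier than the baseline; returning None when a run never reaches τ is a design choice of this sketch, not something the quoted passage specifies.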
citing papers explorer
-
Disentanglement Beyond Generative Models with Riemannian ICA
RICA replaces ICA's global generative model with local Riemannian geometry, introducing a disentanglement tensor based on the Hessian of the log-likelihood and Ricci curvature to measure pointwise disentanglement, which recovers sources across manifolds in controlled tests.
-
STRABLE: Benchmarking Tabular Machine Learning with Strings
A new corpus of 108 mixed string-numeric tables shows that advanced tabular learners with basic string embeddings perform well on most real-world data, while large LLM encoders help on free-text heavy tables.
-
Building Normalizing Flows with Stochastic Interpolants
Normalizing flows are constructed by learning the velocity of a stochastic interpolant via a quadratic loss derived from its probability current, yielding an efficient ODE-based alternative to diffusion models.
-
OncoTraj: a public benchmark for longitudinal resistance prediction in EGFR-mutant non-small-cell lung cancer on osimertinib
OncoTraj releases a harmonized 813-patient dataset with audited splits for three tasks on osimertinib resistance, showing single-timepoint NGS features yield no model above chance while recovering a TP53 association.
-
iSAGE: A Human-in-the-Loop Framework for Remote Sensing Semantic Segmentation via Sparse Point Supervision
iSAGE achieves near-dense mIoU performance in remote sensing semantic segmentation using iterative expert clicks on confident model errors with an error-weighted loss, using only 0.011-0.04% of pixels.
-
$A^2$: Smaller Self-Supervised ViTs Localize Better than Larger Ones
Smaller self-supervised ViTs localize objects better via attention than larger ViTs, enabling A² to decouple localization from feature extraction for competitive performance on distribution-shifted benchmarks.
-
Structure over Pixels: Learning Variable-Length Visual Programs
STROP learns variable-length discrete visual programs for images by training a length head against frozen DINOv3 features in a four-phase curriculum while bypassing pixel reconstruction.
-
Oracle Supervision Transfers for Hyperparameter Prediction in Model-Based Image Denoising
HyperDn is a configuration-conditioned predictor that transfers oracle supervision across denoising paradigms to achieve near-oracle hyperparameter prediction with few or zero target labels.
-
SDM: A Powerful Tool for Evaluating Model Robustness
SDM is a new staged gradient attack that reconstructs the adversarial objective around probability differences and reports stronger performance than prior methods like APGD.
-
Randomized Advantage Transformation (RAT): Computing Natural Policy Gradients via Direct Backpropagation
RAT reformulates regularized natural policy gradients as vanilla gradients with a transformed advantage, computed efficiently via randomized block Kaczmarz iterations on on-policy data.
-
Reasoning Portability: Guiding Continual Learning for MLLMs in the RLVR Era
Formalizes Reasoning Portability (RP) and proposes RDB-CL to modulate per-sample KL regularization in RLVR for MLLM continual learning, achieving +12.0% Last accuracy over vanilla RLVR baseline by preserving reusable reasoning on high-RP samples.
-
PluRule: A Benchmark for Moderating Pluralistic Communities on Social Media
PluRule is a new multimodal multilingual benchmark showing that state-of-the-art vision-language models perform only marginally better than a trivial baseline at detecting specific rule violations in pluralistic online communities.
-
Navigating Potholes with Geometry-Aware Sharpness Minimization
LLQR+SAM pairs a slow learned geometry preconditioner with fast SAM perturbations to amplify escape from locally sharp 'potholes' while stabilizing flat basins, producing consistent gains over SAM and LLQR alone.
-
Human face perception reflects inverse-generative and naturalistic discriminative objectives
Human face perception aligns with neural networks trained on inverse-generative and naturalistic discriminative tasks, as these best predict human dissimilarity judgments on controversial and random face pairs.
-
Multi-domain Multi-modal Document Classification Benchmark with a Multi-level Taxonomy
MMM-Bench supplies 5,990 multi-modal documents from 12 commercial domains annotated along a 5-level taxonomy to test document classification under realistic business conditions.
-
Urban-ImageNet: A Large-Scale Multi-Modal Dataset and Evaluation Framework for Urban Space Perception
Urban-ImageNet is a 2-million-image multi-modal dataset with HUSIC 10-class taxonomy enabling benchmarks for urban scene classification, cross-modal retrieval, and instance segmentation.
-
Hyperbolic Concept Bottleneck Models
HypCBM reformulates concept activations as geometric containment in hyperbolic space to produce sparse, hierarchy-aware signals that match Euclidean models trained on 20 times more data.
-
Physics-informed, Generative Adversarial Design of Funicular Shells
A modified DCGAN with an auxiliary discriminator using the membrane factor generates stable, previously unseen funicular shells optimized for pure compression in three dimensions.
-
HiddenObjects: Scalable Diffusion-Distilled Spatial Priors for Object Placement
A diffusion-based pipeline creates a 27M-annotation dataset of object placements that outperforms human annotations and baselines on image editing tasks, then distills it into a fast model.
-
Towards Near-Real-Time Telemetry-Aware Routing with Neural Routing Algorithms
LOGGIA is a delay-aware graph neural routing algorithm using pre-training and RL that outperforms shortest-path and other neural methods in realistic network simulations.
-
Vibe Coding XR: Accelerating AI + XR Prototyping with XR Blocks and Gemini
XR Blocks supplies an LLM-optimized Reality Model and Vibe Coding XR workflow that converts high-level prompts into working physics-aware XR applications with high one-shot success.
-
Setting-Matched and Semantics-Scaled Benchmarking of One-Step Generative Models Against Multistep Diffusion and Flow Models
Matched benchmarking reveals FID misleads in few-step regimes under CFG, prompting CLIP-scaled and PickScore-scaled FID and IS variants for better semantic evaluation of one-step image generators.
-
Unifying Contrastive and Generative Objectives for Visual Understanding and Text-to-Image Generation
DREAM introduces Masking Warmup and Semantically Aligned Decoding to let a single encoder handle both contrastive alignment and masked generation, yielding gains over CLIP and FLUID on understanding and generation benchmarks.
-
MobileMold: A Smartphone-Based Microscopy Dataset for Food Mold Detection
MobileMold provides 4941 smartphone microscopy images and shows deep learning models reach 99.5% accuracy on mold detection and food classification tasks.
-
On the Convergence Rate of LoRA Gradient Descent
LoRA gradient descent converges to a stationary point at rate O(1/log T).
-
Joint Distillation for Fast Likelihood Evaluation and Sampling in Flow-based Models
F2D2 jointly distills sampling and likelihood computation in flow-based models by adding a divergence head to a few-step flow map, achieving accurate log-likelihoods at 2-10 NFEs while preserving sample quality.
-
Representational Alignment Across Model Layers and Brain Regions with Multi-Level Optimal Transport
Multi-Level Optimal Transport (MOT) jointly infers soft layer couplings and neuron transport plans to produce global alignment scores and structured hierarchical correspondences between networks of varying depths.
-
Exemplar-Free Continual Learning for State Space Models
Inf-SSM constrains the infinite-horizon evolution of SSMs via Grassmannian geometry and an efficient O(n^2) Sylvester solver to enable exemplar-free continual learning with reduced forgetting.
-
CascadeFormer: Depth-Tapered Transformers Motivated by Gradient Fan-in Asymmetry
CascadeFormer tapers Transformer width with depth based on gradient fan-in asymmetry to match uniform baselines in perplexity while cutting latency 8.6%.
-
What's in an Earth Embedding? An Explainability Analysis of Location Encoders
Location embeddings from geographic INRs can be decomposed into sparse latent concepts, natural language concepts, and visual features while retaining high reconstruction capability.
-
Unmasking LAION-5B: Age, Gender, Race, and Emotion Biases in Large-Scale Image Datasets
Empirical audit of LAION-2B-en and LAION-2B-multi finds overrepresentation of young adults, White people, and males plus stereotypical emotion associations across two attribute classifiers.
-
DR-Mamba: Automatic Inference-Time Domain Adaptation for Document Image Binarization via Sample-Conditioned Detail-Background Suppression
DR-Mamba performs automatic per-document domain adaptation for binarization by modeling fast detail and slow background routes with subtractive suppression in a single forward pass.
-
WHET: Welding Homomorphic Encryption to Accelerator Architectures
WHET applies fine-grained coefficient-to-slot transforms, plaintext compression, and modulus raising plus lightweight hardware tweaks to FHE accelerators, delivering 1.38-8.74x per-area gains and sub-millisecond CKKS bootstrapping.
-
IDEAL: In-DEpth ALignment Makes A Discrete Representation AutoEncoder
IDEAL improves discrete representation autoencoders by jointly aligning quantized tokens with shallow and deep VFM features, reporting 0.61 rFID on ImageNet and 1.89 gFID for autoregressive image generation.
-
Few-step Generative Models as Lossy Compression
Few-step generative models can be reformulated as lossy codecs in the reverse channel coding framework without retraining, yielding faster encoding/decoding on low-resolution image benchmarks.
-
Selectivity Estimation for Semantic Filters on Image Data
Semantic Histograms treat semantic image filters as implicit range queries in embedding space and use two specificity estimators whose ensemble reduces end-to-end query optimization and execution overhead by up to 86%.
-
Parameter-efficient Dual-encoder Architecture with Differentiable Choquet Integral Fusion for Underwater Acoustic Classification
A parameter-efficient dual-encoder model with differentiable Choquet integral fusion improves underwater acoustic classification accuracy over single-encoder baselines on DeepShip and ShipsEar datasets.
-
What changes after deployment? A survey on On-device Learning in TinyML
A survey of on-device learning in TinyML organized by distribution change regimes, highlighting influences on applications, hardware, and solutions plus a gap between benchmarks and deployments.
-
Deep Psychovisual Image Representations
Proposes a psychovisual-inspired deep learning method that encodes images in learned frequency sub-bands for interpretable semantic structures and reduced depth dependence.
-
Growing a Neural Network in Breadth, Depth, and Time
Recurrent CNNs are trained with joint task and resource costs on breadth, depth, and time, yielding organic growth in all three dimensions that trades off for accuracy and matches human reaction times on object recognition.
-
Trajectory-Consistent Calibration for Cache-Accelerated Diffusion Models
TCC calibrates cached representations in diffusion sampling via an offline iterative procedure that accounts for trajectory shifts, improving FID from 29.83 to 27.35 on PixArt-alpha while preserving reuse policies.
-
TextTeacher: What Can Language Teach About Images?
TextTeacher uses frozen text embeddings from captions as semantic anchors to guide vision model training, improving ImageNet accuracy by up to 2.7 p.p. and transfer performance by 1.0 p.p. on average.
-
Activation-Free Backbones for Image Recognition: Polynomial Alternatives within MetaFormer-Style Vision Models
Polynomial replacements for activations in MLPs, convolutions, and attention within MetaFormer yield PolyNeXt models that match or exceed standard performance on ImageNet, ADE20K, and robustness benchmarks while beating prior polynomial networks.
-
Concise and Logically Consistent Conformal Sets for Neuro-Symbolic Concept-Based Models
COCOCO is a conformal framework for NeSy-CBMs that jointly conformalizes concepts and labels, reconciles them via deduction-abduction revision, and satisfies consistency, coverage, and conciseness while retaining distribution-free guarantees.
-
The Diffusion Encoder
A diffusion model serves as the encoder in an autoencoder when trained alternately with the decoder to resolve opposing update directions while retaining the standard diffusion training objective.
-
MC$^2$: Monte Carlo Correction for Fast Elliptic PDE Solving
MC² corrects low-budget Monte Carlo solutions for elliptic PDEs with a single-pass neural network to match the accuracy of 1000× more Monte Carlo samples while outperforming classical and learned baselines.
-
Removing the Watermark Is Not Enough: Forensic Stealth in Generative-AI Watermark Removal
Current AI image watermark removal attacks replace the watermark with a different forensic signal, allowing independent detectors to distinguish processed outputs from clean images at over 98% true-positive rate under a 1% false-positive budget.
-
Compared to What? Baselines and Metrics for Counterfactual Prompting
Counterfactual prompting effects on LLMs are often indistinguishable from those caused by meaning-preserving paraphrases, causing most previously reported demographic sensitivities to disappear under proper statistical comparison.
-
Preventing Latent Rehearsal Decay in Online Continual SSL with SOLAR
SOLAR prevents latent rehearsal decay in online continual SSL by adaptively managing replay buffers with deviation proxies and an explicit overlap loss, delivering both fast convergence and state-of-the-art final accuracy on vision benchmarks.
-
Zero-shot World Models Are Developmentally Efficient Learners
A zero-shot visual world model trained on one child's experience achieves broad competence on physical understanding benchmarks while matching developmental behavioral patterns.