TopoFisher optimizes trainable filtrations, vectorizations, and compressors in persistent homology to maximize Fisher information, yielding higher information than fixed cosmological summaries and approaching neural baselines with far fewer parameters while generalizing better under simulator shifts
Title resolution pending
18 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 3polarities
background 3representative citing papers
α-TCAV replaces TCAV's hard indicator with a tunable smooth function to create a unified probabilistic framework with lower variance and guidance for parameter choice or Bayes-optimal scoring.
The paper establishes the first tilde O(epsilon^{-1}) upper bounds and matching lower bounds for forward-KL-regularized offline contextual bandits under single-policy concentrability in both tabular and general function approximation settings.
TimeTok is a unified framework using hierarchical tokenization for granularity-controllable time-series generation that achieves state-of-the-art performance in standard tasks and shows transferability across heterogeneous datasets.
ARCH is a hierarchical flow-based generative model that enables tractable conditional intensity computation and arbitrary conditioning for spatiotemporal event distributions.
Training per-layer affine probes on frozen transformers yields more reliable latent predictions than the logit lens and enables detection of malicious inputs from prediction trajectories.
HORST uses non-commutative operator composition and a hyperbolic mirror map to combine stability from adaptive optimizers with L1 sparsity bias, outperforming AdamW across sparsity levels on vision and language tasks.
PerturbedVAE disentangles perturbation-specific signals from invariant gene expression structure to recover causal representations and improve out-of-distribution prediction in single-cell perturbation modeling.
SNMPP builds a product-form neural influence kernel from a signed interaction network over event classes and a delay-aware monotonic temporal network to enable explicit discovery of inter-event relationships alongside strong prediction.
A framework using generative AI to produce synthetic multilevel data for Monte Carlo simulations that evaluate the performance and parameter recovery of quantitative methods.
DR-MOO adds distributional robustness to multi-objective optimization and gives single-loop MGDA algorithms reaching epsilon-Pareto-stationary points in O(epsilon^{-4}) samples for nonconvex problems.
SPHERE applies a Parseval penalty to MoE policies in continual RL to maintain spectral plasticity, yielding 133% and 50% higher average success on MetaWorld and HumanoidBench versus unregularized MoE baselines.
Common ID estimators fail to track the true intrinsic dimension of neural representations and are instead driven by other factors.
LeJEPA derives an optimal isotropic Gaussian target for embeddings and enforces it via sketched regularization to deliver scalable, heuristics-free self-supervised pretraining with 79% ImageNet linear accuracy on ViT-H/14.
Introduces a margin-adaptive confidence ranking method that learns an estimator from simulated diversity and derives margin-dependent generalization bounds for use in fixed-sequence testing of LLM-human agreement.
Online generalised predictive coding (ODEM) tracks latent states in nonlinear and chaotic generative models by separating temporal scales for fast Bayesian belief updating and slow parameter learning.
Discriminator-informed resampling via normalizing flows reduces error in the EnGMF for low-ensemble regimes on the Ikeda map and Lorenz '63 system.
citing papers explorer
-
Fast Rates for Offline Contextual Bandits with Forward-KL Regularization under Single-Policy Concentrability
The paper establishes the first tilde O(epsilon^{-1}) upper bounds and matching lower bounds for forward-KL-regularized offline contextual bandits under single-policy concentrability in both tabular and general function approximation settings.
-
Arbitrarily Conditioned Hierarchical Flows for Spatiotemporal Events
ARCH is a hierarchical flow-based generative model that enables tractable conditional intensity computation and arbitrary conditioning for spatiotemporal event distributions.
-
Eliciting Latent Predictions from Transformers with the Tuned Lens
Training per-layer affine probes on frozen transformers yields more reliable latent predictions than the logit lens and enables detection of malicious inputs from prediction trajectories.
-
HORST: Composing Optimizer Geometries for Sparse Transformer Training
HORST uses non-commutative operator composition and a hyperbolic mirror map to combine stability from adaptive optimizers with L1 sparsity bias, outperforming AdamW across sparsity levels on vision and language tasks.
-
What Makes a Representation Good for Single-Cell Perturbation Prediction?
PerturbedVAE disentangles perturbation-specific signals from invariant gene expression structure to recover causal representations and improve out-of-distribution prediction in single-cell perturbation modeling.
-
Structured Neural Marked Point Processes for Interpretable Event Interaction Modeling
SNMPP builds a product-form neural influence kernel from a signed interaction network over event classes and a delay-aware monotonic temporal network to enable explicit discovery of inter-event relationships alongside strong prediction.
-
Distributionally Robust Multi-Objective Optimization
DR-MOO adds distributional robustness to multi-objective optimization and gives single-loop MGDA algorithms reaching epsilon-Pareto-stationary points in O(epsilon^{-4}) samples for nonconvex problems.
-
SPHERE: Mitigating the Loss of Spectral Plasticity in Mixture-of-Experts for Deep Reinforcement Learning
SPHERE applies a Parseval penalty to MoE policies in continual RL to maintain spectral plasticity, yielding 133% and 50% higher average success on MetaWorld and HumanoidBench versus unregularized MoE baselines.
-
Rethinking Intrinsic Dimension Estimation in Neural Representations
Common ID estimators fail to track the true intrinsic dimension of neural representations and are instead driven by other factors.
-
LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics
LeJEPA derives an optimal isotropic Gaussian target for embeddings and enforces it via sketched regularization to deliver scalable, heuristics-free self-supervised pretraining with 79% ImageNet linear accuracy on ViT-H/14.
-
Margin-Adaptive Confidence Ranking for Reliable LLM Judgement
Introduces a margin-adaptive confidence ranking method that learns an estimator from simulated diversity and derives margin-dependent generalization bounds for use in fixed-sequence testing of LLM-human agreement.
-
Learning Discriminators for Resampling in the Ensemble Gaussian Mixture Filter through a Normalizing Flow Approach
Discriminator-informed resampling via normalizing flows reduces error in the EnGMF for low-ensemble regimes on the Ikeda map and Lorenz '63 system.
- Query-efficient model evaluation using cached responses