28 Pith papers cite this work, alongside 9,789 external citations.
citing papers explorer
- ENSEMBITS: an alphabet of protein conformational ensembles
Ensembits is the first tokenizer of protein conformational ensembles that outperforms static tokenizers on RMSF prediction and matches them on function and mutation tasks while using less pretraining data.
- Scale-Calibrated Median-of-Means for Robust Distributed Principal Component Analysis
Proposes a scale-calibrated median-of-means estimator for robust aggregation of distributed PCA estimates on the product of Euclidean space and Grassmann manifold.
- Bounded Fitting for Expressive Description Logics
Bounded fitting can be extended to expressive description logics while retaining generalization guarantees and implemented practically via SAT solvers.
- ProtoSSL: Interpretable Prototype Learning from Unlabeled Time-Series Data
ProtoSSL discovers generalizable prototypes from unlabeled time-series via self-supervision and assigns them to new tasks for interpretable predictions, outperforming supervised baselines in low-data regimes on ECG datasets.
- Intrinsic effective sample size for manifold-valued Markov chain Monte Carlo via kernel discrepancy
An intrinsic effective sample size for manifold MCMC is defined via kernel discrepancy as the number of independent draws yielding equivalent expected squared discrepancy to the target.
- Profile Likelihood Inference for Anisotropic Hyperbolic Wrapped Normal Models on Hyperbolic Space
The profile maximum likelihood estimator for the location in anisotropic hyperbolic wrapped normal models is strongly consistent, asymptotically normal, and attains the Hájek-Le Cam minimax lower bound under squared geodesic loss.
- Soft-MSM: Differentiable Context-Aware Elastic Alignment for Time Series
Soft-MSM is a smooth, gradient-enabled version of the context-aware MSM distance for time series alignment that outperforms Soft-DTW alternatives in clustering and nearest-centroid classification.
- Heuristic approaches for solving a bilevel optimistic scheduling problem on parallel machines
Develops RBS and MSLS heuristics exploiting follower optimality properties for bilevel uniform parallel machine scheduling with up to 500 jobs.
- Scale selection for geometric medians on product manifolds
Joint location-scale minimization for geometric medians on product manifolds degenerates to marginal medians, and three new scale-selection methods restore identifiability with asymptotic guarantees.
- Rethinking Temporal Consistency in Video Object-Centric Learning: From Prediction to Correspondence
Grounded Correspondence maintains temporal consistency via deterministic bipartite matching on frozen backbone features instead of learned predictors, achieving competitive results on MOVi and YouTube-VIS with zero learnable temporal parameters.
- Dynamic Hypergame for Task Assignment in Multi-platform Mobile Crowdsensing Under Incomplete Information
PACMAB is a perception-aware two-sided learning framework for multi-platform mobile crowdsensing that models the setting as a dynamic hypergame and achieves at least 41% more completed tasks than benchmarks in simulations without assuming complete information.
- Nothing Deceives Like Success: Social Learning and the Illusion of Understanding in Science
Success bias in collective theory-building leads to systematic overestimation of theory quality, narrower search, and paradoxically lower performance when agents optimize for apparent success.
- Crystal structure prediction using graph neural combinatorial optimization
GNNs with Gumbel-Sinkhorn layers sample stoichiometry-compliant crystal structures without supervision, outperforming heuristics while matching commercial solvers.
- Weight-Informed Self-Explaining Clustering for Mixed-Type Tabular Data
WISE unifies representation via BEP, feature weighting via LOFO, two-stage clustering, and intrinsic explanations via DFI for mixed-type tabular data, outperforming baselines on six datasets.
- Evidence of an Emergent "Self" in Continual Robot Learning
Continual learning robots form a significantly more stable invariant subnetwork than constant-task controls, and preserving it improves adaptation while damaging it hurts performance.
- Architecting Scalable Trapped Ion Quantum Computers using Surface Codes
A new compiler for surface codes on QCCD trapped-ion hardware shows that 2-ion traps outperform larger traps in logical clock speed and hardware efficiency, beating prior compilers by 3.8X on average.
- Unconventional Hexacopters via Evolution and Learning: Performance Gains and New Insights
Evolving hexacopter morphologies together with learnable controllers produces unconventional drones that outperform standard designs on complex tasks while introducing new metrics for evolution-learning interactions.
- XAMI -- A Benchmark Dataset for Artefact Detection in XMM-Newton Optical Images
Introduces the XAMI benchmark dataset of 1000 annotated XMM-Newton images for artefact detection together with a hybrid CNN-transformer instance segmentation demonstration.
- Improved Speed via Regional Fulfillment
Regional fulfillment networks reduce order delays relative to global ones when assignments reach equilibrium under a greedy strategy.
- Multimodal Hidden Markov Models for Persistent Emotional State Tracking
Sticky factorial HDP-HMMs applied to multimodal valence-arousal trajectories identify interpretable persistent emotional regimes in conversations, outperforming Gaussian HMM baselines in consistency metrics and enabling context-augmented LLM responses.
- Features have life history. And we should care
Language model features form an early stable carrier scaffold of about 50 sparse features that is load-bearing, predictable from onset firing, and recruits most later features.
- Topology-Aware Skeleton Detection via Lighthouse-Guided Structured Inference
Lighthouse-Skel uses learned junction points and breakpoints to guide reconnection of discontinuous skeleton segments, improving structural integrity over standard point-level detection.
- Characteristic Root Analysis and Regularization for Linear Time Series Forecasting
Characteristic roots govern dynamics in linear forecasting models but noise induces spurious roots; rank reduction and Root Purge regularization mitigate this for more robust predictions.
- NOOUGAT: Towards Unified Online and Offline Multi-Object Tracking
NOOUGAT unifies online and offline multi-object tracking with a GNN that processes non-overlapping subclips fused by an Autoregressive Long-term Tracking layer, reporting SOTA gains on DanceTrack, SportsMOT, and MOT20.
- Optimal sequential decision-making for error propagation mitigation in digital twins
Error propagation mitigation in digital twins is cast as an MDP/POMDP with HMM-derived regimes as states, where the MDP policy maximizes reward and the POMDP recovers 95% of that performance.
- Mining Electronic Health Records to Investigate Effectiveness of Ensemble Deep Clustering
An ensemble deep clustering framework combined with traditional methods ranks highest across 14 clustering techniques on real EHR data for heart failure patients from the All of Us program.
- A Tutorial Review of Bayesian Optimization with Gaussian Processes to Accelerate Stationary Point Searches
Bayesian optimization with Gaussian processes unifies minimization, single-point saddle searches, and double-ended path searches on potential energy surfaces through a shared six-step surrogate loop using derivative observations and inverse-distance kernels.
- Soft-Coherent Direct Multipath SLAM