Canonical reference
30 Leland McInnes, John Healy, and Steve Astels
71% of citing Pith papers cite this work as background.
Citing papers explorer
-
Information on hidden birth events restores identifiability in phylodynamic inference
Hidden birth event information restores identifiability to time-dependent birth-death phylodynamic models; mutation-at-birth models make sequences sufficient to recover it.
-
Scale-Calibrated Median-of-Means for Robust Distributed Principal Component Analysis
Proposes a scale-calibrated median-of-means estimator for robust aggregation of distributed PCA estimates on the product of Euclidean space and Grassmann manifold.
-
Solving linear-rate ODE hierarchies (like master equations) using closures and operator splitting
For linear-rate master equations the generating function admits an exact composition-multiplier representation whose Taylor coefficients on any finite window are obtained from a closed lower-triangular ODE of size 2(N+1), independent of the truncation cap N; the same closure is combined with Strang–
-
Zombies in Alternate Realities: The Afterlife of Domain Names in DNS Integrations
Zombie domain linkages persist after ownership changes in DNS integrations at rates of 3% in Web PKI, 24% in ENS, and 15% in Maven Central, with validate-once designs accumulating long-term risks while per-use validation prevents them.
-
Profile Likelihood Inference for Anisotropic Hyperbolic Wrapped Normal Models on Hyperbolic Space
The profile maximum likelihood estimator for the location in anisotropic hyperbolic wrapped normal models is strongly consistent, asymptotically normal, and attains the Hájek-Le Cam minimax lower bound under squared geodesic loss.
-
A Large-Scale Empirical Study of AI-Generated Code in Real-World Repositories
A large-scale study of real-world repositories finds that AI-generated code differs from human-written code in complexity, structural traits, defect indicators, and commit-level activity patterns.
-
The Dynamical Origin of Millimetre-Sized Sporadic Meteoroids
Dynamical simulations show mm-sized meteoroids impacting Earth below 17 km/s are mostly asteroidal if released in the last 150-200 kyr, with cometary fraction rising above that speed and dominating above 27 km/s.
-
The Chandra-Gaia Catalog of Counterparts: Resolving ambiguous Gaia matches to X-ray sources in the Chandra Source Catalog using Machine Learning
A LightGBM classifier trained on NWAY Bayesian matches identifies true Chandra-Gaia counterparts for 113k X-ray sources, flags 7k ambiguous cases, and attributes half of 20k separation-only matches to chance coincidences, validated at 95% on COUP without positional features.
-
The Dynamics of Human and AI-Generated Language: How Semantics Fluctuates across Different Timescales
Develops semantic timescale features based on autocorrelation windows (ACW). These show that longer windows are associated with generic vocabulary and shorter ones with specific words, in both human and LLM speech. Randomizing word order and timing abolishes the pattern.
-
The Nonparametric Kiefer-Weiss Problem
The nonparametric Kiefer-Weiss problem is solved by deriving an optimal stopping policy based on a two-dimensional statistic (likelihood ratio plus expected remaining sample size) whose randomization rule maps the likelihood ratio to an integer sample size.
-
Comparing Two Categorical Gini Correlations with Applications to Classification Problems
Proposes an inferential framework to test differences in categorical Gini correlations for predictor importance in classification, establishing asymptotic normality and consistency while accommodating unequal dimensions and dependence.
-
Semantic Feature Segmentation for Interpretable Predictive Maintenance in Complex Systems
Semantic segmentation decomposes monitoring features into canonical and residual components that concentrate fault-predictive information while preserving operational meaning in predictive maintenance.
-
Pattern-based tests for two-dimensional copulas
A functional central limit theorem for pattern frequencies in 2D samples enables nonparametric goodness-of-fit, two-sample, and symmetry tests for copulas, with bootstrap critical values and parametric examples.
-
What Software Engineering Looks Like to AI Agents? -- An Empirical Study of AI-Only Technical Discourse on MoltBook
Empirical analysis of 4707 MoltBook posts shows AI-only technical discourse focuses on security, trust, and abstract topics while lacking concrete runtime and project details found in human GitHub discussions.
-
Scale selection for geometric medians on product manifolds
Joint location-scale minimization for geometric medians on product manifolds degenerates to marginal medians, and three new scale-selection methods restore identifiability with asymptotic guarantees.
-
Sketching the Readout of Large Language Models for Scalable Data Attribution and Valuation
RISE applies CountSketch to dual lexical and semantic channels derived from output-layer gradient outer products, cutting data attribution storage by up to 112x and enabling retrospective and prospective influence analysis on LLMs up to 32B parameters.
-
A test for normality based on self-similarity
The SSTN detects non-normality by tracking how the standardized empirical characteristic function changes under repeated self-similarity transformations, with the null distribution calibrated by Monte Carlo simulation.
-
Do Good, Stay Longer? Temporal Patterns and Predictors of Newcomer-to-Core Transitions in Conventional OSS and OSS4SG
OSS4SG projects retain contributors at 2.2X higher rates with 19.6% higher core status probability than conventional OSS, and a late-spike temporal pattern enables faster core achievement (21 weeks) than early intensive contributions.
-
Automation and Reuse Practices in GitHub Actions Workflows: A Practitioner's Perspective
A survey of 419 practitioners shows strong reliance on reusable GitHub Actions for core CI/CD tasks but limited adoption of reusable workflows, with copy-pasting remaining common due to versioning and trust issues.
-
Sensitivity Analysis on the Sphere and a Spherical ANOVA Decomposition
A parity-augmented ANOVA decomposition is established for functions on the sphere using orthogonal bases to capture geometry-induced variable dependencies.
-
Constrained Co-evolutionary Metamorphic Differential Testing for Autonomous Systems with an Interpretability Approach
CoCoMagic applies constrained cooperative co-evolution to metamorphic and differential testing to find up to 287% more distinct behavioral divergences in an end-to-end ADS than baseline search methods.
-
Understanding the Challenges and Opportunities of Generative AI Apps: An Empirical Study
Large-scale review mining of 1M+ comments from 171 Gen-AI apps using an LLM framework reveals top topics plus three opportunities and three challenges for developers.
-
Line Drawings using LightBenders: Authoring and Illuminating
A hardware-software architecture for drone swarms that illuminate line drawings in mid-air, including a Blender add-on, SVG import, and a user study validating misalignment tolerance.
-
Exploring Statistical Change Point Detection Techniques for Performance Anomaly Detection at Mozilla
Ensemble voting strategies for change point detection improve F1-score by 11% over Mozilla's T-test method on a new ground-truth dataset of 174 performance time series annotated by practitioners.
-
Deep Slice Interpolation for Reducing Through-Plane Anisotropy and Noise in Head CT
A deep learning system synthesizes intermediate head CT slices to halve through-plane anisotropy while providing implicit denoising, outperforming baselines on structural metrics.
-
DXA-Derived Skeletal Phenotypes and Hip Fracture Risk: A Backdoor-Adjusted Causal Analysis
Backdoor-adjusted ATEs on 21,098 UK Biobank participants show that total femur BMC and BMD have the largest hip fracture risk reductions (-0.0047 per SD). Adding the top 11 phenotypes to clinical variables raised AUC to 0.842, versus 0.709 for FRAX.
-
Social Policy of Large Language Models: How GPT, Claude, DeepSeek and Grok Allocate Social Budgets in Spain and Germany
Four LLMs exhibit a shared implicit social policy. Compared with OECD budgets, it under-allocates pensions by a factor of three and over-allocates housing by a factor of four. Only Claude shows a meaningful response to national context.
-
Reduced-Precision Stochastic Simulation for Mathematical Biology
Mixed-precision SSA with stochastic rounding preserves ensemble statistics across five biological models while cutting memory use by 2-4x and delivering up to 1.5x CPU speedup.
-
Buoyancy-dependent induced flow by vertically migrating swimmers
Induced flow velocity from vertically migrating Artemia salina swarms scales with the product of swimmer number and buoyancy-driven density difference.
-
Exploring the Grassroots Understanding and Practices of Collective Memory Co-Contribution in a University Community
University community members are split between reflecting on past events and recording today's experiences as future history when contributing to collective memory, yielding design considerations for community platforms.
-
Optimal control of the future via prospective learning with control
Prospective Learning with Control proves ERM asymptotically achieves the Bayes optimal policy in non-stationary reset-free settings and outperforms time-aware RL on a 1D foraging benchmark.
-
Network Inequality through Preferential Attachment, Triadic Closure, and Homophily
PATCH model simulations show preferential attachment and homophily increase segregation and degree inequality while triadic closure reduces segregation but amplifies overall inequality, and the model accounts for observed gender disparities in 50 years of physics and CS collaboration networks.
-
Cumulative Advantage of Brokerage in Academia
Early brokerage in academic networks produces cumulative advantage in later participation and career impact for physicists, equally for men and women.
-
sumoITScontrol: Traffic Controller Collection for SUMO Traffic Simulations
sumoITScontrol provides a collection of traffic controllers for SUMO simulations and stresses the importance of variance-aware evaluation methods for reproducible research.
-
Time-dependent structural equation modeling of fans' football fever using activity tracking data during the 2025 DFB Cup final
Football fever in spectators follows a V-shaped time course captured as a latent process from heart rate and stress data via time-dependent structural equation modeling.
-
Empirical Comparison of Agent Communication Protocols for Task Orchestration
This work provides an empirical comparison of tool integration, multi-agent delegation, and hybrid architectures for LLM task orchestration, measuring response time, context consumption, cost, error recovery, and implementation complexity.
-
Community-Based Early-Stage Chronic Kidney Disease Screening using Explainable Machine Learning for Low-Resource Settings
Machine learning models trained on Bangladeshi community data achieve 89-90% balanced accuracy for early CKD detection using few accessible features, outperforming traditional screening tools and generalizing across external datasets from India, UAE, and Bangladesh.
-
Bayesian inference for compact binary coalescences with BILBY: Validation and application to the first LIGO--Virgo gravitational-wave transient catalogue
BILBY is validated on simulated compact binary signals and reproduces the eleven GWTC-1 results with configuration and output files provided for reproduction.
-
Multi-Task Optimization over Networks of Tasks