Tool reference

Estimating the Dimension of a Model

Gideon Schwarz · 1978 · arXiv aos/1176344

Tool reference. 83% of classified Pith citations (5 of 6) use this work as a method, library, or software dependency rather than as a substantive claim.

33 Pith papers citing it

citation-role summary

method: 5 · background: 1

citation-polarity summary

use method: 5 · background: 1

representative citing papers

Evaluating Deep Research Agents on Expert Consulting Work: A Benchmark with Verifiers, Rubrics, and Cognitive Traps

cs.AI · 2026-05-17 · unverdicted · novelty 7.0

New benchmark evaluates three frontier deep research agents on 42 SME prompts with verifiers and rubrics, reporting low acceptance rates of 9.5-21.4% and agent-specific failure modes.

ProactBench: Beyond What The User Asked For

cs.LG · 2026-05-09 · unverdicted · novelty 7.0

ProactBench measures LLM conversational proactivity in three phases using 198 multi-agent dialogues and finds recovery behavior hard to predict from existing benchmarks.

The finite-shot help-harm boundary of zero-noise extrapolation

quant-ph · 2026-05-07 · unverdicted · novelty 7.0

Zero-noise extrapolation has a finite-shot help-harm boundary below which it increases local mean-squared error due to variance penalties outweighing bias reduction.

JudgeSense: A Benchmark for Prompt Sensitivity in LLM-as-a-Judge Systems

cs.CL · 2026-04-26 · unverdicted · novelty 7.0

JudgeSense benchmark shows LLM judge consistency does not reliably improve with model scale, with coherence most sensitive to prompt changes and factuality more stable.

How to quantify direct correlations between variables

stat.ME · 2026-04-20 · unverdicted · novelty 7.0

Jensen-Shannon regularized analogues of KL-based direct-correlation measures are introduced, taking values in [0,1] and accompanied by alphabet-size-dependent upper bounds under the observed marginal p(x,z).

Causal Process Models: Reframing Dynamic Causal Graph Discovery as a Reinforcement Learning Problem

cs.LG · 2025-07-18 · unverdicted · novelty 7.0

Causal Process Models reframe dynamic causal graph discovery as multi-agent reinforcement learning to build sparse time-varying graphs only at active interactions, outperforming dense baselines on physical prediction.

Variational Sequential Optimal Experimental Design using Reinforcement Learning

stat.ML · 2023-06-17 · unverdicted · novelty 7.0

vsOED uses a variational one-point reward and RL policy optimization to provide a lower bound on expected information gain for sequential experimental design, supporting nuisance parameters, implicit likelihoods, and multiple design goals.

AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration

cs.AI · 2026-05-19 · unverdicted · novelty 6.0

AutoResearchClaw presents a multi-agent autonomous research pipeline with debate, self-healing execution, verifiable reporting, human-in-the-loop modes, and cross-run evolution that outperforms AI Scientist v2 by 54.7% on the ARC-Bench benchmark.

Reliable model selection in the presence of parameter non-identifiability

stat.ME · 2026-05-19 · unverdicted · novelty 6.0

Proposes adaptive multiple importance sampling for robust Bayesian model evidence estimation under parameter non-identifiability, shown to outperform deterministic methods on ecological case studies while being cheaper than MCMC.

Bayesian Modeling and Prediction of Generalized Contact Matrices

stat.ME · 2026-05-07 · unverdicted · novelty 6.0

A Bayesian model for multi-feature contact matrices that uses tensor structures and contingency table theory to satisfy structural constraints and impute missing contact features, validated on simulations and US/German survey data.

Sequential Bayesian Monitoring for Recoverable and Drifting Processes

stat.CO · 2026-05-05 · unverdicted · novelty 6.0

Bayesian procedures are derived to compute the posterior probability that a recoverable process is currently in control or that a drifting latent parameter lies in an acceptable region.

A Semi-Supervised Kernel Two-Sample Test

stat.ML · 2026-05-03 · unverdicted · novelty 6.0

A semi-supervised kernel two-sample test integrates unlabeled covariate data to achieve asymptotic normality under the null, higher power than standard kernel tests, and consistency against fixed and local alternatives.

The Geometric Canary: Predicting Steerability and Detecting Drift via Representational Stability

cs.LG · 2026-04-20 · unverdicted · novelty 6.0

Task-aligned supervised geometric stability predicts linear steerability with high accuracy while unsupervised stability detects representational drift earlier and with lower false alarms than CKA or Procrustes.

The Autocorrelation Blind Spot: Why 42% of Turn-Level Findings in LLM Conversation Analysis May Be Spurious

cs.CL · 2026-04-15 · accept · novelty 6.0

42% of significant turn-level associations in LLM conversation analysis are spurious due to unaccounted autocorrelation, with a validated two-stage correction framework improving replication.

Cell-induced densification and tether formation in fibrous extracellular matrices with biomimetic physics-informed neural networks

cs.LG · 2026-03-31 · unverdicted · novelty 6.0

Bio-PINNs with a near-to-far curriculum and deformation-uncertainty proxy recover cell-induced densified phases and tether morphologies more reliably than standard adaptive PINN baselines in single-cell and multicellular settings.

A decay-adjusted spatio-temporal model to account for the impact of mass drug administration on neglected tropical disease prevalence

stat.AP · 2025-12-03 · unverdicted · novelty 6.0

A new decay-adjusted spatio-temporal model improves estimation of neglected tropical disease prevalence by explicitly accounting for the waning impact of mass drug administration in sparse survey data.

Nested Sampling for ARIMA Model Selection in Astronomical Time-Series Analysis

astro-ph.IM · 2025-12-01 · unverdicted · novelty 6.0

Nested sampling applied to ARIMA models enables Bayesian order selection and parameter inference that recovers ground truth in simulations and fits stochastic variability in sunspot, Kepler, and TESS light curves.

INLA-RF: A Hybrid Modeling Strategy for Spatio-Temporal Environmental Data

stat.ME · 2025-07-24 · unverdicted · novelty 6.0

A hybrid INLA-RF framework integrates Bayesian spatio-temporal modeling with random forests through two iterative algorithms to improve predictions and uncertainty quantification for environmental data.

How Many Human Survey Respondents is a Large Language Model Worth? An Uncertainty Quantification Perspective

stat.ME · 2025-02-25 · unverdicted · novelty 6.0

A data-driven method adaptively selects the number of LLM-simulated responses to form confidence sets with nominal coverage for human survey parameters and equates that number to the LLM's effective human-equivalent sample size.

RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval

cs.CL · 2024-01-31 · unverdicted · novelty 6.0

RAPTOR introduces a tree-organized retrieval method using recursive abstractive summaries, achieving a 20% absolute accuracy improvement on the QuALITY benchmark when paired with GPT-4.

Stability selection enables robust learning of partial differential equations from limited noisy data

math.NA · 2019-07-17 · unverdicted · novelty 6.0

PDE-STRIDE applies stability-based model selection to sparse regression for robust, parameter-free recovery of PDEs from noisy data.

dynesty: A Dynamic Nested Sampling Package for Estimating Bayesian Posteriors and Evidences

astro-ph.IM · 2019-04-03 · accept · novelty 6.0

dynesty is an open-source Python package for dynamic nested sampling that improves efficiency in Bayesian posterior and evidence estimation compared to MCMC on certain problems.

A tool to determine the degrees of freedom in tree-structured varying coefficient models

stat.ME · 2026-05-18 · unverdicted · novelty 5.0

A formula approximating degrees of freedom for tree-structured varying coefficient models is proposed to improve BIC model selection over naive parameter counting.

Unsupervised Domain Shift Detection with Interpretable Subspace Attribution

stat.ML · 2026-05-15 · unverdicted · novelty 5.0

An unsupervised method detects domain shifts via localized density anomaly search in feature space, attributes the shift to a minimal subspace, and extracts balanced subsets from two unlabeled datasets.

citing papers explorer

Showing 33 of 33 citing papers.

Evaluating Deep Research Agents on Expert Consulting Work: A Benchmark with Verifiers, Rubrics, and Cognitive Traps · cs.AI · 2026-05-17 · unverdicted · none · ref 6
New benchmark evaluates three frontier deep research agents on 42 SME prompts with verifiers and rubrics, reporting low acceptance rates of 9.5-21.4% and agent-specific failure modes.
ProactBench: Beyond What The User Asked For · cs.LG · 2026-05-09 · unverdicted · none · ref 108
ProactBench measures LLM conversational proactivity in three phases using 198 multi-agent dialogues and finds recovery behavior hard to predict from existing benchmarks.
The finite-shot help-harm boundary of zero-noise extrapolation · quant-ph · 2026-05-07 · unverdicted · none · ref 38
Zero-noise extrapolation has a finite-shot help-harm boundary below which it increases local mean-squared error due to variance penalties outweighing bias reduction.
JudgeSense: A Benchmark for Prompt Sensitivity in LLM-as-a-Judge Systems · cs.CL · 2026-04-26 · unverdicted · none · ref 1
JudgeSense benchmark shows LLM judge consistency does not reliably improve with model scale, with coherence most sensitive to prompt changes and factuality more stable.
How to quantify direct correlations between variables · stat.ME · 2026-04-20 · unverdicted · none · ref 42
Jensen-Shannon regularized analogues of KL-based direct-correlation measures are introduced, taking values in [0,1] and accompanied by alphabet-size-dependent upper bounds under the observed marginal p(x,z).
Causal Process Models: Reframing Dynamic Causal Graph Discovery as a Reinforcement Learning Problem · cs.LG · 2025-07-18 · unverdicted · none · ref 17
Causal Process Models reframe dynamic causal graph discovery as multi-agent reinforcement learning to build sparse time-varying graphs only at active interactions, outperforming dense baselines on physical prediction.
Variational Sequential Optimal Experimental Design using Reinforcement Learning · stat.ML · 2023-06-17 · unverdicted · none · ref 49
vsOED uses a variational one-point reward and RL policy optimization to provide a lower bound on expected information gain for sequential experimental design, supporting nuisance parameters, implicit likelihoods, and multiple design goals.
AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration · cs.AI · 2026-05-19 · unverdicted · none · ref 6
AutoResearchClaw presents a multi-agent autonomous research pipeline with debate, self-healing execution, verifiable reporting, human-in-the-loop modes, and cross-run evolution that outperforms AI Scientist v2 by 54.7% on the ARC-Bench benchmark.
Reliable model selection in the presence of parameter non-identifiability · stat.ME · 2026-05-19 · unverdicted · none · ref 145
Proposes adaptive multiple importance sampling for robust Bayesian model evidence estimation under parameter non-identifiability, shown to outperform deterministic methods on ecological case studies while being cheaper than MCMC.
Bayesian Modeling and Prediction of Generalized Contact Matrices · stat.ME · 2026-05-07 · unverdicted · none · ref 83
A Bayesian model for multi-feature contact matrices that uses tensor structures and contingency table theory to satisfy structural constraints and impute missing contact features, validated on simulations and US/German survey data.
Sequential Bayesian Monitoring for Recoverable and Drifting Processes · stat.CO · 2026-05-05 · unverdicted · none · ref 45
Bayesian procedures are derived to compute the posterior probability that a recoverable process is currently in control or that a drifting latent parameter lies in an acceptable region.
A Semi-Supervised Kernel Two-Sample Test · stat.ML · 2026-05-03 · unverdicted · none · ref 63
A semi-supervised kernel two-sample test integrates unlabeled covariate data to achieve asymptotic normality under the null, higher power than standard kernel tests, and consistency against fixed and local alternatives.
The Geometric Canary: Predicting Steerability and Detecting Drift via Representational Stability · cs.LG · 2026-04-20 · unverdicted · none · ref 116
Task-aligned supervised geometric stability predicts linear steerability with high accuracy while unsupervised stability detects representational drift earlier and with lower false alarms than CKA or Procrustes.
The Autocorrelation Blind Spot: Why 42% of Turn-Level Findings in LLM Conversation Analysis May Be Spurious · cs.CL · 2026-04-15 · accept · none · ref 10
42% of significant turn-level associations in LLM conversation analysis are spurious due to unaccounted autocorrelation, with a validated two-stage correction framework improving replication.
Cell-induced densification and tether formation in fibrous extracellular matrices with biomimetic physics-informed neural networks · cs.LG · 2026-03-31 · unverdicted · none · ref 37
Bio-PINNs with a near-to-far curriculum and deformation-uncertainty proxy recover cell-induced densified phases and tether morphologies more reliably than standard adaptive PINN baselines in single-cell and multicellular settings.
A decay-adjusted spatio-temporal model to account for the impact of mass drug administration on neglected tropical disease prevalence · stat.AP · 2025-12-03 · unverdicted · none · ref 5
A new decay-adjusted spatio-temporal model improves estimation of neglected tropical disease prevalence by explicitly accounting for the waning impact of mass drug administration in sparse survey data.
Nested Sampling for ARIMA Model Selection in Astronomical Time-Series Analysis · astro-ph.IM · 2025-12-01 · unverdicted · none · ref 44
Nested sampling applied to ARIMA models enables Bayesian order selection and parameter inference that recovers ground truth in simulations and fits stochastic variability in sunspot, Kepler, and TESS light curves.
INLA-RF: A Hybrid Modeling Strategy for Spatio-Temporal Environmental Data · stat.ME · 2025-07-24 · unverdicted · none · ref 28
A hybrid INLA-RF framework integrates Bayesian spatio-temporal modeling with random forests through two iterative algorithms to improve predictions and uncertainty quantification for environmental data.
How Many Human Survey Respondents is a Large Language Model Worth? An Uncertainty Quantification Perspective · stat.ME · 2025-02-25 · unverdicted · none · ref 22
A data-driven method adaptively selects the number of LLM-simulated responses to form confidence sets with nominal coverage for human survey parameters and equates that number to the LLM's effective human-equivalent sample size.
RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval · cs.CL · 2024-01-31 · unverdicted · none · ref 53
RAPTOR introduces a tree-organized retrieval method using recursive abstractive summaries, achieving a 20% absolute accuracy improvement on the QuALITY benchmark when paired with GPT-4.
Stability selection enables robust learning of partial differential equations from limited noisy data · math.NA · 2019-07-17 · unverdicted · none · ref 57
PDE-STRIDE applies stability-based model selection to sparse regression for robust, parameter-free recovery of PDEs from noisy data.
dynesty: A Dynamic Nested Sampling Package for Estimating Bayesian Posteriors and Evidences · astro-ph.IM · 2019-04-03 · accept · none · ref 19
dynesty is an open-source Python package for dynamic nested sampling that improves efficiency in Bayesian posterior and evidence estimation compared to MCMC on certain problems.
A tool to determine the degrees of freedom in tree-structured varying coefficient models · stat.ME · 2026-05-18 · unverdicted · none · ref 95
A formula approximating degrees of freedom for tree-structured varying coefficient models is proposed to improve BIC model selection over naive parameter counting.
Unsupervised Domain Shift Detection with Interpretable Subspace Attribution · stat.ML · 2026-05-15 · unverdicted · none · ref 10
An unsupervised method detects domain shifts via localized density anomaly search in feature space, attributes the shift to a minimal subspace, and extracts balanced subsets from two unlabeled datasets.
Fast and principled equation discovery from chaos to climate · cs.LG · 2026-04-13 · unverdicted · none · ref 27
Bayesian-ARGOS is a hybrid frequentist-Bayesian method that discovers equations from limited noisy observations more efficiently than SINDy or bootstrap-ARGOS while adding uncertainty quantification.
Multi-fidelity Gaussian process regression for noisy outputs and non-nested experimental designs: a comparison between the recursive and non-recursive formulations · stat.AP · 2025-11-25 · unverdicted · none · ref 9
Recursive multi-fidelity GP regression with EM optimization trains faster than the coupled non-recursive Kennedy-O'Hagan approach on noisy non-nested data while delivering comparable predictions and uncertainty estimates.
QWHA: Quantization-Aware Walsh-Hadamard Adaptation for Parameter-Efficient Fine-Tuning on Large Language Models · cs.CL · 2025-09-22 · unverdicted · none · ref 22
QWHA proposes Walsh-Hadamard Transform adapters with adaptive initialization for quantization-aware PEFT, claiming better low-bit accuracy and faster training than low-rank or other FT-based baselines.
Understanding Task Representations in Neural Networks via Bayesian Ablation · cs.LG · 2025-05-19 · unverdicted · none · ref 4
A Bayesian ablation framework combined with information-theoretic metrics is introduced to analyze causal roles, distributedness, manifold complexity, and polysemanticity of task representations in neural networks.
Power Studies For Two-Sample and Goodness-of-Fit Methods For Multivariate Data · stat.ME · 2026-05-12 · unverdicted · none · ref 30
No single goodness-of-fit or two-sample test reliably detects deviations across all multivariate scenarios, so the authors recommend a small combination of methods that together cover the simulated cases.
Spatially Resolved Kinematics of SLACS Lens Galaxies. II: Breaking Degeneracies with Lensing and Dynamical Models · astro-ph.GA · 2026-04-14 · unverdicted · none · ref 57
Spatially resolved kinematics show SLACS lens galaxies have nearly isothermal total mass profiles (mean γ=2.04) with average mass-sheet parameter λ_int=1.01, consistent with no measurable bias from power-law assumptions in cosmography.
Latent Profiles of AI Risk Perception and Their Differential Association with Community Driving Safety Concerns: A Person-Centered Analysis · cs.CY · 2026-04-06 · unverdicted · none · ref 44
Four latent profiles of AI risk perception were identified in U.S. adults, with higher AI concern generally linked to greater perceived driving-hazard severity except for AI-versus-human driving comparisons.
Environment-Aware Indoor LoRaWAN Path Loss: Parametric Regression Comparisons, Shadow Fading, and Calibrated Fade Margins · cs.NI · 2025-10-05 · conditional · none · ref 48 · 2 links
Environment-conditioned parametric regression on 12-month indoor LoRaWAN data reduces cross-validated RMSE from 8.23 dB to 7.38 dB and lowers the fade margin needed for 99% reliability from ~28 dB to 25.73 dB.
Constraining Dark Energy Dynamics in Curved Spacetime with Current Observations · physics.gen-ph · 2026-05-08 · unverdicted · none · ref 18
Observational constraints on a dark energy EoS parametrization in curved spacetime yield α ≈ 0.35 (0.56) and Ω_k0 that changes sign with ANN data reconstruction.
Observational constraints on a dark energy EoS parametrization in curved spacetime yield α ≈ 0.35 (0.56) and Ω_k0 that changes sign with ANN data reconstruction.
