Chapman and Hall/CRC, New York (1994)

Bradley Efron, R.J. Tibshirani · 1994 · DOI 10.1201/9780429246593

10 Pith papers cite this work, alongside 5,583 external citations. Polarity classification is still being indexed.

5,583 external citations · Crossref

citation-role summary

background: 2 · method: 2

citation-polarity summary

background: 2 · use method: 2

representative citing papers

EnergyAgentBench: Benchmarking LLM Agents on Live Energy Infrastructure Data

econ.EM · 2026-05-13 · accept · novelty 8.0

EnergyAgentBench is a new benchmark with 70 task variants that evaluates LLM agents on live energy data for datacenter siting, long-horizon optimization, and causal grid diagnosis.

Benchmarking Sensor-Fault Robustness in Forecasting

cs.LG · 2026-05-11 · conditional · novelty 7.0

SensorFault-Bench is a new CPS-grounded benchmark showing that clean-MSE rankings of forecasting models often disagree with their robustness under standardized sensor-fault scenarios across four real datasets.

Beyond Prediction Accuracy: Target-Space Recovery Profiles for Evaluating Model-Brain Alignment

q-bio.NC · 2026-05-19 · unverdicted · novelty 6.0

The authors propose target-space recovery profiles to diagnose which reproducible dimensions of fMRI brain responses are captured by model predictions, showing that accuracy alone can mask alignment mismatches in visual cortex.

Evaluating Multi-Hop Reasoning in RAG Systems: A Comparison of LLM-Based Retriever Evaluation Strategies

cs.IR · 2026-04-20 · unverdicted · novelty 6.0

CARE, a context-aware LLM judge, outperforms standard methods when evaluating multi-hop retrieval quality in RAG systems.

The Geometric Canary: Predicting Steerability and Detecting Drift via Representational Stability

cs.LG · 2026-04-20 · unverdicted · novelty 6.0

Task-aligned supervised geometric stability predicts linear steerability with high accuracy while unsupervised stability detects representational drift earlier and with lower false alarms than CKA or Procrustes.

OSS: Open Suturing Skills Vision-Based Assessment Challenge 2024-2025

cs.CV · 2026-05-21 · accept · novelty 5.0

The OSS Challenge provides benchmarks showing spatiotemporal video models excel at open suturing skill classification and OSATS scoring but struggle with keypoint tracking under occlusion.

Beyond Seeing Is Believing: On Crowdsourced Detection of Audiovisual Deepfakes

cs.IR · 2026-05-06 · unverdicted · novelty 5.0

Crowdsourced judgments reliably flag authentic videos but frequently miss manipulations and struggle to identify whether changes are audio-only, video-only, or both.

Audio Video Verbal Analysis (AVVA) for Capturing Classroom Dialogues

physics.soc-ph · 2026-04-23 · unverdicted · novelty 5.0

AVVA is a new framework adapting verbal analysis for classroom discourse with triangulation across ten steps and a four-criterion validation scheme for temporal stability, applied to 23 hours of recordings.

Exploring climate change effects on concurrent floods and concurrent droughts via statistical deep learning

stat.AP · 2026-04-23 · unverdicted · novelty 5.0

The deep SPAR model shows concurrent floods and droughts becoming more likely in the Upper Danube by 2100 under high emissions, with changes in the dependence between catchments contributing substantially to the increase.

Fast and principled equation discovery from chaos to climate

cs.LG · 2026-04-13 · unverdicted · novelty 5.0

Bayesian-ARGOS is a hybrid frequentist-Bayesian method that discovers equations from limited noisy observations more efficiently than SINDy or bootstrap-ARGOS while adding uncertainty quantification.

citing papers explorer

Showing 10 of 10 citing papers.

EnergyAgentBench: Benchmarking LLM Agents on Live Energy Infrastructure Data
econ.EM · 2026-05-13 · accept · none · ref 37

Benchmarking Sensor-Fault Robustness in Forecasting
cs.LG · 2026-05-11 · conditional · none · ref 28

Beyond Prediction Accuracy: Target-Space Recovery Profiles for Evaluating Model-Brain Alignment
q-bio.NC · 2026-05-19 · unverdicted · none · ref 8

Evaluating Multi-Hop Reasoning in RAG Systems: A Comparison of LLM-Based Retriever Evaluation Strategies
cs.IR · 2026-04-20 · unverdicted · none · ref 10

The Geometric Canary: Predicting Steerability and Detecting Drift via Representational Stability
cs.LG · 2026-04-20 · unverdicted · none · ref 36

OSS: Open Suturing Skills Vision-Based Assessment Challenge 2024-2025
cs.CV · 2026-05-21 · accept · none · ref 18

Beyond Seeing Is Believing: On Crowdsourced Detection of Audiovisual Deepfakes
cs.IR · 2026-05-06 · unverdicted · none · ref 77

Audio Video Verbal Analysis (AVVA) for Capturing Classroom Dialogues
physics.soc-ph · 2026-04-23 · unverdicted · none · ref 53

Exploring climate change effects on concurrent floods and concurrent droughts via statistical deep learning
stat.AP · 2026-04-23 · unverdicted · none · ref 19

Fast and principled equation discovery from chaos to climate
cs.LG · 2026-04-13 · unverdicted · none · ref 65