When Individually Calibrated Models Become Collectively Miscalibrated

Zhaohui Wang

arxiv: 2605.18858 · v1 · pith:MV2LDSYZnew · submitted 2026-05-14 · 💻 cs.LG · cs.AI · cs.GT · stat.ML

Pith reviewed 2026-05-20 20:17 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.GT · stat.ML

keywords calibration · Brier score · Price of Anarchy · multi-agent · aggregation · probabilistic forecasting · machine learning

The pith

Individually calibrated models become collectively miscalibrated under Brier-score aggregation with correlated beliefs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that when multiple agents report probability estimates to minimize their individual Brier scores and their beliefs are positively correlated due to shared data, the reports systematically underestimate the positive-class probability. This causes the aggregated prediction to be miscalibrated even if each individual report is calibrated. A reader would care because this challenges the common practice of assuming individual calibration ensures good aggregate performance in systems that combine multiple models. The analysis includes a bound on the resulting Price of Anarchy and demonstrates that VCG aggregation avoids the problem by aligning incentives.

Core claim

Under Brier-score-based aggregation with positively correlated beliefs, each agent's individually optimal report systematically underestimates the positive-class probability, yielding a Price of Anarchy (PoA) greater than one whenever Cov(b_i, b_j) > 0. In the canonical setting (n = 5, pairwise correlation = 0.5, base rate = 0.3), the empirically measured PoA in false-negative rate reaches 7.25×. VCG-based aggregation aligns incentives and achieves dominant-strategy incentive compatibility.
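A PoA "in false-negative rate" can be read as the ratio of the false-negative rate under strategic reports to the rate under truthful reports. The sketch below assumes that reading, plus a decision threshold of 0.5; neither is confirmed by this page. The function names and the toy arrays are invented purely for illustration.

```python
import numpy as np

def false_negative_rate(probs, labels, threshold=0.5):
    """Fraction of true positives whose aggregated probability is below the threshold."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    positives = labels == 1
    if positives.sum() == 0:
        raise ValueError("no positive labels to evaluate")
    return float(np.mean(probs[positives] < threshold))

def poa_false_negative(strategic_probs, truthful_probs, labels, threshold=0.5):
    """Assumed definition: FN rate under strategic reports / FN rate under truthful reports."""
    fn_strategic = false_negative_rate(strategic_probs, labels, threshold)
    fn_truthful = false_negative_rate(truthful_probs, labels, threshold)
    if fn_truthful == 0.0:
        return float("inf") if fn_strategic > 0.0 else 1.0
    return fn_strategic / fn_truthful

# Toy aggregated probabilities (illustrative values, not data from the paper).
labels = np.array([1, 1, 1, 1, 0, 0, 0, 0])
truthful = np.array([0.9, 0.8, 0.6, 0.4, 0.3, 0.2, 0.1, 0.1])
strategic = np.array([0.7, 0.6, 0.45, 0.3, 0.2, 0.1, 0.05, 0.05])

poa = poa_false_negative(strategic, truthful, labels)
print(poa)  # 0.5 / 0.25 = 2.0 on this toy input
```

On the toy input, the downward-shifted reports miss two of four positives instead of one, so the ratio is 2.0.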

What carries the argument

The game-theoretic strategic response to Brier-score aggregation, where agents optimize local scores without coordination, leading to underestimation when beliefs covary positively.

If this is right

Each agent's report underestimates the positive-class probability under positive covariance.
The aggregate shows higher false-negative rates, up to 7.25 times in the example case.
VCG aggregation provides incentive compatibility and maintains accuracy on real datasets.
Adaptive weighting improves performance under distribution shift.
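One common form of adaptive weighting is an exponential (multiplicative) weights update driven by each agent's per-round Brier loss. The sketch below is a generic version of that technique, not the paper's algorithm; the learning rate `eta` and the function name are assumptions made for illustration.

```python
import numpy as np

def multiplicative_weights(reports, outcomes, eta=0.5):
    """Exponential-weights aggregation.

    reports:  array of shape (T, n_agents), probabilities in [0, 1]
    outcomes: array of shape (T,), binary labels
    Returns the final weight vector and the per-round aggregate predictions.
    """
    reports = np.asarray(reports, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    n_rounds, n_agents = reports.shape
    weights = np.full(n_agents, 1.0 / n_agents)
    predictions = np.empty(n_rounds)
    for t in range(n_rounds):
        # Predict with the current weights before seeing the outcome.
        predictions[t] = weights @ reports[t]
        # Per-agent Brier loss on this round, then the multiplicative update.
        losses = (reports[t] - outcomes[t]) ** 2
        weights = weights * np.exp(-eta * losses)
        weights /= weights.sum()
    return weights, predictions

# Toy stream: agent 0 is always right, agent 1 is always wrong.
y = np.tile([1.0, 0.0], 25)
toy_reports = np.column_stack([y, 1.0 - y])
final_weights, preds = multiplicative_weights(toy_reports, y)
```

Weight shifts toward agents with lower cumulative loss. How quickly it tracks a distribution shift depends on `eta`, which would need tuning for any real stream.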

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar miscalibration could occur in other aggregation rules if they do not account for strategic reporting.
Monitoring correlations between model predictions could help detect potential collective miscalibration.
Extending this to non-probabilistic settings or different loss functions might reveal analogous incentive issues.
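The correlation-monitoring idea above can be sketched as the mean off-diagonal entry of the correlation matrix of model predictions. The function name and the synthetic data are assumptions for illustration; the page gives no threshold for when the value should count as a warning.

```python
import numpy as np

def mean_pairwise_correlation(preds):
    """preds: array of shape (n_models, n_samples). Returns the mean off-diagonal Pearson correlation."""
    preds = np.asarray(preds, dtype=float)
    corr = np.corrcoef(preds)  # rows are treated as variables
    n_models = corr.shape[0]
    off_diagonal = corr[~np.eye(n_models, dtype=bool)]
    return float(off_diagonal.mean())

# Synthetic scores: a shared component plus independent noise of equal size,
# which gives a population pairwise correlation of 0.5.
rng = np.random.default_rng(0)
shared = rng.normal(size=2000)
scores = np.stack([0.7 * shared + 0.7 * rng.normal(size=2000) for _ in range(5)])
rho_hat = mean_pairwise_correlation(scores)
```

A model with constant predictions has zero variance and produces NaN entries, so such rows would need to be filtered out first.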

Load-bearing premise

Agents independently optimize their local Brier score reports without coordination and treat the aggregation rule as fixed when choosing their reports.

What would settle it

Comparing the frequency of positive outcomes to the aggregated probability estimate when agents use Brier-optimal reports versus when they report truthfully, in a controlled setting with known positive correlations.
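A controlled check of this kind might look like the sketch below. It draws positively correlated beliefs, samples outcomes from their average so that truthful averaging is calibrated by construction, and compares the observed positive frequency with the mean aggregated probability. The belief model and the `shade` parameter (a uniform downward shift standing in for strategic reports) are assumptions for illustration; they are not the paper's best-response formula.

```python
import numpy as np

def calibration_gap(aggregate, outcomes):
    """Observed positive frequency minus mean aggregated probability (positive = underestimation)."""
    return float(np.mean(outcomes) - np.mean(aggregate))

def simulate(n_agents=5, rho=0.5, base_rate=0.3, n_samples=200_000,
             noise=0.1, shade=0.0, seed=0):
    rng = np.random.default_rng(seed)
    shared = rng.normal(size=(n_samples, 1))
    own = rng.normal(size=(n_samples, n_agents))
    # Beliefs with pairwise correlation close to rho (before clipping to [0, 1]).
    beliefs = base_rate + noise * (np.sqrt(rho) * shared + np.sqrt(1.0 - rho) * own)
    beliefs = np.clip(beliefs, 0.0, 1.0)
    # Outcomes are drawn from the average belief, so truthful averaging is calibrated.
    outcomes = rng.random(n_samples) < beliefs.mean(axis=1)
    # Hypothetical strategic reports: every belief shifted down by `shade`.
    reports = np.clip(beliefs - shade, 0.0, 1.0)
    return reports.mean(axis=1), outcomes

truthful_agg, y_truthful = simulate(shade=0.0)
shaded_agg, y_shaded = simulate(shade=0.05)
gap_truthful = calibration_gap(truthful_agg, y_truthful)
gap_shaded = calibration_gap(shaded_agg, y_shaded)
```

In this setup the truthful gap should sit near zero and the shaded gap near the size of the shift. Replacing the shift with the paper's actual best-response rule would turn this into the test described above.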

Figures

Figures reproduced from arXiv: 2605.18858 by Zhaohui Wang.

**Figure 1.** Overview. (a) Individually calibrated agents become collectively miscalibrated under strategic interaction (aggregate bias δ̄ = −0.375). (b) Brier scoring incurs 7.25× PoA in false-negative rate; VCG achieves the lowest PoA among mechanisms studied. (c) VCG outperforms stacking and majority vote at n ≤ 500 (9.4% fewer FNs at n = 100). (d) Pipeline: feature-partitioned agents report probabilities; VCG computes …

**Figure 2.** Mixed n = 8 ensemble (4 sklearn + 4 LLM prompts) on NSL-KDD. (a) VCG-cal aggregator …

**Figure 3.** Disagreement vs. FN rate. Higher disagreement does not consistently reduce FN.

**Figure 4.** VCG vs. Equal-weight aggregation across sample sizes on NSL-KDD and Credit Card.

**Figure 5.** k-LOO approximation tradeoff: relative FN error (%, left axis) and aggregation latency (ms, right axis) vs. number of LOO evaluations k, for n ∈ {5, 10, 20} agents. Discussion. The k-LOO approximation is highly accurate: even k = 1 yields FN error < 0.005 across all n ( …

**Figure 6.** FN rate heatmap under adversarial corruption. VCG (bottom row in each panel) maintains …

**Figure 7.** Calibration reliability diagram (NSL-KDD, n = 5 agents). VCG and Bayesian log-odds track the diagonal; simple averaging overestimates at low p̂. We bin predicted probabilities into 10 bins ([0, 0.1), …, [0.9, 1.0]) and plot the mean predicted probability against the observed positive fraction. A perfectly calibrated system lies on the diagonal y = x. Key findings: VCG and Bayesian log-odds track the di…

**Figure 8.** VCG weight adaptation under sudden distribution shift ( …

**Figure 9.** PoA vs. observability level k_seen across (n, ρ) configurations. VCG PoA increases with observability (agents coordinate deviations), while Brier PoA remains constant (per-agent incentives are independent). I.5. Generalized Scoring Rules PoA Comparison. We compare the Price of Anarchy across five scoring/aggregation rules (Brier, Log Score, Spherical, Brier+Regularization, and VCG) under best-response dynami…

**Figure 10.** Price of Anarchy across scoring rules, agent count …

**Figure 11.** Convergence of equilibrium deviation as n grows (Brier scoring). Left axis: PoA (solid) decreases to 0 by n ≥ 50. Right axis: n · δ* (dashed) grows linearly; aggregate miscalibration persists even as per-agent deviations become negligible. I.7. Online Regret Sensitivity. We evaluate the multiplicative-weight online learning algorithm (Theorem 7) across learning rates η and time horizons T, measuring cumula…

**Figure 12.** Normalized regret R_T / √T under three drift scenarios. Slower learning rates (η*/2, blue) consistently achieve the lowest regret across all scenarios and horizons.

**Figure 13.** Reliability diagrams for each of the three binary datasets. Each panel shows the empirical …
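The binning procedure described in the Figure 7 caption (10 bins, [0, 0.1) through [0.9, 1.0], mean predicted probability against observed positive fraction) can be sketched as follows. The function name and the tuple layout of the output are assumptions for illustration.

```python
import numpy as np

def reliability_table(probs, labels, n_bins=10):
    """Return (bin_index, count, mean_predicted, observed_positive_fraction) per non-empty bin."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Using only the inner edges makes the last bin closed on the right: [0.9, 1.0].
    bin_index = np.digitize(probs, edges[1:-1], right=False)
    rows = []
    for b in range(n_bins):
        mask = bin_index == b
        if mask.any():
            rows.append((b, int(mask.sum()),
                         float(probs[mask].mean()), float(labels[mask].mean())))
    return rows

# Toy input: two low-probability negatives and three high-probability positives.
table = reliability_table([0.05, 0.05, 0.95, 0.95, 1.0], [0, 0, 1, 1, 1])
```

Plotting the third column against the fourth gives the reliability curve; points on the diagonal y = x indicate calibration within that bin.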

Original abstract

Probabilistic prediction systems often aggregate probability estimates from multiple models into a single decision. A common assumption is that if each model is individually calibrated, the aggregate prediction will also be well calibrated. We show that this assumption fails in multi-agent settings: individually calibrated predictors can become collectively miscalibrated when their predictions interact strategically, in the game-theoretic sense of Brier-optimal local response, even without deliberate coordination. This phenomenon arises naturally when agents are independently trained on overlapping data. We prove that under Brier-score-based aggregation with positively correlated beliefs, each agent's individually optimal report systematically underestimates the positive-class probability, yielding a Price of Anarchy greater than one whenever Cov(b_i, b_j) > 0. In a canonical setting (n = 5 agents, pairwise correlation = 0.5, base rate = 0.3), the empirically measured PoA in false-negative rate reaches 7.25x. In contrast, VCG-based aggregation aligns incentives by rewarding marginal contribution, achieving dominant-strategy incentive compatibility and near-optimal performance. Experiments on three real-world datasets (NSL-KDD, UNSW-NB15, Credit Card Fraud) show that VCG provides strong robustness while maintaining comparable accuracy. It performs particularly well in data-sparse and adversarial settings, and adaptive weighting further improves performance under distribution shift.
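"Rewarding marginal contribution" can be illustrated with a leave-one-out comparison: how much worse the averaged forecast scores when one agent is removed. The sketch below is a generic leave-one-out computation under equal-weight averaging. The paper's actual VCG payment rule is not reproduced on this page, so this should not be read as that rule.

```python
import numpy as np

def brier(probs, outcomes):
    """Mean squared error between probabilities and binary outcomes."""
    return float(np.mean((probs - outcomes) ** 2))

def loo_marginal_contributions(reports, outcomes):
    """reports: (n_agents, n_samples). Entry i = Brier(average without i) - Brier(average of all)."""
    reports = np.asarray(reports, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    n_agents = reports.shape[0]
    full_loss = brier(reports.mean(axis=0), outcomes)
    total = reports.sum(axis=0)
    return np.array([
        brier((total - reports[i]) / (n_agents - 1), outcomes) - full_loss
        for i in range(n_agents)
    ])

# Toy case: agent 0 is perfectly informed, agents 1 and 2 always report 0.5.
y = np.array([1.0, 0.0, 1.0, 0.0])
toy_reports = np.stack([y, np.full(4, 0.5), np.full(4, 0.5)])
contributions = loo_marginal_contributions(toy_reports, y)
```

A positive entry means the aggregate gets worse without that agent; a negative entry means the agent's report was dragging the average away from the outcome.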

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper flags a plausible failure mode when Brier-optimal reports from correlated predictors get averaged, but the myopic best-response derivation leaves the claimed underestimation and PoA open to the equilibrium objection.

The main point is that individually calibrated predictors can produce a collectively miscalibrated average once agents optimize their reports for the Brier score and their private beliefs are positively correlated. The paper works out the closed-form best response under the assumption that each agent treats the others' reports as fixed, shows that this response lies below the true conditional probability when covariance is positive, and reports a Price of Anarchy of 7.25 in false-negative rate for the n=5, correlation=0.5, base-rate=0.3 case. It then contrasts this with a VCG-style aggregator that rewards marginal contribution and restores dominant-strategy incentive compatibility. Experiments on NSL-KDD, UNSW-NB15, and Credit Card Fraud data indicate that the VCG rule is more robust under sparsity and shift than plain averaging, with some further gains from adaptive weighting.

Referee Report

2 major / 2 minor

Summary. The paper claims that individually Brier-calibrated predictors become collectively miscalibrated under Brier-score aggregation when beliefs are positively correlated, because each agent’s myopic best response systematically underestimates the positive-class probability. It proves this underestimation result, shows that the resulting Price of Anarchy exceeds 1 whenever Cov(b_i, b_j) > 0, and reports an empirical PoA of 7.25× in false-negative rate for the canonical parameter set (n=5, pairwise correlation=0.5, base rate=0.3). The work contrasts this with VCG-based aggregation, which is dominant-strategy incentive compatible, and supports the claims with experiments on NSL-KDD, UNSW-NB15, and Credit Card Fraud datasets.

Significance. If the central game-theoretic result holds, the paper identifies a mechanism by which strategic local optimization can induce collective miscalibration even when every individual model is calibrated, with direct implications for ensemble methods and multi-model decision systems. The explicit PoA quantification, the VCG incentive-alignment proposal, and the three-dataset empirical evaluation are concrete strengths that would make the contribution noteworthy in the machine-learning literature.

major comments (2)

[Proof of underestimation (Section 3)] The derivation of the individually optimal report (r_i = n b_i − E[∑_{j≠i} b_j | b_i]) treats the other agents’ reports as fixed at their private beliefs b_j. In a symmetric game the reports must satisfy the fixed-point condition that the assumed r_j equal the equilibrium strategy; substituting the equilibrium strategy back into the conditional expectations changes the bias term and can eliminate or reverse the claimed systematic underestimation. This assumption is load-bearing for both the underestimation theorem and the PoA > 1 claim.
[Canonical setting and PoA measurement (Section 4)] The reported PoA of 7.25× is obtained under the myopic best-response model with a specific parameter triple (n=5, correlation=0.5, base rate=0.3). No sensitivity analysis or equilibrium-consistent re-computation is provided, so it is unclear whether the quantitative result survives the fixed-point correction required by the skeptic note.
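The closed form quoted in the first major comment can be recovered from a first-order condition. The sketch below assumes agent i is scored by the Brier loss of the averaged report, that the other agents' reports are fixed at r_j = b_j, and that agent i is individually calibrated, E[y | b_i] = b_i. These assumptions are inferred from the referee's wording, not taken from the paper's text.

```latex
\begin{align*}
L_i(r_i) &= \mathbb{E}\Big[\Big(\tfrac{1}{n}\big(r_i + \textstyle\sum_{j\neq i} b_j\big) - y\Big)^2 \,\Big|\, b_i\Big],\\
\frac{\partial L_i}{\partial r_i} &= \frac{2}{n}\,\mathbb{E}\Big[\tfrac{1}{n}\big(r_i + \textstyle\sum_{j\neq i} b_j\big) - y \,\Big|\, b_i\Big] = 0,\\
r_i &= n\,\mathbb{E}[y \mid b_i] - \mathbb{E}\Big[\textstyle\sum_{j\neq i} b_j \,\Big|\, b_i\Big]
     = n\,b_i - \mathbb{E}\Big[\textstyle\sum_{j\neq i} b_j \,\Big|\, b_i\Big].
\end{align*}
```

In an equilibrium-consistent version, each b_j inside the sum would be replaced by the other agents' strategy r_j(b_j), which is the fixed-point condition the comment refers to.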

minor comments (2)

[Experiments] The PoA figure is presented without error bars or bootstrap intervals; adding these would make the empirical claim more robust.
[Preliminaries] Notation for the aggregation rule (average of reports) and the exact Brier-score objective should be stated explicitly once at the beginning of the formal section.
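The bootstrap intervals requested in the first minor comment could be computed with a paired percentile bootstrap over samples. The sketch below assumes the PoA is a ratio of false-negative rates at a 0.5 threshold and uses synthetic data; the function names, the resample count, and the data-generating choices are all illustrative assumptions.

```python
import numpy as np

def fn_rate(probs, labels, threshold=0.5):
    """False-negative rate among positive labels."""
    positives = labels == 1
    return np.mean(probs[positives] < threshold)

def bootstrap_ratio_ci(strategic, truthful, labels, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for FN(strategic) / FN(truthful), resampling rows jointly."""
    rng = np.random.default_rng(seed)
    n = len(labels)
    ratios = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        y = labels[idx]
        if y.sum() == 0:
            continue  # no positives in this resample
        fn_truthful = fn_rate(truthful[idx], y)
        if fn_truthful == 0:
            continue  # ratio undefined for this resample
        ratios.append(fn_rate(strategic[idx], y) / fn_truthful)
    if not ratios:
        raise ValueError("no valid bootstrap resamples")
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(ratios, [tail, 1.0 - tail])
    return float(lower), float(upper)

# Synthetic data: strategic scores are truthful scores shifted down by 0.1.
rng = np.random.default_rng(1)
labels = (rng.random(5000) < 0.3).astype(int)
truthful = np.where(labels == 1, rng.uniform(0.3, 1.0, 5000), rng.uniform(0.0, 0.7, 5000))
strategic = np.clip(truthful - 0.1, 0.0, 1.0)
ci_low, ci_high = bootstrap_ratio_ci(strategic, truthful, labels)
```

Resampling rows jointly keeps the pairing between the two sets of scores, which matters because both rates are computed on the same examples.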

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and insightful review. The comments highlight important distinctions between myopic best responses and full Nash equilibrium, which we address below. We outline planned revisions to clarify assumptions and strengthen the quantitative claims.

Referee: [Proof of underestimation (Section 3)] The derivation of the individually optimal report (r_i = n b_i − E[∑_{j≠i} b_j | b_i]) treats the other agents’ reports as fixed at their private beliefs b_j. In a symmetric game the reports must satisfy the fixed-point condition that the assumed r_j equal the equilibrium strategy; substituting the equilibrium strategy back into the conditional expectations changes the bias term and can eliminate or reverse the claimed systematic underestimation. This assumption is load-bearing for both the underestimation theorem and the PoA > 1 claim.

Authors: We appreciate this observation on the modeling choice. Our analysis is explicitly framed under myopic best-response dynamics, in which each agent optimizes its report while treating others' reports as fixed at their private beliefs. This corresponds to the natural setting of independently trained models on overlapping data, where agents do not coordinate on a joint equilibrium strategy. The underestimation theorem and the resulting PoA > 1 result are derived and stated under this myopic regime, which we believe is the appropriate model for the paper's claims about collective miscalibration. We agree that a symmetric Nash equilibrium would require solving the fixed-point equations. In the revision we will add a clarifying paragraph in Section 3 that explicitly states the myopic assumption, contrasts it with full equilibrium, and notes that the directional bias from positive covariance is expected to persist (though possibly attenuated) under equilibrium play.
revision: partial
Referee: [Canonical setting and PoA measurement (Section 4)] The reported PoA of 7.25× is obtained under the myopic best-response model with a specific parameter triple (n=5, correlation=0.5, base rate=0.3). No sensitivity analysis or equilibrium-consistent re-computation is provided, so it is unclear whether the quantitative result survives the fixed-point correction required by the skeptic note.

Authors: We acknowledge that the 7.25× figure is presented for a single canonical parameter set under the myopic model. In the revised manuscript we will expand Section 4 with a sensitivity analysis over ranges of n, pairwise correlation, and base rate, still under myopic best responses. In addition, we will numerically solve the symmetric fixed-point equations for the equilibrium reports at the canonical parameters and report the resulting PoA value, thereby directly addressing whether the quantitative conclusion is robust to the equilibrium correction.
revision: yes
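To make the difference between a myopic response and a symmetric fixed point concrete, the sketch below works in a deliberately simplified setting. It assumes linear conditional means E[b_j | b_i] = mu + rho (b_i - mu), individual calibration E[y | b_i] = b_i, linear strategies r = intercept + slope * b, and reports that are not clipped to [0, 1]. None of these assumptions come from the paper, and the sketch does not reproduce its belief model or its 7.25× figure.

```python
import math

def best_response(n, rho, mu, other_intercept, other_slope):
    """Linear best response when all other agents play r_j = other_intercept + other_slope * b_j.

    From the first-order condition r_i = n * b_i - sum_{j != i} E[r_j | b_i].
    """
    slope = n - (n - 1) * rho * other_slope
    intercept = -(n - 1) * (other_intercept + other_slope * mu * (1.0 - rho))
    return intercept, slope

def myopic_linear_response(n, rho, mu):
    """Best response when the others are assumed to report their beliefs (intercept 0, slope 1)."""
    return best_response(n, rho, mu, 0.0, 1.0)

def symmetric_linear_equilibrium(n, rho, mu):
    """Closed-form solution of the fixed point best_response(a, s) == (a, s)."""
    denom = 1.0 + (n - 1) * rho
    slope = n / denom
    intercept = -(n - 1) * (1.0 - rho) * mu / denom
    return intercept, slope

# Canonical parameters quoted in the report.
n, rho, mu = 5, 0.5, 0.3
myopic = myopic_linear_response(n, rho, mu)
equilibrium = symmetric_linear_equilibrium(n, rho, mu)
# Sanity check: the equilibrium strategy should be a best response to itself.
check = best_response(n, rho, mu, *equilibrium)
```

Under these assumptions the two strategies differ in slope (3 for the myopic response against 5/3 at the fixed point), so the two computations need not give the same deviation from the private belief. The fixed point is solved in closed form because naive best-response iteration multiplies the slope error by (n - 1) * rho, which is 2 at these parameters, and therefore diverges.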

Circularity Check

0 steps flagged

No circularity: the derivation is an explicit game-theoretic model with an independent simulation

full rationale

The paper derives the underestimation and PoA > 1 directly from the closed-form solution to each agent's local Brier minimization treating other reports as fixed at b_j, then computes the resulting false-negative-rate ratio on explicitly chosen parameters (n=5, correlation=0.5, base rate=0.3). This is a forward derivation from stated assumptions rather than any reduction of the target quantity to a fitted input, self-citation chain, or definitional equivalence. The empirical PoA figure is a simulation output under those parameters, not a prediction forced by reusing the same data or equilibrium fixed-point. No load-bearing self-citations, ansatzes, or renamings appear in the derivation chain.

Axiom & Free-Parameter Ledger

3 free parameters · 2 axioms · 0 invented entities

The central claim rests on a game-theoretic model of strategic reporting and on specific illustrative parameter values. No new physical entities are postulated.

free parameters (3)

pairwise correlation = 0.5
Chosen value of 0.5 used to compute the canonical PoA example
base rate = 0.3
Chosen value of 0.3 used to compute the canonical PoA example
number of agents = 5
Chosen value of n=5 used to compute the canonical PoA example

axioms (2)

domain assumption: Each agent independently selects the report that minimizes its expected Brier score given the fixed aggregation rule
Invoked to derive the systematic underestimation of positive-class probability
domain assumption: Beliefs are positively correlated across agents
Required for the PoA to exceed one


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Under the Brier score mechanism with n ≥ 2 agents whose beliefs are correlated and outcome Pr(y = 1 | b_1, ..., b_n) = (1/n) ∑_j b_j, reporting m_i = b_i is not the Brier-optimal strategy. The Brier-optimal report for agent i is m_i^* = E[y | b_i] ≠ b_i.
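The inequality in the quoted passage follows from the tower property once the outcome model is taken as given. The worked step below uses only the quoted outcome model Pr(y = 1 | b_1, ..., b_n) = (1/n) ∑_j b_j; the final remark about when equality holds is a direct consequence, not a statement from the paper.

```latex
\begin{align*}
\mathbb{E}[y \mid b_i]
  &= \mathbb{E}\big[\,\mathbb{E}[y \mid b_1,\dots,b_n] \,\big|\, b_i\big]
   = \mathbb{E}\Big[\tfrac{1}{n}\textstyle\sum_{j} b_j \,\Big|\, b_i\Big] \\
  &= \tfrac{1}{n}\Big(b_i + \textstyle\sum_{j\neq i}\mathbb{E}[b_j \mid b_i]\Big).
\end{align*}
```

This equals b_i only when the average of E[b_j | b_i] over j ≠ i is itself b_i, so any belief model in which the other agents' conditional means differ from b_i gives m_i^* ≠ b_i.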

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

88 extracted references · 88 canonical work pages

[1]

A detailed analysis of the

Tavallaee, Mahbod and Bagheri, Ebrahim and Lu, Wei and Ghorbani, Ali A , booktitle=. A detailed analysis of the. 2009 , doi=

work page 2009
[2]

Expert Systems with Applications , volume=

Learned lessons in credit card fraud detection from a practitioner perspective , author=. Expert Systems with Applications , volume=. 2014 , doi=

work page 2014
[3]

Journal of the American Statistical Association , volume=

Strictly proper scoring rules, prediction, and estimation , author=. Journal of the American Statistical Association , volume=. 2007 , doi=

work page 2007
[4]

Advances in Neural Information Processing Systems , volume=

Truthful data acquisition via peer prediction , author=. Advances in Neural Information Processing Systems , volume=

work page
[5]

Proceedings of the 18th ACM Conference on Economics and Computation , pages=

Machine-learning aided peer prediction , author=. Proceedings of the 18th ACM Conference on Economics and Computation , pages=. 2017 , doi=

work page 2017
[6]

Management Science , volume=

Eliciting informative feedback: The peer-prediction method , author=. Management Science , volume=. 2005 , doi=

work page 2005
[7]

Proceedings of the 13th ACM Conference on Electronic Commerce , pages=

Peer prediction without a common prior , author=. Proceedings of the 13th ACM Conference on Electronic Commerce , pages=. 2012 , doi=

work page 2012
[8]

Journal of Computer and System Sciences , volume=

A decision-theoretic generalization of on-line learning and an application to boosting , author=. Journal of Computer and System Sciences , volume=. 1997 , doi=

work page 1997
[9]

Econometrica , volume=

Incentives in teams , author=. Econometrica , volume=. 1973 , doi=

work page 1973
[10]

Monthly Weather Review , volume=

Verification of forecasts expressed in terms of probability , author=. Monthly Weather Review , volume=. 1950 , doi=

work page 1950
[11]

The well-calibrated

Dawid, A Philip , journal=. The well-calibrated. 1982 , doi=

work page 1982
[12]

Journal of the Royal Statistical Society: Series D (The Statistician) , volume=

The comparison and evaluation of forecasters , author=. Journal of the Royal Statistical Society: Series D (The Statistician) , volume=. 1983 , doi=

work page 1983
[13]

Proceedings of the 34th International Conference on Machine Learning , pages=

On calibration of modern neural networks , author=. Proceedings of the 34th International Conference on Machine Learning , pages=

work page
[14]

Multiple Classifier Systems , series=

Ensemble methods in machine learning , author=. Multiple Classifier Systems , series=. 2000 , publisher=

work page 2000
[15]

2007 , publisher=

Algorithmic Game Theory , author=. 2007 , publisher=

work page 2007
[16]

npj Digital Medicine , volume=

Scalable and accurate deep learning with electronic health records , author=. npj Digital Medicine , volume=. 2018 , doi=

work page 2018
[17]

Nature , volume=

A clinically applicable approach to continuous prediction of future acute kidney injury , author=. Nature , volume=. 2019 , doi=

work page 2019
[18]

Advances in Neural Information Processing Systems , volume=

Simple and scalable predictive uncertainty estimation using deep ensembles , author=. Advances in Neural Information Processing Systems , volume=

work page
[19]

Advances in Neural Information Processing Systems , volume=

Bayesian deep learning and a probabilistic perspective of generalization , author=. Advances in Neural Information Processing Systems , volume=

work page
[20]

The Lancet Digital Health , volume=

The myth of generalisability in clinical research and machine learning in health care , author=. The Lancet Digital Health , volume=. 2020 , doi=

work page 2020
[21]

Science , volume=

Dissecting racial bias in an algorithm used to manage the health of populations , author=. Science , volume=. 2019 , doi=

work page 2019
[22]

American Journal of Cardiology , volume=

International application of a new probability algorithm for the diagnosis of coronary artery disease , author=. American Journal of Cardiology , volume=. 1989 , doi=

work page 1989
[23]

Using the

Smith, Jack W and Everhart, James E and Dickson, W C and Knowler, William C and Johannes, Robert S , booktitle=. Using the

work page
[24]

Advances in Neural Information Processing Systems , volume=

Deep sets , author=. Advances in Neural Information Processing Systems , volume=

work page
[25]

Proceedings of the 4th International Conference on Information Systems Security and Privacy , pages=

Toward generating a new intrusion detection dataset and intrusion traffic characterization , author=. Proceedings of the 4th International Conference on Information Systems Security and Privacy , pages=. 2018 , doi=

work page 2018
[26]

2015 , doi=

Moustafa, Nour and Slay, Jill , booktitle=. 2015 , doi=

work page 2015
[27]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Arik, Sercan. Proceedings of the AAAI Conference on Artificial Intelligence , volume=. 2021 , doi=

work page 2021
[28]

Statistical Science , volume=

Combining probability distributions: A critique and an annotated bibliography , author=. Statistical Science , volume=. 1986 , doi=

work page 1986
[29]

The Journal of Finance , volume=

Counterspeculation, auctions, and competitive sealed tenders , author=. The Journal of Finance , volume=. 1961 , doi=

work page 1961
[30]

Public Choice , volume=

Multipart pricing of public goods , author=. Public Choice , volume=. 1971 , doi=

work page 1971
[31]

Annual Symposium on Theoretical Aspects of Computer Science , series=

Worst-case equilibria , author=. Annual Symposium on Theoretical Aspects of Computer Science , series=. 1999 , publisher=

work page 1999
[32]

Machine learning with adversaries:

Blanchard, Peva and El Mhamdi, El Mahdi and Guerraoui, Rachid and Stainer, Julien , booktitle=. Machine learning with adversaries:

work page
[33]

Proceedings of the 35th International Conference on Machine Learning , pages=

Byzantine-Robust Distributed Learning: Towards Optimal Statistical Rates , author=. Proceedings of the 35th International Conference on Machine Learning , pages=

work page
[34]

Proceedings of the 20th International Conference on Artificial Intelligence and Statistics , pages=

Communication-Efficient Learning of Deep Networks from Decentralized Data , author=. Proceedings of the 20th International Conference on Artificial Intelligence and Statistics , pages=

work page
[35]

Karimireddy, Sai Praneeth and Kale, Satyen and Mohri, Mehryar and Reddi, Sashank and Stich, Sebastian and Suresh, Ananda Theertha , booktitle=

work page
[36]

2006 , publisher=

Prediction, Learning, and Games , author=. 2006 , publisher=

work page 2006
[37]

Contributions to the Theory of Games , editor=

A value for n-person games , author=. Contributions to the Theory of Games , editor=. 1953 , publisher=

work page 1953
[38]

Journal of the Royal Statistical Society: Series B (Statistical Methodology) , volume=

Combining probability forecasts , author=. Journal of the Royal Statistical Society: Series B (Statistical Methodology) , volume=. 2010 , doi=

work page 2010
[39]

Maximum likelihood estimation of observer error-rates using the

Dawid, A Philip and Skene, Allan M , journal=. Maximum likelihood estimation of observer error-rates using the. 1979 , doi=

work page 1979
[40]

Journal of the ACM , volume=

Intrinsic Robustness of the Price of Anarchy , author=. Journal of the ACM , volume=. 2015 , doi=

work page 2015
[41]

Can You Trust Your Model's Uncertainty?

Ovadia, Yaniv and Fertig, Emily and Ren, Jie and Nado, Zachary and Sculley, D and Nowozin, Sebastian and Dillon, Joshua V and Lakshminarayanan, Balaji and Snoek, Jasper , booktitle=. Can You Trust Your Model's Uncertainty?

work page
[42]

IEEE Signal Processing Magazine , volume=

Federated Learning: Challenges, Methods, and Future Directions , author=. IEEE Signal Processing Magazine , volume=. 2020 , doi=

work page 2020
[43]

International Conference on Machine Learning , pages=

Online Learning under Delayed Feedback , author=. International Conference on Machine Learning , pages=

work page
[44]

RAND Memorandum RM-2651 , year=

Values of Large Games, IV: Evaluating the Electoral College by Montecarlo Techniques , author=. RAND Memorandum RM-2651 , year=

work page
[45]

TabNet: Attentive Interpretable Tabular Learning.,

Sercan \"O Arik and Tomas Pfister. TabNet : Attentive interpretable tabular learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 6679--6687, 2021. doi:10.1609/aaai.v35i8.16826

work page doi:10.1609/aaai.v35i8.16826 2021
[46]

Machine learning with adversaries: Byzantine tolerant gradient descent

Peva Blanchard, El Mahdi El Mhamdi, Rachid Guerraoui, and Julien Stainer. Machine learning with adversaries: Byzantine tolerant gradient descent. In Advances in Neural Information Processing Systems, volume 30, 2017

work page 2017
[47]

Verification of forecasts expressed in terms of probability

Glenn W Brier. Verification of forecasts expressed in terms of probability. Monthly Weather Review, 78 0 (1): 0 1--3, 1950. doi:10.1175/1520-0493(1950)078<0001:VOFEIT>2.0.CO;2

work page doi:10.1175/1520-0493(1950)078 1950
[48]

Prediction, Learning, and Games

Nicol \`o Cesa-Bianchi and G \'a bor Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006. doi:10.1017/CBO9780511546921

work page doi:10.1017/cbo9780511546921 2006
[49]

Truthful data acquisition via peer prediction

Yiling Chen, Yiheng Shen, and Shuran Zheng. Truthful data acquisition via peer prediction. In Advances in Neural Information Processing Systems, volume 33, pages 18879--18889, 2020

work page 2020
[50]

Multipart pricing of public goods

Edward H Clarke. Multipart pricing of public goods. Public Choice, 11 0 (1): 0 17--33, 1971. doi:10.1007/BF01726210

work page doi:10.1007/bf01726210 1971
[51]

Learned lessons in credit card fraud detection from a practitioner perspective

Andrea Dal Pozzolo, Olivier Caelen, Yann-Ael Le Borgne, Serge Waterschoot, and Gianluca Bontempi. Learned lessons in credit card fraud detection from a practitioner perspective. Expert Systems with Applications, 41 0 (10): 0 4915--4928, 2014. doi:10.1016/j.eswa.2014.02.026

work page doi:10.1016/j.eswa.2014.02.026 2014
[52]

The well-calibrated B ayesian

A Philip Dawid. The well-calibrated B ayesian. Journal of the American Statistical Association, 77 0 (379): 0 605--610, 1982. doi:10.1080/01621459.1982.10477856

work page doi:10.1080/01621459.1982.10477856 1982
[53]

Maximum Likelihood Estimation of Observer Error-Rates Using the EM Algorithm

A Philip Dawid and Allan M Skene. Maximum likelihood estimation of observer error-rates using the EM algorithm. Journal of the Royal Statistical Society: Series C (Applied Statistics), 28 0 (1): 0 20--28, 1979. doi:10.2307/2346806

work page doi:10.2307/2346806 1979
[54]

DeGroot and Stephen E

Morris H DeGroot and Stephen E Fienberg. The comparison and evaluation of forecasters. Journal of the Royal Statistical Society: Series D (The Statistician), 32 0 (1-2): 0 12--22, 1983. doi:10.2307/2987588

work page doi:10.2307/2987588 1983
[55]

International application of a new probability algorithm for the diagnosis of coronary artery disease

Robert Detrano, Ales Jan s a, Walter Steinbrunn, Matthias Pfisterer, Johann-Jakob Schmid, Sarbjit Sandhu, Kern H Guppy, Stella Lee, and Victor Froelicher. International application of a new probability algorithm for the diagnosis of coronary artery disease. American Journal of Cardiology, 64 0 (5): 0 304--310, 1989. doi:10.1016/0002-9149(89)90524-9

work page doi:10.1016/0002-9149(89)90524-9 1989
[56]

Ensemble methods in machine learning

Thomas G Dietterich. Ensemble methods in machine learning. In Multiple Classifier Systems, Lecture Notes in Computer Science, pages 1--15. Springer, 2000. doi:10.1007/3-540-45014-9_1

work page doi:10.1007/3-540-45014-9_1 2000
[57]

Freund, R

Yoav Freund and Robert E Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55 0 (1): 0 119--139, 1997. doi:10.1006/jcss.1997.1504

work page doi:10.1006/jcss.1997.1504 1997
[58]

The myth of generalisability in clinical research and machine learning in health care

Joseph Futoma, Morgan Siber, and Jonathan A Quinn. The myth of generalisability in clinical research and machine learning in health care. The Lancet Digital Health, 2 0 (9): 0 e489--e492, 2020. doi:10.1016/S2589-7500(20)30186-2

work page doi:10.1016/s2589-7500(20)30186-2 2020
[59]

Hastie and R

Christian Genest and James V Zidek. Combining probability distributions: A critique and an annotated bibliography. Statistical Science, 1 0 (1): 0 114--135, 1986. doi:10.1214/ss/1177013825

work page doi:10.1214/ss/1177013825 1986
[60]

Strictly

Tilmann Gneiting and Adrian E Raftery. Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102 0 (477): 0 359--378, 2007. doi:10.1198/016214506000001437

work page doi:10.1198/016214506000001437 2007
[61]

Incentives in teams

Theodore Groves. Incentives in teams. Econometrica, 41 0 (4): 0 617--631, 1973. doi:10.2307/1914085

[62]

On calibration of modern neural networks

Chuan Guo, Geoff Pleiss, Yu Sun, and Kilian Q Weinberger. On calibration of modern neural networks. In Proceedings of the 34th International Conference on Machine Learning, pages 1321--1330, 2017

[63]

Online learning under delayed feedback

Pooria Joulani, Andras Gyorgy, and Csaba Szepesvari. Online learning under delayed feedback. In International Conference on Machine Learning, pages 1453--1461, 2013

[64]

SCAFFOLD: Stochastic controlled averaging for federated learning

Sai Praneeth Karimireddy, Satyen Kale, Mehryar Mohri, Sashank Reddi, Sebastian Stich, and Ananda Theertha Suresh. SCAFFOLD: Stochastic controlled averaging for federated learning. In Proceedings of the 37th International Conference on Machine Learning, pages 5132--5143, 2020

[65]

Worst-case equilibria

Elias Koutsoupias and Christos Papadimitriou. Worst-case equilibria. In Annual Symposium on Theoretical Aspects of Computer Science, volume 1563 of Lecture Notes in Computer Science, pages 404--413. Springer, 1999. doi:10.1007/3-540-49116-3_38

[66]

Simple and scalable predictive uncertainty estimation using deep ensembles

Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. Simple and scalable predictive uncertainty estimation using deep ensembles. In Advances in Neural Information Processing Systems, volume 30, 2017

[67]

Federated learning: Challenges, methods, and future directions

Tian Li, Anit Kumar Sahu, Ameet Talwalkar, and Virginia Smith. Federated learning: Challenges, methods, and future directions. IEEE Signal Processing Magazine, 37(3):50--60, 2020. doi:10.1109/MSP.2020.2975749

[68]

Machine-learning aided peer prediction

Yang Liu and Yiling Chen. Machine-learning aided peer prediction. In Proceedings of the 18th ACM Conference on Economics and Computation, pages 63--80, 2017. doi:10.1145/3033274.3085126

[69]

Irwin Mann and Lloyd S. Shapley. Values of large games, IV: Evaluating the electoral college by Montecarlo techniques. RAND Memorandum RM-2651, 1960

[70]

Communication-efficient learning of deep networks from decentralized data

Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Agüera y Arcas. Communication-efficient learning of deep networks from decentralized data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, pages 1273--1282, 2017

[71]

Eliciting informative feedback: The peer-prediction method

Nolan Miller, Paul Resnick, and Richard Zeckhauser. Eliciting informative feedback: The peer-prediction method. Management Science, 51(9):1359--1373, 2005. doi:10.1287/mnsc.1050.0379

[72]

UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set)

Nour Moustafa and Jill Slay. UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In Military Communications and Information Systems Conference, pages 1--6, 2015. doi:10.1109/MilCIS.2015.7348942

[73]

Algorithmic Game Theory

Noam Nisan, Tim Roughgarden, Eva Tardos, and Vijay V Vazirani. Algorithmic Game Theory. Cambridge University Press, 2007. doi:10.1017/CBO9780511800481

[74]

Dissecting racial bias in an algorithm used to manage the health of populations

Ziad Obermeyer, Brian Powers, Christine Vogeli, and Sendhil Mullainathan. Dissecting racial bias in an algorithm used to manage the health of populations. Science, 366(6464):447--453, 2019. doi:10.1126/science.aax2342

[75]

Can you trust your model's uncertainty? Evaluating predictive uncertainty under dataset shift

Yaniv Ovadia, Emily Fertig, Jie Ren, Zachary Nado, D Sculley, Sebastian Nowozin, Joshua V Dillon, Balaji Lakshminarayanan, and Jasper Snoek. Can you trust your model's uncertainty? Evaluating predictive uncertainty under dataset shift. In Advances in Neural Information Processing Systems, volume 32, 2019

[76]

Scalable and accurate deep learning with electronic health records

Alvin Rajkomar, Eyal Oren, Kai Chen, Andrew M Dai, Nissan Hajaj, Michaela Hardt, Peter J Liu, Xiaobing Liu, Jake Marcus, Mimi Sun, et al. Scalable and accurate deep learning with electronic health records. npj Digital Medicine, 1(1):18, 2018. doi:10.1038/s41746-018-0029-1

[77]

Combining probability forecasts

Roopesh Ranjan and Tilmann Gneiting. Combining probability forecasts. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(1):71--91, 2010. doi:10.1111/j.1467-9868.2009.00726.x

[78]

Intrinsic robustness of the price of anarchy

Tim Roughgarden. Intrinsic robustness of the price of anarchy. Journal of the ACM, 62(5):1--42, 2015. doi:10.1145/2806883

[79]

A value for n-person games

Lloyd S Shapley. A value for n-person games. In Harold W Kuhn and Albert W Tucker, editors, Contributions to the Theory of Games, volume 2, pages 307--317. Princeton University Press, 1953. doi:10.1515/9781400881970-018

[80]

Toward generating a new intrusion detection dataset and intrusion traffic characterization

Iman Sharafaldin, Arash Habibi Lashkari, and Ali A Ghorbani. Toward generating a new intrusion detection dataset and intrusion traffic characterization. In Proceedings of the 4th International Conference on Information Systems Security and Privacy, pages 108--116, 2018. doi:10.5220/0006639801080116


