
arxiv: 2604.03073 · v1 · submitted 2026-04-03 · 📊 stat.ME

Modeling within-department homogeneity in research quality rankings: an application to the Italian ISPD

Pith reviewed 2026-05-13 18:28 UTC · model grok-4.3

classification 📊 stat.ME
keywords ISPD · department ranking · intra-departmental correlation · adjusted index · Betoidal distribution · maximum likelihood estimation · research quality · Italy

The pith

Modeling intra-department correlation by size yields an adjusted ISPD that ranks Italian departments more fairly than the original index.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper models how standardized research scores tend to be more alike within smaller Italian academic departments, treating this homogeneity as a parametric function of department size. Maximum-likelihood estimation of the resulting model produces an adjusted version of the Indice Standardizzato di Performance Dipartimentale that reduces the polarization observed in current rankings. A new Betoidal distribution is introduced to accommodate the rounding and left-truncation present in publicly released data. Fits to the 2017 and 2022 Italian evaluation rounds support the size-dependent correlation structure. Simulations further indicate that the adjusted index recovers underlying department quality more accurately than the unadjusted ISPD or other data-intensive alternatives.

Core claim

The presence of within-department homogeneity among standardized scores, modeled as a function of department size, leads to a new adjusted ISPD that provides fairer rankings, with the Betoidal distribution enabling estimation from coarsened public data.

What carries the argument

Parametric model of intra-departmental correlation as a function of size, estimated by maximum likelihood and paired with the Betoidal distribution for rounded and truncated observations.
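
To make the mechanism concrete, here is a minimal Python sketch. It assumes N standardized scores with unit variance and an average pairwise correlation ρ, so that the scaled average has variance 1 + (N − 1)ρ. The function names are hypothetical, and `corrected_index` is only the "correct performance measure" quoted in the Figure 2 caption, not necessarily the paper's exact adjusted ISPD. With ρ = 0.05 and z = 2 it should give roughly 82.19% for N = 75 and 75.43% for N = 150, against 97.72% for the unadjusted index.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal, used for Phi


def scaled_average_variance(n, rho):
    """Variance of sum(Z_i) / sqrt(n) for unit-variance scores
    whose average pairwise correlation is rho (assumption)."""
    return 1.0 + (n - 1) * rho


def raw_index(z):
    """Unadjusted index: treats the scaled average as standard normal."""
    return STD.cdf(z)


def corrected_index(z, n, rho):
    """Hypothetical correction: rescale z by its actual standard deviation."""
    return STD.cdf(z / sqrt(scaled_average_variance(n, rho)))


if __name__ == "__main__":
    for n in (75, 150):
        print(n, round(raw_index(2.0), 4), round(corrected_index(2.0, n, 0.05), 4))
```

The same z therefore maps to different corrected values for departments of different size, which is the effect the size-dependent correlation model is meant to capture.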

Load-bearing premise

Intra-departmental correlation among scores depends only on department size through a specific parametric form, and the Betoidal distribution accurately captures the rounding and truncation in the public data.

What would settle it

A simulation in which true department qualities are known but the size-dependent correlation is misspecified, or real data in which the adjusted index fails to improve recovery of held-out quality measures relative to the original ISPD.
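
A toy version of such a check can be sketched as follows. It assumes a one-factor equicorrelation model in which every department has the same true quality, so a well-calibrated index should be uniform and place about 10% of departments in the two 5% tails. `tail_share` and its arguments are hypothetical names; `assumed_rho = 0` corresponds to the unadjusted index, and a value different from `true_rho` mimics a misspecified correlation. This is an illustration only, not the paper's simulation design.

```python
import random
from statistics import NormalDist

STD = NormalDist()


def scaled_average(n, rho, rng):
    """Scaled average of n unit-variance scores sharing one latent factor,
    which gives every pair of scores correlation rho."""
    shared = rng.gauss(0.0, 1.0)
    a, b = rho ** 0.5, (1.0 - rho) ** 0.5
    total = sum(a * shared + b * rng.gauss(0.0, 1.0) for _ in range(n))
    return total / n ** 0.5


def tail_share(n, true_rho, assumed_rho, reps=2000, cut=0.05, seed=1):
    """Share of simulated departments whose index falls below `cut`
    or above 1 - `cut` when the index is built with `assumed_rho`."""
    rng = random.Random(seed)
    scale = (1.0 + (n - 1) * assumed_rho) ** 0.5
    hits = 0
    for _ in range(reps):
        index = STD.cdf(scaled_average(n, true_rho, rng) / scale)
        if index < cut or index > 1.0 - cut:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    for assumed in (0.0, 0.02, 0.05):
        print(assumed, tail_share(75, 0.05, assumed))
```

Under these assumptions the tail share shrinks toward 10% as `assumed_rho` approaches `true_rho`, so the excess over 10% gives a rough measure of residual polarization.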

Figures

Figures reproduced from arXiv: 2604.03073 by Giorgio E. Montanari, Marco Doretti.

Figure 1: Distribution of ISPDs for ANVUR’s (a) 2017 ranking exercise (766 departments) and (b) 2022 exercise

Figure 2: Two settings with $\rho_A = \rho_B = 0.05$, $N_A = 75$, $N_B = 150$, and (a) $z_A = z_B = 2$, and (b) $z_A = z_B = -2$. In (a), $\mathrm{ISPD}_A = \mathrm{ISPD}_B = 97.72\%$, with the correct performance measures being $P(Z_A \le 2) = 82.19\%$ and $P(Z_B \le 2) = 75.43\%$. In (b), $\mathrm{ISPD}_A = \mathrm{ISPD}_B = 2.28\%$, with $P(Z_A \le -2) = 17.81\%$ and $P(Z_B \le -2) = 24.57\%$.
  … different sizes, scaled averages should be standardized so that they can be interpreted as realizations from a co…

Figure 3: (a) Betoidal density for various levels of …

Figure 4: Density of $X \sim \text{Betoidal}(\sigma)$ and $Y \sim \text{Beta}(a, a)$ with $V(X) = V(Y)$ and (a) $\sigma = 0.5$ ($a = 3.4005$), and (b) $\sigma = 2.5$ ($a = 0.2568$).
  When only the upper tail of the ISPD values is observed, as in the 2022 case, it is necessary to consider a left-truncated version of the distribution. To this end, we indicate by $X^\star \sim \text{LT-Betoidal}(\sigma, x^\star)$ a Betoidal random variable truncated below a given value $x^\star$. Its support is …

Figure 5: Empirical histograms of (a) ISPD and (b) ISPD …

Figure 6: Empirical histograms of scaled averages (divided by the corresponding theoretical standard deviation) for …
Original abstract

In this paper, we consider the academic department ranking system of Italy, which is based on a performance index named Indice Standardizzato di Performance Dipartimentale (ISPD). While critiques to the ISPD have been moved for its marked tendency to polarization, we here formalize a yet unexplored determinant of this phenomenon, that is, the presence of within-department homogeneity among the standardized scores used to build the index. We account for this intra-departmental correlation by modeling it as a function of departments' size. The proposed model, estimated via Maximum Likelihood, allows to build a fairer ranking procedure via the definition of a properly adjusted version of the ISPD. The estimation framework is also adapted to fit publicly available data, which are coarsened by rounding and/or left-truncated. To this end, a novel probability distribution termed Betoidal is introduced. Empirical evidence in favor of the proposed model is found in the 2017 and 2022 data. Moreover, a simulation study shows that the adjusted index significantly overcomes not only the original ISPD, but also other more data-demanding competing proposals.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The paper proposes modeling within-department homogeneity in standardized research performance scores as a parametric function of department size. It introduces a novel 'Betoidal' distribution to handle rounding and left-truncation in publicly available ISPD data, estimates the model by maximum likelihood, and defines an adjusted ISPD intended to produce fairer department rankings. Empirical fits are reported on 2017 and 2022 Italian data, and a simulation study claims the adjusted index outperforms the original ISPD and other competing proposals.

Significance. If the modeling assumptions prove robust, the work supplies a statistically principled correction for size-induced polarization bias in research rankings. The adaptation of the likelihood to coarsened data via the Betoidal distribution addresses a practical constraint of public indices, and the simulation evidence of improved ranking performance could inform policy adjustments in academic evaluation systems.

major comments (3)
  1. [§3] §3 (Betoidal distribution): the novel distribution is introduced to encode rounding and left-truncation, yet no comparison to a standard truncated-beta likelihood, no residual diagnostics, and no sensitivity checks to the rounding mechanism are provided; this directly affects the reliability of the ML parameter estimates used for the adjusted ISPD.
  2. [§5] §5 (Empirical application): the adjusted ISPD is constructed from parameters estimated on the identical 2017 and 2022 datasets used to evaluate its 'fairer' property, so the reported improvement is internal to the fitted model rather than benchmarked against an external criterion.
  3. [§6] §6 (Simulation study): superiority is demonstrated under data generated from the proposed model; no robustness results are shown when the size-dependent correlation function is misspecified or when intra-department homogeneity follows a different structure, which is central to the claim that the adjustment removes polarization bias.
minor comments (3)
  1. Notation for the size-dependent correlation parameters should be defined once and used consistently; currently it varies between the model equations and the estimation section.
  2. Add a brief discussion of related literature on intra-cluster correlation adjustments in ranking indices or performance metrics.
  3. The simulation tables would benefit from reporting coverage probabilities or bias metrics in addition to ranking accuracy to strengthen the comparison.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major point below, indicating where revisions will be made to strengthen the manuscript.

Point-by-point responses
  1. Referee: [§3] §3 (Betoidal distribution): the novel distribution is introduced to encode rounding and left-truncation, yet no comparison to a standard truncated-beta likelihood, no residual diagnostics, and no sensitivity checks to the rounding mechanism are provided; this directly affects the reliability of the ML parameter estimates used for the adjusted ISPD.

    Authors: We agree that additional validation of the Betoidal distribution is warranted to support the reliability of the ML estimates. In the revised manuscript we will add: (i) a direct comparison of the Betoidal likelihood against a standard truncated-beta specification on the same data, (ii) residual diagnostics (e.g., PIT histograms and QQ plots adapted for the coarsened support), and (iii) sensitivity analyses varying the rounding interval and truncation threshold. These results will be reported in an expanded subsection of §3. revision: yes

  2. Referee: [§5] §5 (Empirical application): the adjusted ISPD is constructed from parameters estimated on the identical 2017 and 2022 datasets used to evaluate its 'fairer' property, so the reported improvement is internal to the fitted model rather than benchmarked against an external criterion.

    Authors: We acknowledge that the empirical application evaluates the adjusted index on the same data used for estimation, rendering the reported reduction in polarization an in-sample illustration rather than an external validation. The primary evidence for superiority remains the simulation study in §6. In revision we will explicitly clarify this distinction in §5, emphasize that the empirical section demonstrates practical consequences of the adjustment on real rankings, and add a brief discussion of the simulation results as the out-of-sample benchmark. revision: partial

  3. Referee: [§6] §6 (Simulation study): superiority is demonstrated under data generated from the proposed model; no robustness results are shown when the size-dependent correlation function is misspecified or when intra-department homogeneity follows a different structure, which is central to the claim that the adjustment removes polarization bias.

    Authors: The simulation study is intentionally conducted under the data-generating process implied by our model to isolate the performance of the adjustment when assumptions hold. We recognize that robustness to misspecification is important for the broader claim. In the revised version we will include additional simulation scenarios in which the intra-department correlation is generated from alternative structures (constant correlation, random-effects, and non-monotonic size dependence) and report the resulting ranking performance of the adjusted ISPD under these misspecifications. revision: yes

Circularity Check

1 step flagged

Adjusted ISPD fairness defined via ML fit on the same data used for ranking evaluation

specific steps
  1. fitted input called prediction [Abstract]
    "The proposed model, estimated via Maximum Likelihood, allows to build a fairer ranking procedure via the definition of a properly adjusted version of the ISPD. [...] a simulation study shows that the adjusted index significantly overcomes not only the original ISPD, but also other more data-demanding competing proposals."

    The adjusted ISPD is defined using parameters estimated on the identical data to which it is applied; its claimed superiority (both empirical and in simulation) therefore reduces to the consequences of that fit rather than an external criterion for fairness.

full rationale

The paper's core claim is that the ML-estimated model yields a 'fairer' adjusted ISPD. This adjustment is constructed directly from parameters fitted to the 2017/2022 data, and the simulation study demonstrating superiority is performed under the model's own assumptions (intra-department correlation as a function of size, Betoidal for coarsening). While the simulation offers a partial external check, the 'fairer' property lacks an independent benchmark outside the fitted model, producing moderate circularity of the fitted-input-called-prediction type. Neither a load-bearing self-citation nor a self-definitional reduction of the full derivation chain is present.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim rests on the assumption that department-level homogeneity is a smooth function of size and that the Betoidal distribution correctly captures the data coarsening process; both are introduced without external validation.

free parameters (1)
  • size-dependent correlation parameters
    Estimated by maximum likelihood from the 2017 and 2022 datasets; their values determine the adjusted ISPD.
axioms (1)
  • domain assumption: Standardized researcher scores within a department exhibit correlation that depends only on department size
    Invoked to justify the model specification in the abstract.
invented entities (1)
  • Betoidal distribution (no independent evidence)
    purpose: To model left-truncated and rounded performance scores
    Newly defined probability distribution introduced to fit the coarsened public data.
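
How a likelihood for rounded and left-truncated values might look can be sketched in Python. This is not the paper's estimator. It assumes a Betoidal(σ) variable can be written as Φ(Z) with Z ~ N(0, σ²), which is an inference from the moment integral among the extracted references rather than something this page states. It also assumes values are rounded to a grid of width `step`, that truncation at `lower` acts on the unrounded value, and it uses a crude grid search in place of a proper optimizer. All names and default values are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def betoidal_cdf(x, sigma):
    """CDF of Phi(Z), Z ~ N(0, sigma^2) (assumed form of the Betoidal)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return STD.cdf(STD.inv_cdf(x) / sigma)


def coarsened_log_lik(sigma, observed, step=0.01, lower=0.0):
    """Log-likelihood of rounded values, left-truncated at `lower`.
    Each observation contributes the probability of its rounding cell,
    renormalized by the mass above the truncation point."""
    mass_above = 1.0 - betoidal_cdf(lower, sigma)
    total = 0.0
    for x in observed:
        lo = max(x - step / 2.0, lower)
        hi = min(x + step / 2.0, 1.0)
        cell = betoidal_cdf(hi, sigma) - betoidal_cdf(lo, sigma)
        total += math.log(max(cell, 1e-300) / mass_above)  # guard against log(0)
    return total


def fit_sigma_grid(observed, step=0.01, lower=0.0):
    """Crude maximum-likelihood estimate of sigma over a fixed grid."""
    grid = [0.5 + 0.05 * k for k in range(51)]  # 0.5, 0.55, ..., 3.0
    return max(grid, key=lambda s: coarsened_log_lik(s, observed, step, lower))
```

Setting `lower=0.0` recovers the rounding-only case; a positive `lower` mimics a release in which only the upper tail of the index is published.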



Reference graph

Works this paper leans on

5 extracted references · 5 canonical work pages

  1. [1] MUR. Elenco dei Dipartimenti ammessi alla selezione dei Dipartimenti di eccellenza 2018 – 2022. Technical report,

  2. [2] MUR. Elenco dei Dipartimenti ammessi alla selezione dei Dipartimenti di eccellenza 2023 – 2027. Technical report,

  3. [3] G. Poggi and C. A. Nappi. Il voto standardizzato per l’esercizio VQR 2004-2010. RIV: rassegna italiana di valutazione, 59(2):34–58,

  4. [4] The variance of the scaled average $\tilde{Z} = \sum_{i \in \mathcal{N}} Z_i / \sqrt{N}$ is given by

    $$V(\tilde{Z}) = \frac{1}{N} \sum_{i \in \mathcal{N}} V(Z_i) + \frac{1}{N} \sum_{i \in \mathcal{N}} \sum_{i' \in \mathcal{N} \setminus i} \mathrm{Cov}(Z_i, Z_{i'}) = \frac{1}{N} \left( N + \sum_{i \in \mathcal{N}} \sum_{i' \in \mathcal{N} \setminus i} \rho_{ii'} \right),$$

    where $\rho_{ii'} = \mathrm{Corr}(Z_i, Z_{i'})$. Letting $\rho = \frac{1}{N(N-1)} \sum_{i \in \mathcal{N}} \sum_{i' \in \mathcal{N} \setminus i} \rho_{ii'}$ be the average pairwise intra-departmental correlation, we have $V(\tilde{Z}) = 1 + (N-1)\rho$. Hence, unless $\rho$ decreases sufficiently ...

  5. [5] $$\ldots = \int_{-\infty}^{+\infty} \{\Phi(z)\}^2 \, \frac{1}{\sigma} \, \phi\!\left(\frac{z}{\sigma}\right) dz = \int_{-\infty}^{+\infty} \{\Phi(k\sigma)\}^2 \, \phi(k) \, dk = \frac{1}{\pi} \operatorname{atan} \sqrt{1 + 2\sigma^2},$$

    where the last equality follows from identities reported in Owen (1980). Equivalently, E(X
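
The closed form (1/π) atan √(1 + 2σ²) in the last entry can be turned into a quick numerical check. The Python sketch below assumes the integral is the second moment of X = Φ(Z) with Z ~ N(0, σ²), and that X has mean 1/2 by symmetry; both are inferences from the integrand, and the function names are hypothetical. Matching the resulting variance to that of a Beta(a, a), which is 1 / (4(2a + 1)), gives a ≈ 3.4005 for σ = 0.5, in line with the value quoted in the Figure 4 caption, and a = 1 (the uniform distribution) for σ = 1.

```python
import math


def betoidal_second_moment(sigma):
    """E[Phi(Z)^2] for Z ~ N(0, sigma^2), using the closed form above."""
    return math.atan(math.sqrt(1.0 + 2.0 * sigma ** 2)) / math.pi


def betoidal_variance(sigma):
    """Variance of Phi(Z), assuming a mean of 1/2 by symmetry."""
    return betoidal_second_moment(sigma) - 0.25


def matching_beta_shape(sigma):
    """Shape a such that Beta(a, a) has the same variance:
    1 / (4 * (2a + 1)) = v  =>  a = (1 / (4v) - 1) / 2."""
    v = betoidal_variance(sigma)
    return (1.0 / (4.0 * v) - 1.0) / 2.0


if __name__ == "__main__":
    for sigma in (0.5, 1.0):
        print(sigma, round(betoidal_variance(sigma), 5), round(matching_beta_shape(sigma), 4))
```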