A Generalized Additive Partial-Mastery Cognitive Diagnosis Model

Camilo Cárdenas-Hurtado; Irini Moustaki; Sze Ming Lee; Yunxiao Chen

arxiv: 2511.20191 · v2 · submitted 2025-11-25 · 📊 stat.ME · stat.CO


Pith reviewed 2026-05-17 04:57 UTC · model grok-4.3


keywords cognitive diagnosis · partial mastery · nonparametric modeling · item response functions · latent variable models · educational assessment · healthcare measurement · generalized additive models


The pith

The generalized additive partial-mastery cognitive diagnosis model relaxes strong parametric assumptions on item response functions by representing them as mixtures of nonparametric monotone functions of attributes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper proposes the GaPM-CDM to improve upon existing partial-mastery cognitive diagnosis models. Standard PM-CDMs assume specific parametric forms for how attributes influence item responses, which can lead to misspecification. The new model instead uses mixtures of nonparametric monotone functions for each item response function to allow greater flexibility. Estimation combines marginal maximum likelihood with sieve approximations to handle the nonparametric components. The approach works for both confirmatory analyses with known structures and exploratory ones, as shown in simulations and applications to education and healthcare data.

Core claim

The authors develop the GaPM-CDM in which each item response function is expressed as a mixture of nonparametric monotone functions of the underlying attributes. This formulation relaxes the parametric assumptions inherited from traditional CDMs while preserving parsimony and interpretability. Parameter estimation proceeds through the marginal maximum likelihood estimator augmented by a sieve approximation for the nonparametric functions. The resulting model supports both confirmatory use when the attribute-to-item mapping is known and exploratory use when it is not.
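In symbols, the structure described above can be sketched as follows. The notation (weights $\alpha_{jk}$, monotone functions $g_{jk}$, item response function $\pi_j$) follows the figure captions reproduced on this page; the simplex constraint on the weights and the $[0,1]$ range of each $g_{jk}$ are assumptions implied by the word "mixture" for binary items, not statements taken from the paper.

```latex
\pi_j(U) \;=\; \Pr(Y_j = 1 \mid U) \;=\; \sum_{k=1}^{K} \alpha_{jk}\, g_{jk}(U_k),
\qquad \alpha_{jk} \ge 0, \quad \sum_{k=1}^{K} \alpha_{jk} = 1,
\qquad g_{jk} \ \text{non-decreasing with values in } [0, 1].
```

Under these assumptions $\pi_j(U)$ is a convex combination of probabilities, so it stays in $[0,1]$ and is non-decreasing in every attribute.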

What carries the argument

Mixture of nonparametric monotone functions for each item response function, approximated via sieves in marginal maximum likelihood estimation.
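To make this mechanism concrete, here is a minimal Python sketch of one item response function built from piecewise-linear monotone pieces. It is an illustration under stated assumptions, not the authors' implementation: attributes are assumed to lie in [0, 1], knots are equally spaced, and the function names and the `increments`/`lows`/`highs` parameterization are hypothetical.

```python
import numpy as np

def monotone_piecewise_linear(u, increments, low=0.0, high=1.0):
    """Non-decreasing piecewise-linear map from [0, 1] to [low, high].

    `increments` are non-negative rises between equally spaced knots; taking
    their cumulative sum guarantees monotonicity (hypothetical parameterization).
    """
    increments = np.asarray(increments, dtype=float)
    if np.any(increments < 0):
        raise ValueError("increments must be non-negative")
    knots = np.linspace(0.0, 1.0, increments.size + 1)
    heights = np.concatenate([[0.0], np.cumsum(increments)])
    total = heights[-1]
    if total > 0:
        heights = heights / total  # rescale so the curve runs from 0 to 1
    return low + (high - low) * np.interp(u, knots, heights)

def item_response_function(U, weights, increments, lows, highs):
    """pi(U) = sum_k weights[k] * g_k(U[:, k]) for a single item."""
    U = np.atleast_2d(U)
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must lie on the probability simplex")
    pieces = [
        w * monotone_piecewise_linear(U[:, k], increments[k], lows[k], highs[k])
        for k, w in enumerate(weights)
    ]
    return np.sum(pieces, axis=0)

# Illustrative values only: two attributes, three respondents.
U = np.array([[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]])
probs = item_response_function(
    U,
    weights=[0.7, 0.3],
    increments=[[1.0, 1.0], [0.5, 2.0, 0.5]],
    lows=[0.2, 0.1],    # value at zero mastery, a guessing-like level
    highs=[0.9, 0.8],   # value at full mastery, one minus a slipping-like level
)
```

In this parameterization the number of increments per function plays the role of the sieve dimension: more knots give a more flexible curve at the cost of more parameters to estimate.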

If this is right

The model provides better fits to real data than standard PM-CDMs by reducing misspecification risk.
It yields more refined measurements of continuous partial mastery levels for each attribute.
The method applies equally in confirmatory settings with prior knowledge and exploratory settings without it.
Simulation studies confirm stable estimation and improved performance over parametric alternatives.
Real-data applications in educational testing and healthcare demonstrate practical utility.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the nonparametric components capture complex attribute interactions effectively, the model could generalize to other latent variable models in psychometrics.
Extensions might include incorporating additional covariates into the mixture components for richer diagnostic profiles.
Comparing the GaPM-CDM to fully nonparametric alternatives could reveal whether the mixture structure offers advantages in interpretability or computation.
Applying the model to longitudinal data might allow tracking changes in partial mastery over time with flexible response functions.

Load-bearing premise

That the sieve approximation of mixtures of nonparametric monotone functions will accurately recover the true item response functions without introducing substantial bias or estimation instability.

What would settle it

Generate data from a standard parametric PM-CDM and check if the GaPM-CDM recovers the known item parameters and attribute distributions with negligible bias; large discrepancies would indicate the approximation fails to match even parametric cases.

Figures

Figures reproduced from arXiv: 2511.20191 by Camilo Cárdenas-Hurtado, Irini Moustaki, Sze Ming Lee, Yunxiao Chen.

**Figure 2.** Estimated weights ($\hat{\alpha}_{jk}$), estimated monotone functions ($\hat{g}_{jk}(U_k)$), and estimated IRF surface ($\hat{\pi}_j(U)$) under the GaPM-CDM, for selected items 1, 3, and 12 in the ECPE dataset. The estimated functions $\hat{g}_{3,M}$ and $\hat{g}_{3,L}$ (Figure 2b) indicate that item 3 has high guessing and moderate slipping probabilities, consistent with benchmark model estimates. Unlike these models, the GaPM-CDM captures the non-linear rel…

**Figure 3.** Estimated EAP factor scores. ECPE data.

… function, defined as the participation in and satisfaction with usual social roles in daily life situations and activities (Castel et al., 2008; Hahn et al., 2010). This module contains J = 56 items. The attributes that these items measure and their relationship with the items are unknown. We analyze a sub-sample from wave 1 of PROMIS, consisting of N = 737 non-clini…

**Figure 4.** Cross-validation test-data marginal log-likelihood for the GaPM …

**Figure 5.** Matrix of estimated weights (transposed) for the …

Original abstract

Cognitive diagnosis models (CDMs) are restricted latent class models widely used to measure attributes of interest in diagnostic assessments across education, psychology, biomedical sciences, and related fields. Partial-mastery CDMs (PM-CDMs) are an important extension of CDMs. They model individuals' status for each attribute as continuous to measure partial mastery levels, thereby relaxing the restrictive discrete-attribute assumption of classical CDMs. As a result, PM-CDMs often yield better fits to real-world data and more refined measurements of the substantive attributes of interest. However, these models inherit strong parametric assumptions from traditional CDMs about item response functions and thus still face a significant risk of model misspecification. This paper proposes a generalized additive PM-CDM (GaPM-CDM) that substantially relaxes the parametric assumptions of PM-CDMs. This proposal leverages model parsimony and interpretability by modeling each item response function as a mixture of nonparametric monotone functions of attributes. A method for estimating GaPM-CDM is developed that combines the marginal maximum likelihood estimator with a sieve approximation of the nonparametric functions. The new model is applicable in both confirmatory and exploratory settings, depending on whether prior knowledge of the relationship between observed variables and attributes is available. The proposed method is evaluated and compared with PM-CDMs through extensive simulation studies and further applied to two measurement problems from educational testing and healthcare research, respectively.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper adds a nonparametric mixture-of-monotone-functions layer to partial-mastery CDMs and backs it with simulations plus two real-data examples, but the sieve-based marginal maximum likelihood (MML) step leaves open questions about extra bias and stability.


The main takeaway is that this work relaxes the parametric item-response assumptions in partial-mastery CDMs by representing each IRF as a mixture of nonparametric monotone functions of the attributes. That is the concrete novelty relative to the PM-CDM papers cited in the abstract, and it keeps the additive structure so the model stays interpretable in both confirmatory and exploratory modes.

Referee Report

3 major / 2 minor

Summary. The paper proposes the generalized additive partial-mastery cognitive diagnosis model (GaPM-CDM), which relaxes the parametric assumptions of standard PM-CDMs by representing each item response function as a mixture of nonparametric monotone functions of the attributes. Estimation proceeds via marginal maximum likelihood combined with sieve approximation of the nonparametric components. The model supports both confirmatory (known Q-matrix) and exploratory settings and is assessed through simulation studies plus applications to educational testing and healthcare data.
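The marginal likelihood mentioned in the summary integrates each respondent's response-pattern probability over the continuous attribute distribution. The sketch below approximates that integral by plain Monte Carlo for binary items. It is a stand-in for illustration only: the paper's actual integration scheme is not described on this page, and the uniform attribute distribution, the function name `marginal_log_likelihood`, and the linear `toy_irf` are assumptions.

```python
import numpy as np

def marginal_log_likelihood(Y, irf, draws):
    """Monte Carlo estimate of sum_i log E_U[ prod_j P(Y_ij | U) ].

    Y     : (N, J) array of 0/1 responses.
    irf   : callable mapping (M, K) attribute draws to (M, J) probabilities.
    draws : (M, K) samples from the assumed attribute distribution.
    """
    Y = np.asarray(Y, dtype=float)
    P = np.clip(irf(draws), 1e-12, 1 - 1e-12)                # (M, J)
    # Log-likelihood of each response pattern at each draw: (N, M)
    log_lik = Y @ np.log(P).T + (1.0 - Y) @ np.log1p(-P).T
    # Log of the mean over draws, computed stably (log-mean-exp)
    m = log_lik.max(axis=1, keepdims=True)
    log_marginal = m[:, 0] + np.log(np.mean(np.exp(log_lik - m), axis=1))
    return float(log_marginal.sum())

# Toy setup: J = 3 items, K = 2 attributes, additive linear IRFs (illustrative).
rng = np.random.default_rng(0)
W = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])

def toy_irf(U):
    return 0.2 + 0.6 * U @ W.T

draws = rng.uniform(size=(2000, 2))
U_true = rng.uniform(size=(200, 2))
Y = (rng.uniform(size=(200, 3)) < toy_irf(U_true)).astype(int)
ll = marginal_log_likelihood(Y, toy_irf, draws)
```

Because the same draws are reused for every respondent, the cost is one (N, M) matrix per evaluation. The number of draws needed for a given accuracy grows with the number of attributes, which is the dimensionality concern raised in the major comments.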

Significance. If the sieve-based MML procedure recovers the true IRFs with negligible extra bias relative to parametric baselines and remains stable under integration over continuous attributes, the GaPM-CDM would supply a useful nonparametric yet interpretable extension for diagnostic modeling. The combination of mixture monotonicity with sieve estimation preserves some parsimony while allowing data-driven flexibility; the reported simulation coverage and two real-data examples constitute concrete empirical grounding.

major comments (3)

[§3] Estimation: No convergence rate or bias bound is supplied for the sieve approximation of the mixture of monotone functions inside the marginal likelihood integral. Because the integral is taken over a continuous attribute distribution whose dimension equals the number of attributes, sieve error can compound during numerical quadrature; this directly affects the central claim that the estimator reliably recovers true IRFs without substantial new bias.
[§4.2] Exploratory simulations: The reported recovery metrics for the Q-matrix and mixture weights do not include separate bias or variance decompositions for the nonparametric components versus the parametric PM-CDM baseline. Without these, it is impossible to verify whether the added flexibility degrades finite-sample performance faster than the parametric model when the number of attributes grows.
[§5] Real-data applications: The model-fit comparisons (e.g., AIC/BIC or cross-validated prediction) are presented only in aggregate; no diagnostic checks are shown for monotonicity violations or sensitivity of the estimated mixture weights to the choice of sieve basis dimension.
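The two diagnostics requested in the last comment are simple to express. The following Python sketch shows one possible form, assuming estimated curves are available on an increasing grid and that weight matrices have been re-estimated under several sieve dimensions. The function names and the example numbers are hypothetical and do not come from the paper.

```python
import numpy as np

def monotonicity_violations(curve_values, tol=1e-8):
    """Count downward steps in a curve evaluated on an increasing grid."""
    steps = np.diff(np.asarray(curve_values, dtype=float))
    return int(np.sum(steps < -tol))

def max_weight_shift(weights_by_dimension):
    """Largest absolute change in any mixture weight across sieve dimensions.

    weights_by_dimension: dict {sieve dimension: (J, K) weight matrix}.
    """
    dims = sorted(weights_by_dimension)
    stacked = np.stack(
        [np.asarray(weights_by_dimension[d], dtype=float) for d in dims]
    )
    return float(np.max(stacked.max(axis=0) - stacked.min(axis=0)))

# Made-up numbers for one item with two attributes.
ok_curve = [0.10, 0.20, 0.20, 0.50]
bad_curve = [0.10, 0.30, 0.25, 0.50]
shift = max_weight_shift({4: [[0.70, 0.30]], 8: [[0.65, 0.35]]})
```

A small `max_weight_shift` across a range of sieve dimensions would suggest the estimated weights are not an artifact of the basis size. What counts as "small" is a judgment call that this sketch does not settle.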

minor comments (2)

[§2] Notation for the mixture weights and the sieve basis functions is introduced without a consolidated table; a single display would improve readability.
The abstract states 'extensive simulation studies' but the main text would benefit from an explicit statement of the number of Monte Carlo replications and the range of attribute dimensions examined.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major point below and have revised the paper accordingly where feasible to strengthen the presentation of the estimation procedure, simulation results, and real-data analyses.


Referee: §3 (Estimation): No convergence rate or bias bound is supplied for the sieve approximation of the mixture of monotone functions inside the marginal likelihood integral. Because the integral is taken over a continuous attribute distribution whose dimension equals the number of attributes, sieve error can compound during numerical quadrature; this directly affects the central claim that the estimator reliably recovers true IRFs without substantial new bias.

Authors: We agree that explicit convergence rates for the sieve estimator within the marginal likelihood would provide additional theoretical support. Our development relies on standard sieve theory for monotone functions (e.g., results on isotonic estimation and additive models), but we did not derive new bounds accounting for the quadrature over the continuous attribute space. In the revised manuscript we will add a brief discussion in Section 3 referencing existing rates for sieve approximations of monotone functions and noting the potential for compounding error under numerical integration. Deriving sharp, model-specific rates is beyond the current scope but represents a valuable direction for follow-up work. We have therefore made a partial revision by expanding the methodological discussion without introducing new theoretical proofs.

Revision: partial

Referee: §4.2 (Exploratory simulations): The reported recovery metrics for the Q-matrix and mixture weights do not include separate bias or variance decompositions for the nonparametric components versus the parametric PM-CDM baseline. Without these, it is impossible to verify whether the added flexibility degrades finite-sample performance faster than the parametric model when the number of attributes grows.

Authors: We concur that decomposing mean squared error into bias and variance components would clarify the trade-off introduced by the nonparametric flexibility. In the revised version we will augment Section 4.2 (and the associated supplementary tables) with separate bias and variance summaries for the estimated item response functions, mixture weights, and Q-matrix recovery, stratified by the number of attributes. These additions will allow direct comparison of finite-sample behavior between GaPM-CDM and the parametric baseline.

Revision: yes

Referee: §5 (Real-data applications): The model-fit comparisons (e.g., AIC/BIC or cross-validated prediction) are presented only in aggregate; no diagnostic checks are shown for monotonicity violations or sensitivity of the estimated mixture weights to the choice of sieve basis dimension.

Authors: We appreciate the suggestion to strengthen the empirical validation. In the revised manuscript we will include, in Section 5 and the supplementary materials, diagnostic plots verifying monotonicity of the estimated item response functions for both applications, together with sensitivity analyses that vary the sieve basis dimension and report the resulting stability of the mixture weights and model-fit statistics. These checks will be presented alongside the existing aggregate comparisons.

Revision: yes
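The bias and variance decomposition discussed in the §4.2 exchange can be computed directly from Monte Carlo replications. The sketch below is a generic illustration, not the authors' code; the function name `bias_variance` and the simulated estimates are assumptions made for the example.

```python
import numpy as np

def bias_variance(estimates, truth):
    """Decompose Monte Carlo MSE into squared bias and variance.

    estimates : (R, ...) array, one estimate per replication.
    truth     : array broadcastable to estimates.shape[1:].
    """
    estimates = np.asarray(estimates, dtype=float)
    truth = np.asarray(truth, dtype=float)
    mean_est = estimates.mean(axis=0)
    bias_sq = (mean_est - truth) ** 2
    variance = estimates.var(axis=0)   # ddof=0, so the identity below is exact
    mse = ((estimates - truth) ** 2).mean(axis=0)
    return bias_sq, variance, mse

# Simulated example: 500 replications of a (3 items x 2 attributes) weight matrix.
rng = np.random.default_rng(1)
truth = np.array([[0.7, 0.3], [0.5, 0.5], [0.2, 0.8]])
estimates = truth + 0.02 + 0.05 * rng.standard_normal((500, 3, 2))
bias_sq, variance, mse = bias_variance(estimates, truth)
```

With the population variance (`ddof=0`), `bias_sq + variance` equals `mse` entry by entry, so the two components can be reported side by side for each model and each number of attributes.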

Circularity Check

0 steps flagged

No significant circularity detected in GaPM-CDM proposal


The paper proposes the GaPM-CDM as a direct extension of PM-CDMs by replacing parametric item response functions with mixtures of nonparametric monotone functions of attributes, then develops a marginal maximum likelihood estimator using sieve approximation. This is evaluated via simulations and real-data applications in confirmatory and exploratory modes. No derivation step reduces a claimed result or prediction to its own inputs by construction, no fitted parameter is relabeled as an independent prediction, and no load-bearing premise rests solely on self-citation. The central modeling and estimation choices are presented as independent methodological contributions rather than tautological reparameterizations.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central modeling choice relies on representing item response functions via mixtures of nonparametric monotone functions; no free parameters or invented entities are explicitly listed in the abstract, and the estimation method assumes the sieve approximation converges appropriately.

axioms (1)

Domain assumption: Item response functions can be adequately represented as mixtures of nonparametric monotone functions of the attributes. This is the key relaxation of parametric assumptions stated in the abstract.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "modeling each item response function as a mixture of nonparametric monotone functions of attributes... sieve approximation of the nonparametric functions"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "sieve marginal maximum likelihood estimator... piecewise linear functions to approximate the monotone IRFs"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

13 extracted references · 13 canonical work pages

[1] Atchadé, Y. F., Fort, G., and Moulines, E. (2017). On Perturbed Proximal Gradient Algorithms. Journal of Machine Learning Research, 18(10):1–33.

[2] Beck, A. and Teboulle, M. (2003). Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31(3):167–175.

[3] Dasgupta, S. and Hsu, D. (2007). On-Line Estimation with the Multivariate Gaussian Distribution. In Bshouty, N. H. and Gentile, C., editors, Learning Theory: Proceedings of the 20th Annual Conference on Learning Theory (COLT 2007), pages 278–292. Berlin, DE: Springer-Verlag.

[4] De Bortoli, V., Durmus, A., Pereyra, M., and Vidal, A. F. (2021). Efficient stochastic optimisation by unadjusted Langevin Monte Carlo: Application to maximum marginal likelihood and empirical Bayesian estimation. Statistics and Computing, 31(29):1–18.

[5] Durmus, A. and Moulines, E. (2017). Nonasymptotic convergence analysis for the unadjusted Langevin algorithm. The Annals of Applied Probability, 27(3):1551–1587.

[6] Kivinen, J. and Warmuth, M. K. (1997). Exponentiated Gradient versus Gradient Descent for Linear Predictors. Information and Computation, 132(1):1–63.

[7] Oliviero-Durmus, A. and Moulines, E. (2024). On geometric convergence for the Metropolis-adjusted Langevin algorithm under simple conditions. Biometrika, 111(1):273–289.

[8] Polyak, B. T. and Juditsky, A. B. (1992). Acceleration of stochastic approximation by averaging. SIAM Journal of Control and Optimization, 30(4):838–855.

[9] Robbins, H. and Monro, S. (1951). A Stochastic Approximation Method. The Annals of Mathematical Statistics, 22(3):400–407.

[10] Roberts, G. O. and Rosenthal, J. S. (1998). Optimal Scaling of Discrete Approximations to Langevin Diffusions. Journal of the Royal Statistical Society: Series B (Methodological), 60(1):255–268.

[11] Roberts, G. O. and Tweedie, R. L. (1996). Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli, 2(4):341–363.

[12] Ruppert, D. (1988). Efficient Estimations from a Slowly Convergent Robbins-Monro Process. Technical Report 781, School of Operations Research and Industrial Engineering, College of Engineering, Cornell University.

[13] Zhang, S. and Chen, Y. (2022). Computation for Latent Variable Model Estimation: A Unified Stochastic Proximal Framework. Psychometrika, 87(4):1473–1502.

Figure A1: Matrix of estimated factor loadings (transposed) for the K = 5 dimensional aPM-CDM on the PROMIS dataset.
