Regression Analysis of Dependent Binary Data for Estimating Disease Etiology from Case-Control Studies

Irena Chen; Zhenke Wu

arxiv: 1906.08436 · v1 · pith:W2IFH3F6new · submitted 2019-06-20 · 📊 stat.ME · stat.AP


Pith reviewed 2026-05-25 19:48 UTC · model grok-4.3


keywords disease etiology · case-control studies · latent class model · population etiologic fraction · regression analysis · measurement specificity · childhood pneumonia


The pith

Control data on diagnostic measures enables regression analysis of how covariates affect disease etiology fractions in case-control studies

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper extends nested partially-latent class models to a regression framework that incorporates explanatory variables when estimating population etiologic fractions from case-control data. A separate regression model is fitted to the controls to recover the distribution of their diagnostic measures given covariates, which supplies the needed information on measurement specificities and conditional dependencies. This information is transferred to assign cause-specific probabilities to each case, after which Markov chain Monte Carlo yields posterior inference on the covariate-dependent etiologic fraction functions and the overall fractions. Simulations demonstrate reduced bias and more valid inference for the overall fractions relative to the non-regression version of the model. The approach is illustrated on childhood pneumonia data, where etiology is shown to vary with season, age, severity, and HIV status.

Core claim

By estimating the distribution of diagnostic measures given covariates from controls alone and using that estimate to inform the measurement model for cases, the extended framework correctly assigns latent cause probabilities to individual cases and thereby produces regression functions for the population etiologic fractions while properly accounting for imperfect sensitivity, specificity, and dependence among multiple binary measures.
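
To make the assignment step concrete, here is a minimal sketch of how cause probabilities for one case could be computed by Bayes' rule once control-informed false positive rates are available. It is a deliberately simplified stand-in, not the paper's nested model: it assumes conditional independence among measures, one binary measure per cause, and fixed parameter values. The function name `case_cause_posterior` and all numbers are hypothetical.

```python
import numpy as np

def case_cause_posterior(m, pef, tpr, fpr):
    """Posterior cause probabilities for one case (simplified, assumed model).

    m   : (J,) binary measurements, one per candidate cause
    pef : (J,) prior etiologic fractions for this case's covariates
    tpr : (J,) sensitivity of measure j when cause j is the true cause
    fpr : (J,) 1 - specificity of measure j, informed by control data
    """
    m = np.asarray(m)
    J = m.size
    # pos[l, j]: probability that measure j is positive when the cause is l
    pos = np.where(np.eye(J, dtype=bool), tpr, fpr)
    # Likelihood of the observed pattern under each candidate cause
    lik = np.prod(np.where(m == 1, pos, 1.0 - pos), axis=1)
    post = np.asarray(pef) * lik
    return post / post.sum()

# Hypothetical inputs: three causes, only the first measure is positive
post = case_cause_posterior(
    m=[1, 0, 0],
    pef=np.full(3, 1 / 3),
    tpr=np.full(3, 0.9),
    fpr=np.full(3, 0.1),
)
print(post.round(4))
```

In this toy setting most of the posterior mass lands on the first cause, because a positive result is far more likely under that cause than as a false positive.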

What carries the argument

The extended nested partially-latent class model that uses a separate regression fit on controls to supply the measurement specificities and dependence structure transferred to the cases

If this is right

Estimation of overall population etiologic fractions exhibits less bias than the version of the model that omits covariates
Inference on the overall fractions is more valid once covariate information is included
The method can reveal how etiology depends on measured factors such as season, age, disease severity, and HIV status
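
As an illustration of how covariate-specific fractions relate to an overall fraction, the sketch below averages per-case PEF vectors over the cases' covariate values. This assumes the overall PEF is the empirical average of the PEF function over the cases, which is one natural reading of the summary rather than a definition confirmed here; `overall_pef` and the stratum values are hypothetical.

```python
import numpy as np

def overall_pef(pef_by_case):
    """Average covariate-specific PEF vectors over cases (assumed definition)."""
    pef_by_case = np.asarray(pef_by_case, dtype=float)
    return pef_by_case.mean(axis=0)

# Hypothetical PEFs for two covariate strata and three causes
stratum_pef = np.array([
    [0.7, 0.2, 0.1],   # stratum 0
    [0.2, 0.3, 0.5],   # stratum 1
])
# 30 cases in stratum 0 and 70 cases in stratum 1
stratum_of_case = np.array([0] * 30 + [1] * 70)

overall = overall_pef(stratum_pef[stratum_of_case])
print(overall)
```

Because the overall value is a mixture weighted by the case covariate distribution, ignoring strata whose fractions differ this much can distort it.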

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same separation of control and case models could be applied to other infectious-disease studies that collect multiple imperfect diagnostic tests on both cases and controls
If the control regression is misspecified, the resulting case assignments would be biased, so sensitivity checks that vary the control model form would be a natural next step
The framework could be extended to time-to-event or longitudinal covariates if the control regression is generalized accordingly

Load-bearing premise

The separate regression model fitted to the controls' diagnostic measures given covariates is correctly specified and supplies accurate information on specificities and conditional dependence structures that can be transferred to the cases.

What would settle it

A validation study in which a subset of cases has known true causes from a gold-standard test; if the regression model's estimated cause probabilities for those cases deviate systematically from the known causes while the control model fits well, the transfer of information would be shown to fail.

Figures

Figures reproduced from arXiv: 1906.08436 by Irena Chen, Zhenke Wu.

**Figure 2.** The regression analyses produce less biased posterior mean estimates and more…

**Figure 4.** Prior densities for logit(α_ik^ν), the fraction to be broken for subclass k from the stick currently left, when α_ik equals: 1) the intercept μ*_k0 (black, solid line) or 2) the first B-spline coefficient β_kj^(1),ν (red, broken line). The former concentrates near 1 because μ*_k0 has a scaled-t distributed prior that puts substantial mass at the right tail; much less so for the latter.

**Figure 5.** By propagating the prior that encourages few subclasses, the algorithm correctly infers two subclasses from the simulated data in Simulation I, Section 4 of Main Paper. Estimated case (top) and control (bottom) subclass weight curves for seven subclasses over one continuous covariate ν̂_k(t) (central blue dashed lines enclosed by the 95% credible regions; the red curves are posterior samples) compared again…

**Figure 6.** Posterior distributions of the stratum-specific (Rows 1 and 2) and the overall (Bottom Row) PEFs based on a simulation with a two-level discrete covariate and L = J = 6 causes. The vertical gray lines indicate the 2.5% and 97.5% posterior quantiles, respectively; the truths are indicated by vertical blue dashed lines. Rows 1-2) PEFs by stratum (level = 1, 2) and cause (A-F); Bottom) π*_ℓ: overall population…

**Figure 7.** NPLCM analyses with or without regression perform similarly in terms of percent relative bias (top) and empirical coverage rates (bottom) over R = 100 replications in simulations where the case and control subclass weights do not vary by covariates. Each panel corresponds to one of 16 combinations of true parameter values and sample sizes. See…

**Figure 3.** Estimated seasonal PEF π̂_ℓ(date, age, severity, HIV) for the two most prevalent age-severity-HIV strata: younger (a) or older (b) than one, with severe pneumonia, HIV negative. Here the results are obtained from a model assuming seven single-pathogen causes (HINF, PNEU, ADENO, HMPV.A.B, PARA.1, RHINO, RSV) and a "Not Specified" cause. In an age-severity-HIV stratum and for each cause ℓ: Row 2) shows the tempora…

**Figure 8.** Panel plot with BrS, SS and Etiology Pies obtained from an npLCM analysis omitting covariates (K = 5). For each of the 7 pathogens, a summary of the BrS and SS data analyzed in Section 5 of Main Paper is shown in the left two columns, along with some of the intermediate model results; and the prior and posterior distributions for the PEFs on the right (rows ordered by posterior means). Left) The observed B…

**Figure 9.** Individual etiology fraction estimates for RSV (left) and NoS (right) differ by age and season among HIV negative and severe pneumonia cases for whom the seven pathogens were all tested negative in the nasopharyngeal specimens.
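
The Figure 4 caption describes subclass weights in stick-breaking terms: each subclass takes a fraction of the stick that is currently left. A minimal sketch of that construction follows, assuming each fraction is the inverse-logit of a real-valued predictor and that the last subclass takes whatever remains. The function name and the inverse-logit link are assumptions for illustration, not details confirmed by the captions.

```python
import numpy as np

def stick_breaking_weights(alpha):
    """Turn K-1 real-valued predictors into K subclass weights.

    alpha : (K-1,) real values; expit(alpha[k]) is the fraction broken off
            the remaining stick for subclass k (assumed link function).
    """
    alpha = np.asarray(alpha, dtype=float)
    v = 1.0 / (1.0 + np.exp(-alpha))      # break fractions in (0, 1)
    v = np.append(v, 1.0)                 # last subclass takes the remainder
    # Stick length still available before each break
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining

weights = stick_breaking_weights([0.0, 0.0])
print(weights)  # [0.5, 0.25, 0.25]
```

When an early fraction sits near 1, almost all of the stick goes to the first subclasses, which is the sense in which such a prior encourages few subclasses.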

Original abstract

In large-scale disease etiology studies, epidemiologists often need to use multiple binary measures of unobserved causes of disease that are not perfectly sensitive or specific to estimate cause-specific case fractions, referred to as "population etiologic fractions" (PEFs). Despite recent methodological advances, the scientific need of incorporating control data to estimate the effect of explanatory variables upon the PEFs, however, remains unmet. In this paper, we build on and extend nested partially-latent class model (npLCMs, Wu et al., 2017) to a general framework for etiology regression analysis in case-control studies. Data from controls provide requisite information about measurement specificities and covariations, which is used to correctly assign cause-specific probabilities for each case given her measurements. We estimate the distribution of the controls' diagnostic measures given the covariates via a separate regression model and a priori encourage simpler conditional dependence structures. We use Markov chain Monte Carlo for posterior inference of the PEF functions, cases' latent classes and the overall PEFs of policy interest. We illustrate the regression analysis with simulations and show less biased estimation and more valid inference of the overall PEFs than an npLCM analysis omitting covariates. A regression analysis of data from a childhood pneumonia study site reveals the dependence of pneumonia etiology upon season, age, disease severity and HIV status.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Extends npLCMs so PEFs can depend on covariates by fitting measurement structure on controls then transferring it to cases, but the transfer step itself gets no direct check.


The paper adds a regression layer to the nested partially latent class model so that population etiologic fractions can vary with covariates such as season or HIV status. Controls are used to estimate the regression for the binary diagnostic measures, which supplies the specificities and dependence structure that then gets plugged into the case model for assigning latent classes and PEF functions. MCMC is used for the posterior. Simulations are reported to give lower bias for the overall PEFs than the covariate-free version, and the method is applied to a childhood pneumonia dataset looking at age, severity, and other factors. That is the concrete advance over the 2017 npLCM reference. The setup is coherent on its own terms and the citation pattern is appropriate.

The weakest link is the assumption that the measurement model fitted to controls carries over unchanged to cases. Nothing in the abstract or described results tests whether disease status itself changes how the diagnostics behave or alters the dependence pattern. The simulations only confirm performance when the model is correctly specified, which does not address the transfer question. If that assumption fails, the cause-specific probabilities for cases will be off even if the control regression looks fine.

This is a specialized methods paper aimed at statisticians already working on etiology attribution in case-control studies. Readers who need to adjust PEFs for covariates will find the framework useful once they see the implementation details. It is coherent enough and addresses a real gap, so it deserves a serious referee. I would send it for review and ask the referees to focus on diagnostics or sensitivity checks for the control-to-case transfer.

Referee Report

2 major / 1 minor

Summary. The manuscript extends nested partially-latent class models (npLCMs) to a regression framework for estimating covariate-dependent population etiologic fractions (PEFs) from case-control studies with multiple imperfect binary diagnostic measures. A separate regression is fit to control data to recover specificities and conditional dependence structure; this is transferred to the case model to assign latent class probabilities, with MCMC used for posterior inference on PEF functions and overall PEFs. Simulations are reported to yield less biased PEF estimates than covariate-omitting npLCM, and the approach is illustrated on childhood pneumonia data showing dependence on season, age, severity, and HIV status.

Significance. If the transferability assumption holds, the work supplies a needed tool for covariate-adjusted etiology estimation in large-scale studies, directly addressing the unmet need stated in the abstract. It builds on Wu et al. (2017) with a practical MCMC implementation and an explicit preference for simpler dependence structures. The framework could improve policy-relevant PEF inference when covariates are available.

major comments (2)

[Abstract] The claim that 'simulations show less biased estimation and more valid inference of the overall PEFs' is presented without any quantitative metrics (bias, coverage, or simulation design details). This leaves the central performance claim unsupported in the summary and requires the results section to supply the missing numbers and settings.
[Model description] Model construction (control-to-case transfer step): the measurement model (specificities and conditional dependence) estimated from controls is transferred unchanged to cases under the assumption that disease status does not alter these properties. No analytic derivation, sensitivity analysis, or diagnostic check is described to test this invariance; because the claim of 'correctly assign cause-specific probabilities' rests on this transfer, the assumption is load-bearing and needs explicit robustness assessment.
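
For reference, the two metrics the first comment asks for (and that the Figure 7 caption names) can be computed from replicated simulation output as in the sketch below. The function names and the toy numbers are hypothetical; they are not results from the paper.

```python
import numpy as np

def percent_relative_bias(estimates, truth):
    """100 * (mean estimate - truth) / truth across replications."""
    return 100.0 * (np.mean(estimates) - truth) / truth

def empirical_coverage(lower, upper, truth):
    """Share of replications whose interval [lower, upper] contains truth."""
    lower, upper = np.asarray(lower), np.asarray(upper)
    return np.mean((lower <= truth) & (truth <= upper))

# Toy output from four hypothetical replications for one cause
truth = 0.30
estimates = [0.28, 0.31, 0.30, 0.33]
lower = [0.20, 0.25, 0.31, 0.28]
upper = [0.35, 0.40, 0.45, 0.38]

bias = percent_relative_bias(estimates, truth)
coverage = empirical_coverage(lower, upper, truth)
print(round(bias, 2), coverage)
```

Reporting both per cause and per scenario would let a reader judge the "less biased" and "more valid" claims separately.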

minor comments (1)

[Abstract] 'npLCMs' is used before the parenthetical expansion; spell out on first use for clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address each major comment below and indicate planned revisions to strengthen the manuscript.


Referee: [Abstract] The claim that 'simulations show less biased estimation and more valid inference of the overall PEFs' is presented without any quantitative metrics (bias, coverage, or simulation design details). This leaves the central performance claim unsupported in the summary and requires the results section to supply the missing numbers and settings.

Authors: We agree that the abstract would be strengthened by including quantitative metrics. In the revised manuscript we will add concise statements of key simulation results (e.g., average bias reduction and empirical coverage rates across scenarios) together with a brief description of the simulation design (sample sizes, number of replicates, covariate configurations). These numbers already appear in the results section; we will simply summarize them in the abstract as requested. Revision: yes.

Referee: [Model description] Model construction (control-to-case transfer step): the measurement model (specificities and conditional dependence) estimated from controls is transferred unchanged to cases under the assumption that disease status does not alter these properties. No analytic derivation, sensitivity analysis, or diagnostic check is described to test this invariance; because the claim of 'correctly assign cause-specific probabilities' rests on this transfer, the assumption is load-bearing and needs explicit robustness assessment.

Authors: The referee correctly notes that the invariance assumption is central. While the assumption is standard in the case-control etiology literature (measurement properties are viewed as test characteristics independent of disease status), we acknowledge that the manuscript would benefit from explicit robustness checks. In the revision we will add a dedicated sensitivity-analysis subsection that perturbs the transferred specificities and dependence parameters within plausible ranges and reports the resulting changes in PEF estimates. We will also expand the model-description text to state the assumption more explicitly and cite supporting literature. Revision: yes.
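
One way such a perturbation check could look, in a heavily simplified setting, is sketched below: shift the control-derived false positive rates by a few fixed amounts and record how far a case's cause probabilities move. It assumes conditional independence and one measure per cause, unlike the nested model under review; `cause_posterior`, `specificity_sensitivity`, and every number are hypothetical.

```python
import numpy as np

def cause_posterior(m, pef, tpr, fpr):
    """Cause probabilities for one case under an assumed independence model."""
    m = np.asarray(m)
    J = m.size
    # pos[l, j]: probability measure j is positive when the cause is l
    pos = np.where(np.eye(J, dtype=bool), tpr, fpr)
    lik = np.prod(np.where(m == 1, pos, 1.0 - pos), axis=1)
    post = np.asarray(pef) * lik
    return post / post.sum()

def specificity_sensitivity(m, pef, tpr, fpr, deltas):
    """Largest absolute change in cause probabilities per shift in fpr."""
    base = cause_posterior(m, pef, tpr, fpr)
    changes = {}
    for d in deltas:
        # Shift all false positive rates, keeping them inside (0, 1)
        shifted = np.clip(np.asarray(fpr) + d, 1e-6, 1 - 1e-6)
        changes[d] = np.abs(cause_posterior(m, pef, tpr, shifted) - base).max()
    return changes

changes = specificity_sensitivity(
    m=[1, 0, 0],
    pef=np.full(3, 1 / 3),
    tpr=np.full(3, 0.9),
    fpr=np.full(3, 0.1),
    deltas=[0.0, 0.05, 0.10],
)
print(changes)
```

A full analysis would repeat this over all cases and propagate the shifts to the PEF estimates, but even this toy version shows whether conclusions hinge on the transferred specificities.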

Circularity Check

0 steps flagged

No circularity; control regression supplies independent measurement parameters transferred to case model.


The paper fits a regression model exclusively to control diagnostic data to estimate specificities and conditional dependence structures, then transfers those estimates to the case model for latent class and PEF inference. This separation means the PEF functions are not defined in terms of themselves or recovered by construction from the same fitted quantities. The citation to Wu et al. (2017) supplies the base npLCM structure but does not create a self-citation load-bearing loop for the regression extension. No equations reduce predictions to inputs, no ansatz is smuggled, and no uniqueness theorem is invoked from overlapping authors. The derivation remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Review performed on abstract only; ledger entries are therefore limited to standard Bayesian assumptions implied by the described MCMC procedure.

axioms (1)

standard math: MCMC sampling yields valid posterior inference for the PEF functions and latent classes.
Standard assumption invoked when the abstract states that MCMC is used for posterior inference.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "We extend npLCM to perform regression analysis... multinomial logistic regression model π_iℓ = π_ℓ(X_i)... stick-breaking parameterization... MCMC for posterior inference of the PEF functions"

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "Data from controls provide requisite information about measurement specificities and covariations... P0(m;w)=[M=m|W=w,I=0]"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages · 1 internal anchor

[1] Bandeen-Roche, K., Miglioretti, D. L., Zeger, S. L., and Rathouz, P. J. (1997). Latent variable regression for multiple discrete outcomes. Journal of the American Statistical Association, 92(440):1375–1386.
[2] Bartolucci, F. and Forcina, A. (2006). A class of latent marginal models for capture–recapture data with continuous covariates. Journal of the American Statistical Association, 101(474):786–794.
[3] Bedrick, E. J., Christensen, R., and Johnson, W. (1996). A new perspective on priors for generalized linear models. Journal of the American Statistical Association, 91(436):1450–1460.
[4] Brooks, S. and Gelman, A. (1998). General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics, 7(4):434–455.
[5] Carlin, B. and Louis, T. (2009). Bayesian methods for data analysis, volume 78. Chapman & Hall/CRC.
[6] Chung, H., Flaherty, B. P., and Schafer, J. L. (2006). Latent class logistic regression: application to marijuana use and attitudes among high school seniors. Journal of the Royal Statistical Society: Series A (Statistics in Society), 169(4):723–743.
[7] Crawley, J., Prosperi, C., Baggett, H. C., Brooks, W. A., Deloria Knoll, M., Hammitt, L. L., Howie, S. R., Kotloff, K. L., Levine, O. S., Madhi, S. A., et al. (2017). Standardization of clinical assessment and sample collection across all PERCH study sites. Clinical Infectious Diseases, 64(suppl_3):S228–S237.
[8] Deloria Knoll, M., Fu, W., Shi, Q., Prosperi, C., Wu, Z., Hammitt, L. L., Feikin, D. R., Baggett, H. C., Howie, S. R., Scott, J. A. G., et al. (2017). Bayesian estimation of pneumonia etiology: epidemiologic considerations and applications to the Pneumonia Etiology Research for Child Health study. Clinical Infectious Diseases, 64(suppl_3):S213–S227.
[9] Dunson, D. and Xing, C. (2009). Nonparametric Bayes modeling of multivariate categorical data. Journal of the American Statistical Association, 104(487):1042–1051.
[10] Erosheva, E. A., Fienberg, S. E., and Joutard, C. (2007). Describing disability through individual-level mixture models for multivariate binary data. The Annals of Applied Statistics, 1(2):346.
[11] Feikin, D., Scott, J., and Gessner, B. (2014). Use of vaccines as probes to define disease burden. The Lancet, 383(9930):1762–1770.
[12] Gelfand, A. and Smith, A. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, pages 398–409.
[13] Gelman, A., Jakulin, A., Pittau, M. G., and Su, Y.-S. (2008). A weakly informative default prior distribution for logistic and other regression models. The Annals of Applied Statistics, pages 1360–1383.
[14] Gelman, A., Meng, X.-L., and Stern, H. (1996). Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica, 6(4):733–760.
[15] Geweke, J. and Zhou, G. (1996). Measuring the pricing error of the arbitrage pricing theory. The Review of Financial Studies, 9(2):557–587.
[16] Goodman, L. (1974). Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika, 61(2):215–231.
[17] Gryparis, A., Coull, B. A., Schwartz, J., and Suh, H. H. (2007). Semiparametric latent variable regression models for spatiotemporal modelling of mobile source particles in the greater Boston area. Journal of the Royal Statistical Society: Series C (Applied Statistics), 56(2):183–209.
[18] Gu, Y. and Xu, G. (2019a). Learning attribute patterns in high-dimensional structured latent attribute models. Journal of Machine Learning Research, in press.
[19] Gu, Y. and Xu, G. (2019b). Partial identifiability of restricted latent class models. Annals of Statistics, in press.
[20] Gustafson, P. (2015). Bayesian Inference for Partially Identified Models: Exploring the Limits of Limited Data, volume 140. CRC Press.
[21] Gustafson, P., Lefebvre, G., et al. (2008). Bayesian multinomial regression with class-specific predictor selection. The Annals of Applied Statistics, 2(4):1478–1502.
[22] Hammitt, L. L., Feikin, D. R., Scott, J. A. G., Zeger, S. L., Murdoch, D. R., O'Brien, K. L., and Deloria Knoll, M. (2017). Addressing the analytic challenges of cross-sectional pediatric pneumonia etiology data. Clinical Infectious Diseases, 64(suppl_3):S197–S204.
[23] Hastie, T. and Tibshirani, R. (1986). Generalized additive models. Statistical Science, 1(3):297–318.
[24] Huang, G.-H. and Bandeen-Roche, K. (2004). Building an identifiable latent class model with covariate effects on underlying and measured variables. Psychometrika, 69(1):5–32.
[25] Jones, G., Johnson, W., Hanson, T., and Christensen, R. (2010). Identifiability of models for multiple diagnostic testing in the absence of a gold standard. Biometrics, 66(3):855–863.
[26] Kotloff, K. L., Nataro, J. P., Blackwelder, W. C., Nasrin, D., Farag, T. H., Panchalingam, S., Wu, Y., Sow, S. O., Sur, D., Breiman, R. F., et al. (2013). Burden and aetiology of diarrhoeal disease in infants and young children in developing countries (the Global Enteric Multicenter Study, GEMS): a prospective, case-control study. The Lancet, 382(9888):209–222.
[27] Lang, S. and Brezger, A. (2004). Bayesian P-splines. Journal of Computational and Graphical Statistics, 13(1):183–212.
[28] Lazarsfeld, P. F. (1950). The logical and mathematical foundations of latent structure analysis, volume IV, chapter The American Soldier: Studies in Social Psychology in World War II, pages 362–412. Princeton, NJ: Princeton University Press.
[29] Linero, A. R. (2018). Bayesian regression trees for high-dimensional prediction and variable selection. Journal of the American Statistical Association, 113(522):626–636.
[30] Little, R. et al. (2011). Calibrated Bayes, for statistics in general, and missing data in particular. Statistical Science, 26(2):162–174.
[31] Morrissey, E. R., Juárez, M. A., Denby, K. J., and Burroughs, N. J. (2011). Inferring the time-invariant topology of a nonlinear sparse gene regulatory network using fully Bayesian spline autoregression. Biostatistics, 12(4):682–694.
[32] Nair, H., Brooks, W. A., Katz, M., Roca, A., Berkley, J. A., Madhi, S. A., Simmerman, J. M., Gordon, A., Sato, M., Howie, S., et al. (2011). Global burden of respiratory infections due to seasonal influenza in young children: a systematic review and meta-analysis. The Lancet, 378(9807):1917–1930.
[33] Ni, Y., Stingo, F. C., and Baladandayuthapani, V. (2015). Bayesian nonlinear model selection for gene regulatory networks. Biometrics.
[34] Obando-Pacheco, P., Justicia-Grande, A. J., Rivero-Calle, I., Rodríguez-Tenreiro, C., Sly, P., Ramilo, O., Mejías, A., Baraldi, E., Papadopoulos, N. G., Nair, H., et al. (2018). Respiratory syncytial virus seasonality: a global overview. The Journal of Infectious Diseases, 217(9):1356–1364.
[35] PERCH Study Group (2019). Aetiology of severe hospitalised pneumonia in HIV-uninfected children from Africa and Asia: the Pneumonia Aetiology Research for Child Health (PERCH) case-control study. Lancet.
[36] Plummer, M. et al. (2003). JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. In Proceedings of the 3rd International Workshop on Distributed Statistical Computing, volume 124.
[37] Rodriguez, A. and Dunson, D. B. (2011). Nonparametric Bayesian models through probit stick-breaking processes. Bayesian Analysis (Online), 6(1).
[38] Saha, S. K., Schrag, S. J., El Arifeen, S., Mullany, L. C., Islam, M. S., Shang, N., Qazi, S. A., Zaidi, A. K., Bhutta, Z. A., Bose, A., et al. (2018). Causes and incidence of community-acquired serious infections among young children in South Asia (ANISA): an observational cohort study. The Lancet, 392(10142):145–159.
[39] Scott, J. A. G., Brooks, W. A., Peiris, J. M., Holtzman, D., and Mulhollan, E. K. (2008). Pneumonia research to reduce childhood mortality in the developing world. The Journal of Clinical Investigation, 118(4):1291.
[40] Witte, J. S., Greenland, S., and Kim, L.-L. (1998). Software for hierarchical modeling of epidemiologic data. Epidemiology, 9(5):563–566.
[41] Wu, Z., Casciola-Rosen, L., Rosen, A., and Zeger, S. L. (2019). A Bayesian approach to restricted latent class models for scientifically-structured clustering of multivariate binary outcomes. arXiv preprint arXiv:1808.08326.
[42] Wu, Z., Deloria-Knoll, M., Hammitt, L. L., Zeger, S. L., and the Pneumonia Etiology Research for Child Health Core Team (2016). Partially latent class models for case–control studies of childhood pneumonia aetiology. Journal of the Royal Statistical Society: Series C (Applied Statistics), 65(1):97–114.
[43] Wu, Z., Deloria-Knoll, M., and Zeger, S. L. (2017). Nested partially latent class models for dependent binary data; estimating disease etiology. Biostatistics (Oxford, England), 18:200–213.
[44] Zhang, Q. and Zhou, M. (2017). Permuted and augmented stick-breaking Bayesian multinomial regression. The Journal of Machine Learning Research, 18(1):7479–7511.

[1] [1]

L., Zeger, S

Bandeen-Roche, K., Miglioretti, D. L., Zeger, S. L., and Rathouz, P. J. (1997). Latent variable regression for multiple discrete outcomes. Journal of the American Statistical Association , 92(440):1375--1386

work page 1997

[2] [2]

and Forcina, A

Bartolucci, F. and Forcina, A. (2006). A class of latent marginal models for capture--recapture data with continuous covariates. Journal of the American Statistical Association , 101(474):786--794

work page 2006

[3] [3]

J., Christensen, R., and Johnson, W

Bedrick, E. J., Christensen, R., and Johnson, W. (1996). A new perspective on priors for generalized linear models. Journal of the American Statistical Association , 91(436):1450--1460

work page 1996

[4] [4]

and Gelman, A

Brooks, S. and Gelman, A. (1998). General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics , 7(4):434--455

work page 1998

[5] [5]

and Louis, T

Carlin, B. and Louis, T. (2009). Bayesian methods for data analysis , volume 78. Chapman & Hall/CRC

work page 2009

[6] [6]

P., and Schafer, J

Chung, H., Flaherty, B. P., and Schafer, J. L. (2006). Latent class logistic regression: application to marijuana use and attitudes among high school seniors. Journal of the Royal Statistical Society: Series A (Statistics in Society) , 169(4):723--743

work page 2006

[7] [7]

C., Brooks, W

Crawley, J., Prosperi, C., Baggett, H. C., Brooks, W. A., Deloria Knoll, M., Hammitt, L. L., Howie, S. R., Kotloff, K. L., Levine, O. S., Madhi, S. A., et al. (2017). Standardization of clinical assessment and sample collection across all perch study sites. Clinical infectious diseases , 64(suppl\_3):S228--S237

work page 2017

[8] [8]

L., Feikin, D

Deloria Knoll, M., Fu, W., Shi, Q., Prosperi, C., Wu, Z., Hammitt, L. L., Feikin, D. R., Baggett, H. C., Howie, S. R., Scott, J. A. G., et al. (2017). Bayesian estimation of pneumonia etiology: epidemiologic considerations and applications to the pneumonia etiology research for child health study. Clinical infectious diseases , 64(suppl\_3):S213--S227

work page 2017

[9] [9]

and Xing, C

Dunson, D. and Xing, C. (2009). Nonparametric bayes modeling of multivariate categorical data. Journal of the American Statistical Association , 104(487):1042--1051

work page 2009

[10] [10]

A., Fienberg, S

Erosheva, E. A., Fienberg, S. E., and Joutard, C. (2007). Describing disability through individual-level mixture models for multivariate binary data. The annals of applied statistics , 1(2):346

work page 2007

[11] [11]

Feikin, D., Scott, J., and Gessner, B. (2014). Use of vaccines as probes to define disease burden. The Lancet , 383(9930):1762--1770

work page 2014

[12] [12]

and Smith, A

Gelfand, A. and Smith, A. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American statistical association , pages 398--409

Gelman, A., Jakulin, A., Pittau, M. G., and Su, Y.-S. (2008). A weakly informative default prior distribution for logistic and other regression models. The Annals of Applied Statistics, pages 1360--1383

Gelman, A., Meng, X.-L., and Stern, H. (1996). Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica, 6(4):733--760

Geweke, J. and Zhou, G. (1996). Measuring the pricing error of the arbitrage pricing theory. The Review of Financial Studies, 9(2):557--587

Goodman, L. (1974). Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika, 61(2):215--231

Gryparis, A., Coull, B. A., Schwartz, J., and Suh, H. H. (2007). Semiparametric latent variable regression models for spatiotemporal modelling of mobile source particles in the greater Boston area. Journal of the Royal Statistical Society: Series C (Applied Statistics), 56(2):183--209

Gu, Y. and Xu, G. (2019a). Learning attribute patterns in high-dimensional structured latent attribute models. Journal of Machine Learning Research, in press

Gu, Y. and Xu, G. (2019b). Partial identifiability of restricted latent class models. Annals of Statistics, in press

Gustafson, P. (2015). Bayesian Inference for Partially Identified Models: Exploring the Limits of Limited Data, volume 140. CRC Press

Gustafson, P., Lefebvre, G., et al. (2008). Bayesian multinomial regression with class-specific predictor selection. The Annals of Applied Statistics, 2(4):1478--1502

Hammitt, L. L., Feikin, D. R., Scott, J. A. G., Zeger, S. L., Murdoch, D. R., O'Brien, K. L., and Deloria Knoll, M. (2017). Addressing the analytic challenges of cross-sectional pediatric pneumonia etiology data. Clinical Infectious Diseases, 64(suppl_3):S197--S204

Hastie, T. and Tibshirani, R. (1986). Generalized additive models. Statistical Science, 1(3):297--318

Huang, G.-H. and Bandeen-Roche, K. (2004). Building an identifiable latent class model with covariate effects on underlying and measured variables. Psychometrika, 69(1):5--32

Jones, G., Johnson, W., Hanson, T., and Christensen, R. (2010). Identifiability of models for multiple diagnostic testing in the absence of a gold standard. Biometrics, 66(3):855--863

Kotloff, K. L., Nataro, J. P., Blackwelder, W. C., Nasrin, D., Farag, T. H., Panchalingam, S., Wu, Y., Sow, S. O., Sur, D., Breiman, R. F., et al. (2013). Burden and aetiology of diarrhoeal disease in infants and young children in developing countries (the Global Enteric Multicenter Study, GEMS): a prospective, case-control study. The Lancet, 382(9888):209--222

Lang, S. and Brezger, A. (2004). Bayesian P-splines. Journal of Computational and Graphical Statistics, 13(1):183--212

Lazarsfeld, P. F. (1950). The logical and mathematical foundations of latent structure analysis. In The American Soldier: Studies in Social Psychology in World War II, volume IV, pages 362--412. Princeton, NJ: Princeton University Press

Linero, A. R. (2018). Bayesian regression trees for high-dimensional prediction and variable selection. Journal of the American Statistical Association, 113(522):626--636

Little, R. et al. (2011). Calibrated Bayes, for statistics in general, and missing data in particular. Statistical Science, 26(2):162--174

Morrissey, E. R., Juárez, M. A., Denby, K. J., and Burroughs, N. J. (2011). Inferring the time-invariant topology of a nonlinear sparse gene regulatory network using fully Bayesian spline autoregression. Biostatistics, 12(4):682--694

Nair, H., Brooks, W. A., Katz, M., Roca, A., Berkley, J. A., Madhi, S. A., Simmerman, J. M., Gordon, A., Sato, M., Howie, S., et al. (2011). Global burden of respiratory infections due to seasonal influenza in young children: a systematic review and meta-analysis. The Lancet, 378(9807):1917--1930

Ni, Y., Stingo, F. C., and Baladandayuthapani, V. (2015). Bayesian nonlinear model selection for gene regulatory networks. Biometrics

Obando-Pacheco, P., Justicia-Grande, A. J., Rivero-Calle, I., Rodríguez-Tenreiro, C., Sly, P., Ramilo, O., Mejías, A., Baraldi, E., Papadopoulos, N. G., Nair, H., et al. (2018). Respiratory syncytial virus seasonality: a global overview. The Journal of Infectious Diseases, 217(9):1356--1364

PERCH Study Group (2019). Aetiology of severe hospitalised pneumonia in HIV-uninfected children from Africa and Asia: the Pneumonia Aetiology Research for Child Health (PERCH) case-control study. Lancet

Plummer, M. et al. (2003). JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. In Proceedings of the 3rd International Workshop on Distributed Statistical Computing, volume 124

Rodriguez, A. and Dunson, D. B. (2011). Nonparametric Bayesian models through probit stick-breaking processes. Bayesian Analysis (Online), 6(1)

Saha, S. K., Schrag, S. J., El Arifeen, S., Mullany, L. C., Islam, M. S., Shang, N., Qazi, S. A., Zaidi, A. K., Bhutta, Z. A., Bose, A., et al. (2018). Causes and incidence of community-acquired serious infections among young children in South Asia (ANISA): an observational cohort study. The Lancet, 392(10142):145--159

Scott, J. A. G., Brooks, W. A., Peiris, J. M., Holtzman, D., and Mulholland, E. K. (2008). Pneumonia research to reduce childhood mortality in the developing world. The Journal of Clinical Investigation, 118(4):1291

Witte, J. S., Greenland, S., and Kim, L.-L. (1998). Software for hierarchical modeling of epidemiologic data. Epidemiology, 9(5):563--566

Wu, Z., Casciola-Rosen, L., Rosen, A., and Zeger, S. L. (2019). A Bayesian approach to restricted latent class models for scientifically-structured clustering of multivariate binary outcomes. arXiv preprint arXiv:1808.08326

Wu, Z., Deloria-Knoll, M., Hammitt, L. L., Zeger, S. L., and the Pneumonia Etiology Research for Child Health Core Team (2016). Partially latent class models for case--control studies of childhood pneumonia aetiology. Journal of the Royal Statistical Society: Series C (Applied Statistics), 65(1):97--114

Wu, Z., Deloria-Knoll, M., and Zeger, S. L. (2017). Nested partially latent class models for dependent binary data; estimating disease etiology. Biostatistics (Oxford, England), 18:200--213

Zhang, Q. and Zhou, M. (2017). Permuted and augmented stick-breaking Bayesian multinomial regression. The Journal of Machine Learning Research, 18(1):7479--7511
