
arxiv: 2604.10018 · v1 · submitted 2026-04-11 · 📊 stat.ME

Inference from multivariate differential recruitment in respondent-driven sampling data

Pith reviewed 2026-05-10 16:25 UTC · model grok-4.3

classification 📊 stat.ME
keywords respondent-driven sampling · differential recruitment · multivariate covariates · Markov process · prevalence estimation · hidden populations · bootstrap variance · chain-referral sampling

The pith

Respondent-driven sampling inference can now adjust for multiple simultaneous covariates in recruitment behavior.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a framework called Multivariate Differential Recruitment (MDR) that treats RDS recruitment as a Markov process whose transition probabilities depend on any number of observed covariates on nodes or ties. Standard prevalence estimators are then rewritten inside this model, and a modified neighborhood bootstrap supplies variance estimates. Simulations assess the approach across a range of network and sampling features, and the method is applied to an RDS study of adult Venezuelan migrants in the Metropolitan Region of Santiago, Chile. This matters because RDS is widely used to study hidden populations in public health, and ignoring multivariate recruitment preferences can bias prevalence estimates.

Core claim

We model RDS as a Markov process with transition probabilities that depend on continuous or categorical variables associated with nodes or their ties. We then extend various prevalence estimators to this multivariate framework and implement a slightly modified neighborhood bootstrap for variance estimation.

What carries the argument

Multivariate Differential Recruitment (MDR) as a first-order Markov process whose transition probabilities are fully determined by the observed multivariate covariates.
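
To make the idea of covariate-dependent transition probabilities concrete, here is a minimal sketch in Python. It assumes a softmax (multinomial-logit) link between a recruiter's candidate ties and the probability of recruiting each one. That functional form, the function name `transition_probs`, the coefficient vector `beta`, and the example covariates are all illustrative assumptions, not the specification used in the paper.

```python
import numpy as np

def transition_probs(features, beta):
    """Recruitment probabilities over one recruiter's candidate ties.

    features: (n_candidates, n_covariates) array, one row per neighbor.
              Columns may mix node covariates and tie covariates.
    beta:     (n_covariates,) hypothetical preference coefficients.
    ASSUMPTION: softmax link; the paper's actual form is not shown here.
    """
    features = np.asarray(features, dtype=float)
    scores = features @ np.asarray(beta, dtype=float)
    scores -= scores.max()          # stabilise the exponentials
    weights = np.exp(scores)
    return weights / weights.sum()  # each row of the Markov kernel sums to 1

# Three candidate recruits described by two made-up covariates:
# (same education level as recruiter, age gap in decades).
candidates = np.array([[1.0, 0.2],
                       [0.0, 1.5],
                       [1.0, 0.9]])
beta = np.array([0.8, -0.5])
probs = transition_probs(candidates, beta)
```

Setting `beta` to zero recovers uniform choice among neighbors, which corresponds to the random-recruitment assumption that the abstract describes as often unrealistic.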

Load-bearing premise

The recruitment process is adequately captured by a first-order Markov model whose transition probabilities are fully determined by the observed multivariate covariates, without substantial unmeasured network structure or higher-order dependencies.

What would settle it

Generate RDS data from networks that include unmeasured homophily or second-order recruitment rules, apply the MDR estimators, and check whether the resulting prevalence estimates remain unbiased relative to the known true values.

Figures

Figures reproduced from arXiv: 2604.10018 by Danilo Alvares, Isabelle S. Beaudry, Jonathan Acosta, Vanesa Reinoso.

Figure 1: Comparison of recruitment probabilities under the MDR model with two variables (edu and ...
Figure 2: Estimation error for each estimator in different scenario configurations. Rows correspond to homophily levels ...
Figure 3: 95% confidence interval coverage by estimator in different scenarios configuration. Rows represents the ...
Figure 4: Tree of the RDS sampling process, the non-males nodes are colored with dark gray and the males with light ...
Original abstract

Respondent-Driven Sampling (RDS) is a chain-referral design used for collecting data from hidden or hard-to-reach populations through their social networks. In RDS, respondents recruit their peers from the population of interest. As such, inference with RDS data commonly relies on estimated sampling probabilities derived from specific recruitment assumptions. Early literature assumes random recruitment, which is often unrealistic because individuals may recruit based on their personal preferences. This behavior is known as Differential Recruitment (DR). Recent works have incorporated univariate categorical DR in the estimation procedures. The main objective of this paper is to introduce Multivariate Differential Recruitment (MDR), a framework that incorporates multiple simultaneous covariates, both categorical and continuous, into the sampling representation. We model RDS as a Markov process with transition probabilities that depend on continuous or categorical variables associated with nodes or their ties. We then extend various prevalence estimators to this multivariate framework and implement a slightly modified neighborhood bootstrap for variance estimation. The proposed methodology is assessed through simulation studies for a range of network and sampling features. It is applied to an RDS study conducted among the adult Venezuelan population living in the Metropolitan Region of Santiago, Chile.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces a multivariate differential recruitment (MDR) framework for respondent-driven sampling (RDS) data. It models the recruitment process as a first-order Markov chain with transition probabilities that are functions of multiple covariates (categorical or continuous) associated with nodes or ties. Standard prevalence estimators are extended to this setting, and a modified neighborhood bootstrap is proposed for variance estimation. The method is evaluated in simulation studies covering various network and sampling features and demonstrated on an RDS survey of Venezuelan adults in Santiago, Chile.
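
As background for the estimators being extended, the following sketch shows the classical RDS-II (Volz-Heckathorn) prevalence estimator, which weights each respondent by the inverse of their reported degree. The second function is a generic inverse-probability (Hájek-type) form that accepts arbitrary sampling probabilities. How the MDR framework derives those probabilities is not reproduced here; the function names and the toy data are assumptions made for illustration.

```python
import numpy as np

def rds_ii(y, degree):
    """Classical RDS-II estimate: inverse-degree weighted mean of y."""
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(degree, dtype=float)
    return float(np.sum(w * y) / np.sum(w))

def weighted_prevalence(y, sampling_prob):
    """Generic Hajek-type estimate for user-supplied sampling probabilities.

    ASSUMPTION: a recruitment-adjusted method would supply its own
    sampling_prob values; they only need to be known up to a constant.
    """
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(sampling_prob, dtype=float)
    return float(np.sum(w * y) / np.sum(w))

# Toy sample: binary outcome and self-reported network degree.
y = [1, 0, 1, 0]
degree = [2, 4, 4, 8]
estimate = rds_ii(y, degree)
```

When `sampling_prob` is proportional to degree, `weighted_prevalence` coincides with `rds_ii`, which is the sense in which a model with different recruitment probabilities generalizes the classical estimator.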

Significance. If the first-order Markov assumption holds and the observed covariates sufficiently capture recruitment preferences without substantial residual network effects, this framework offers a meaningful extension beyond univariate categorical differential recruitment methods by accommodating simultaneous multivariate influences. The simulation studies across network features and the real-data application to the Chilean Venezuelan population provide practical validation, while the modified neighborhood bootstrap addresses a key implementation need for variance estimation in the extended model.

major comments (2)
  1. [Simulation studies] Simulation studies section: the reported simulations generate data from the assumed first-order Markov model with covariate-dependent transitions; this setup cannot detect bias arising from unmeasured network structure or higher-order dependencies, which directly undermines the central claim that the extended prevalence estimators remain valid under realistic MDR.
  2. [Methods] Methods, Markov process modeling: the transition probabilities are stated to depend on the observed multivariate covariates, but no diagnostic or sensitivity analysis is provided for residual dependence after conditioning on these covariates; this assumption is load-bearing for the subsequent derivation of adjusted sampling weights and the extensions of RDS-I/RDS-II-type estimators.
minor comments (2)
  1. [Abstract] Abstract: the phrase 'extend various prevalence estimators' should explicitly name the estimators (e.g., RDS-I, RDS-II, or others) being generalized to the MDR setting.
  2. [Methods] Notation: the manuscript should clarify whether the transition probability parameters are estimated jointly with the prevalence parameters or in a two-step procedure, as this affects the bootstrap implementation.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and for recognizing the potential of the multivariate differential recruitment framework. We address each major comment below and outline the revisions we will make to the manuscript.

Point-by-point responses
  1. Referee: [Simulation studies] Simulation studies section: the reported simulations generate data from the assumed first-order Markov model with covariate-dependent transitions; this setup cannot detect bias arising from unmeasured network structure or higher-order dependencies, which directly undermines the central claim that the extended prevalence estimators remain valid under realistic MDR.

    Authors: We agree that the simulations evaluate estimator performance under the first-order Markov data-generating process with covariate-dependent transitions. This verifies the derivations when the modeling assumptions hold, which is the primary scope of the proposed MDR framework. We acknowledge that the current design does not probe robustness to higher-order dependencies or unmeasured network structure. In the revised manuscript we will expand the simulation section with additional experiments that generate recruitment chains from networks exhibiting residual dependence or higher-order Markov structure not captured by the observed covariates. We will also add explicit discussion clarifying that the validity of the extended prevalence estimators is conditional on the first-order MDR assumption and note the need for future robustness checks under more complex network processes. revision: yes

  2. Referee: [Methods] Methods, Markov process modeling: the transition probabilities are stated to depend on the observed multivariate covariates, but no diagnostic or sensitivity analysis is provided for residual dependence after conditioning on these covariates; this assumption is load-bearing for the subsequent derivation of adjusted sampling weights and the extensions of RDS-I/RDS-II-type estimators.

    Authors: The referee is correct that the assumption of no residual dependence after conditioning on the observed covariates is central to the transition model and to the subsequent weight derivations. The current manuscript does not include formal diagnostics or sensitivity analyses for this assumption. We will add a dedicated subsection on model assessment that proposes practical checks for residual dependence (for example, examining autocorrelation patterns in the recruitment chains after covariate adjustment) and outlines sensitivity analyses obtained by successively omitting or adding covariates. These additions will allow users to evaluate the assumption in applied settings and will be accompanied by guidance on interpreting results when the assumption may be only approximately satisfied. revision: yes
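
One way the autocorrelation check mentioned in the second response could be sketched is shown below: correlate each recruit's covariate-adjusted residual with the residual of the person who recruited them. A value far from zero would hint at dependence along recruitment chains that the covariates do not explain. The function name, the encoding of seeds as `-1`, and the toy chains are assumptions for illustration; this is not a diagnostic taken from the manuscript.

```python
import numpy as np

def recruiter_recruit_correlation(residual, recruiter):
    """Pearson correlation between recruit and recruiter residuals.

    residual:  (n,) covariate-adjusted residuals, one per respondent.
    recruiter: (n,) row index of each respondent's recruiter, -1 for seeds.
    """
    residual = np.asarray(residual, dtype=float)
    recruiter = np.asarray(recruiter, dtype=int)
    has_recruiter = recruiter >= 0            # drop seeds, who have no parent
    child = residual[has_recruiter]
    parent = residual[recruiter[has_recruiter]]
    return float(np.corrcoef(parent, child)[0, 1])

# Toy chain 0 -> 1 -> 2 -> 3 -> 4 with a single seed (respondent 0).
recruiter = [-1, 0, 1, 2, 3]
alternating = recruiter_recruit_correlation([1, -1, 1, -1, 1], recruiter)
trending = recruiter_recruit_correlation([1, 2, 3, 4, 5], recruiter)
```

In practice the residuals would come from whatever covariate adjustment is used, and the correlation would need a reference distribution (for example from permuting residuals within waves) before it could be read as evidence for or against the first-order assumption.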

Circularity Check

0 steps flagged

No significant circularity in derivation chain

Full rationale

The paper models RDS recruitment as a first-order Markov process whose transition probabilities are functions of observed multivariate covariates (categorical or continuous), then extends standard prevalence estimators (RDS-I, RDS-II and variants) to this setting and applies a modified neighborhood bootstrap. These steps are presented as direct extensions of existing RDS literature and Markov assumptions rather than reductions of any claimed result to quantities defined solely by the paper's own fitted parameters or self-citations. No equations are shown to be equivalent by construction, no fitted input is relabeled as an independent prediction, and no load-bearing uniqueness theorem or ansatz is imported from the authors' prior work. Simulations and the Chile application serve as external checks rather than the derivation itself. The framework therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on modeling RDS recruitment as a covariate-dependent Markov process and extending existing estimators; this introduces parameters for the transition probabilities that must be estimated from data.

free parameters (1)
  • covariate-dependent transition probability parameters
    Parameters governing how multiple covariates influence recruitment probabilities; these are estimated within the model.
axioms (1)
  • domain assumption: RDS data can be represented as a Markov process on the network with transitions depending on node and tie covariates
    Core modeling choice stated in the abstract; standard in RDS but extended here to the multivariate case.


