An aggregate learning approach for interpretable semi-supervised population prediction and disaggregation using ancillary data

Frédéric Docquier; Guillaume Derval; Pierre Schaus

arxiv: 1907.00270 · v1 · pith:IHPDXGIDnew · submitted 2019-06-29 · 💻 cs.LG · stat.ML


Pith reviewed 2026-05-25 12:37 UTC · model grok-4.3

classification 💻 cs.LG stat.ML

keywords: aggregate learning, population disaggregation, semi-supervised learning, census data, ancillary data, interpretable models, machine learning


The pith

A simple interpretable model using aggregate learning matches state-of-the-art accuracy when disaggregating coarse census data to fine population maps with ancillary features.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reframes the task of turning regional census counts into pixel-level population estimates as an aggregate learning problem, where supervision is available only at the level of whole regions rather than individual pixels. It shows that a basic, interpretable model trained under this constraint, using ancillary data such as land use or infrastructure layers, performs on par with or better than specialized methods on several accuracy metrics. This matters because detailed population maps are required to evaluate the local effects of climate events, disasters, infrastructure projects, and development policies. The approach deliberately avoids extra spatial assumptions or post-processing steps that many existing disaggregation techniques rely on.

Core claim

By casting population disaggregation as aggregate learning, in which the model must produce pixel values whose sums match known regional census totals, a straightforward model supplied with ancillary data can recover fine-scale distributions at accuracy levels comparable to or exceeding current specialized techniques while remaining fully interpretable.

What carries the argument

The aggregate learning formulation, where labels are known only for aggregates of points (regions) and the model learns to assign values to individual points (pixels) such that their sums recover the aggregates.
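The formulation above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the toy data, the exponential-of-linear pixel model, the squared-error loss on regional totals, and the backtracking gradient descent are all choices made here for the example, and every name in the code is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 6 regions of 20 pixels, 3 ancillary features per pixel.
n_regions, pixels_per_region, n_features = 6, 20, 3
X = rng.normal(size=(n_regions * pixels_per_region, n_features))
region_of_pixel = np.repeat(np.arange(n_regions), pixels_per_region)

# Hidden pixel-level population, used only to build the regional "census" totals.
hidden_w = np.array([0.8, -0.5, 0.3])
census = np.bincount(region_of_pixel, weights=np.exp(X @ hidden_w),
                     minlength=n_regions)

def predict_pixels(w):
    """Assumed pixel model: exp of a linear score, so predictions stay positive."""
    return np.exp(X @ w)

def region_totals(pixel_values):
    """Sum pixel values inside each region."""
    return np.bincount(region_of_pixel, weights=pixel_values, minlength=n_regions)

def aggregate_loss(w):
    """Squared error on regional totals only; no pixel-level labels are used."""
    return float(np.mean((region_totals(predict_pixels(w)) - census) ** 2))

def aggregate_grad(w):
    """Gradient of aggregate_loss with respect to w (chain rule through the sums)."""
    pixel_pred = predict_pixels(w)
    residual = region_totals(pixel_pred) - census
    per_pixel = pixel_pred[:, None] * X * residual[region_of_pixel][:, None]
    return 2.0 * per_pixel.sum(axis=0) / n_regions

# Gradient descent with backtracking: a step is accepted only if the loss drops.
w = np.zeros(n_features)
initial_loss = aggregate_loss(w)
step = 1e-2
for _ in range(500):
    grad, current = aggregate_grad(w), aggregate_loss(w)
    while step > 1e-12:
        candidate = w - step * grad
        if aggregate_loss(candidate) < current:
            w = candidate
            break
        step *= 0.5

final_loss = aggregate_loss(w)
pixel_map = predict_pixels(w)  # fine-scale estimates, one value per pixel
print(f"loss on regional totals: {initial_loss:.2f} -> {final_loss:.2f}")
```

The only supervision that enters the loss is `census`; the per-pixel output `pixel_map` is shaped entirely by the ancillary features in `X`.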

If this is right

High-resolution population maps become available from existing coarse census releases without requiring new fine-scale surveys.
Local impacts of climate shocks, natural disasters, and infrastructure investments can be quantified at scales finer than the original census units.
Development policies can be evaluated using population distributions that are derived directly from the same ancillary data used for prediction.
Interpretable models reduce the barrier for adoption by practitioners who need to understand and justify the resulting maps.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same aggregate-learning framing could be applied to other spatial disaggregation tasks such as economic activity or environmental exposure where only coarse statistics exist.
Because the model stays simple, it offers a baseline against which more complex deep-learning disaggregation methods can be compared for gains that justify added opacity.
Performance on metrics where the simple model already leads suggests that ancillary data quality, rather than model sophistication, may be the dominant remaining bottleneck.

Load-bearing premise

The aggregate learning setup with only regional labels and ancillary data is enough to recover accurate fine-scale population values without additional spatial modeling assumptions or post-hoc adjustments.

What would settle it

If a held-out high-resolution census dataset collected at the pixel or small-area level shows systematic mismatches with the model's predicted counts in those same areas, the claim that the simple aggregate approach suffices would be refuted.

Figures

Figures reproduced from arXiv: 1907.00270 by Frédéric Docquier, Guillaume Derval, Pierre Schaus.

**Figure 1.** Cambodian population maps obtained by PCD-LinExp-PPE (a specialization of PCD using the PCD-LinExp model type, introduced in a later section) and RF. The bottom maps are a zoom of a specific, moderately populated region of Cambodia. The red box highlights a region where RF produces seemingly artificial results: it creates a circle around a (non-displayed) hospital, and saturates near the road network. PCD-…

**Figure 2.** Notation example.

**Figure 3.** An example of two sets of units, with the same population, but with different predictions. Both of them have eRMSE = 700. As can be seen, the example on the left panel makes very important errors on units U3 (error = 496) and U2 (272), where actual population counts are relatively small. However, these errors do not contribute a lot to the RMSE and are instead absorbed by the error of unit U1 (100…

**Figure 4.** Scatter plot of the error metrics for each model and for each fold. While it is difficult to compare results for the redistributed errors, PCD-LinExp-PPE and PCD-LinExp-RMSE give the best result on unadjusted errors. The methods PCD-LinExp-PPER, PCD-LinExp-RMSER, RFR and RF-AdjR are redistributed counterparts of the original methods.

**Figure 5.** Error for each unit for all models. Overall, all results presented in this section indicate that PCD-LinExp-PPE and PCD-LinExp-RMSE generate better results than RF and RF-Adj, at least for predicting the population (i.e. on unadjusted errors). The comparison for …
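The point made in the Figure 3 caption, that RMSE can hide large errors on sparsely populated units, can be reproduced with made-up numbers. The sketch below does not use the figure's values (they are truncated in the caption); the three unit populations, both prediction sets, and the mean relative error used for contrast are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical unit populations: one large unit and two small ones.
truth = np.array([10000.0, 500.0, 300.0])

# Two hypothetical prediction sets whose squared errors sum to the same value.
pred_a = truth + np.array([1000.0, 0.0, 0.0])    # all error on the large unit
pred_b = truth + np.array([800.0, 480.0, 360.0])  # large errors on small units

def rmse(pred, truth):
    """Root mean squared error over units."""
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

def mean_relative_error(pred, truth):
    """Mean of |error| / true population, used here only as a contrast to RMSE."""
    return float(np.mean(np.abs(pred - truth) / truth))

rmse_a, rmse_b = rmse(pred_a, truth), rmse(pred_b, truth)
mre_a, mre_b = mean_relative_error(pred_a, truth), mean_relative_error(pred_b, truth)

print(f"RMSE:                A={rmse_a:.2f}  B={rmse_b:.2f}")
print(f"mean relative error: A={mre_a:.3f}  B={mre_b:.3f}")
```

Both sets have the same RMSE, yet set B misestimates the two small units by roughly their entire population, which only a relative metric reveals.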

Original abstract

Census data provide detailed information about population characteristics at a coarse resolution. Nevertheless, fine-grained, high-resolution mappings of population counts are increasingly needed to characterize population dynamics and to assess the consequences of climate shocks, natural disasters, investments in infrastructure, development policies, etc. Dissagregating these census is a complex machine learning, and multiple solutions have been proposed in past research. We propose in this paper to view the problem in the context of the aggregate learning paradigm, where the output value for all training points is not known, but where it is only known for aggregates of the points (i.e. in this context, for regions of pixels where a census is available). We demonstrate with a very simple and interpretable model that this method is on par, and even outperforms on some metrics, the state-of-the-art, despite its simplicity.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

They recast population disaggregation as aggregate learning and show a simple model can match SOTA on some metrics, but the identifiability of the fine-scale maps is not obviously guaranteed by the setup.


The core move is to treat census counts as aggregate labels over regions of pixels and train a model that predicts at the pixel level while only penalizing errors on the known regional totals. That framing is reasonable and lets them use ancillary data without needing pixel-level ground truth. The paper does well by keeping the model deliberately simple and interpretable; if the experiments really show it holding its own against heavier methods, that is useful information for anyone who needs quick, explainable disaggregation rather than the absolute best numbers.

Referee Report

2 major / 1 minor

Summary. The paper proposes framing the disaggregation of coarse-resolution census data into fine-grained population maps as an aggregate learning problem, where labels are known only at the regional level and ancillary data are used in a semi-supervised setting. It claims that a very simple and interpretable model achieves performance on par with, and in some cases better than, state-of-the-art methods.

Significance. If the empirical claims hold under proper validation, the work would offer a straightforward, interpretable alternative for high-resolution population mapping tasks relevant to climate shocks, disasters, and policy assessment. The emphasis on simplicity within the aggregate-learning paradigm could reduce dependence on complex spatial models if identifiability and performance are rigorously demonstrated.

major comments (2)

[Abstract] The assertion that the method 'is on par, and even outperforms on some metrics, the state-of-the-art' supplies no experimental details, datasets, metrics, baselines, or error analysis, so the central empirical claim cannot be evaluated from the manuscript text.
[Abstract] The aggregate-learning formulation (labels known only at region level) is presented without any stated spatial regularizer, smoothness prior, or identifiability constraint. Multiple fine-scale allocations can produce identical regional aggregates, so the loss alone does not guarantee recovery of accurate pixel-level distributions.
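The identifiability concern in the second comment can be made concrete with a toy case. The sketch below assumes a squared-error loss on regional totals and uses invented numbers for two regions of three pixels each; it is an illustration of the objection, not an analysis of the paper's model.

```python
import numpy as np

# Hypothetical setup: 2 regions, 3 pixels each, with known regional totals.
region_of_pixel = np.array([0, 0, 0, 1, 1, 1])
census = np.array([90.0, 30.0])

# Two very different pixel-level allocations.
alloc_uniform = np.array([30.0, 30.0, 30.0, 10.0, 10.0, 10.0])
alloc_peaked = np.array([80.0, 5.0, 5.0, 0.0, 30.0, 0.0])

def region_totals(pixel_values):
    """Sum pixel values inside each region."""
    return np.bincount(region_of_pixel, weights=pixel_values, minlength=len(census))

def aggregate_loss(pixel_values):
    """Assumed loss: mean squared error on regional totals only."""
    return float(np.mean((region_totals(pixel_values) - census) ** 2))

loss_uniform = aggregate_loss(alloc_uniform)
loss_peaked = aggregate_loss(alloc_peaked)
max_pixel_gap = float(np.max(np.abs(alloc_uniform - alloc_peaked)))

print(loss_uniform, loss_peaked, max_pixel_gap)
```

Both allocations reach zero aggregate loss while disagreeing by up to 50 people on a single pixel, so whatever separates them must come from the ancillary features or from the restricted form of the pixel model.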

minor comments (1)

[Abstract] 'Dissagregating' is a typographical error for 'Disaggregating'.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the opportunity to respond to the referee's comments. We address each major comment below.


Referee: [Abstract] The assertion that the method 'is on par, and even outperforms on some metrics, the state-of-the-art' supplies no experimental details, datasets, metrics, baselines, or error analysis, so the central empirical claim cannot be evaluated from the manuscript text.

Authors: The abstract is intentionally concise. The full manuscript provides the requested details in Section 4 (datasets, metrics such as MAE/RMSE, baselines, and error analysis). We will revise the abstract to briefly reference the experimental setting and key results. Revision: yes.

Referee: [Abstract] The aggregate-learning formulation (labels known only at region level) is presented without any stated spatial regularizer, smoothness prior, or identifiability constraint. Multiple fine-scale allocations can produce identical regional aggregates, so the loss alone does not guarantee recovery of accurate pixel-level distributions.

Authors: We agree that the aggregate loss alone is underdetermined. The approach relies on ancillary data features to drive the mapping in the simple model; experiments demonstrate competitive performance. We will add a discussion paragraph on identifiability and the role of ancillary data. Revision: partial.

Circularity Check

0 steps flagged

No circularity; derivation self-contained with no self-referential reductions


The provided abstract and description contain no equations, parameter-fitting procedures, or derivation steps that reduce to their own inputs by construction. The central claim is an empirical demonstration that a simple aggregate-learning model performs on par with SOTA methods; this is an external performance comparison rather than a mathematical identity or self-citation chain. No self-definitional loops, fitted-input predictions, or uniqueness theorems imported from prior author work are present. The paper is therefore scored at the default non-circularity level.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no free parameters, axioms, or invented entities are described.



Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages · 2 internal anchors

[1] Breiman, L.: Random forests. Machine Learning 45(1), 5–32 (Oct 2001). https://doi.org/10.1023/A:1010933404324
[2] Briggs, D.J., Gulliver, J., Fecht, D., Vienneau, D.M.: Dasymetric modelling of small-area population distribution using land cover and light emissions data. Remote Sensing of Environment 108(4), 451–466 (2007). https://doi.org/10.1016/j.rse.2006.11.020
[3] Center for International Earth Science Information Network - CIESIN - Columbia University: Gridded population of the world, version 4 (GPWv4): Population density, revision 10 (20180711 2017), https://doi.org/10.7927/H4DZ068D
[4] Center for International Earth Science Information Network - CIESIN - Columbia University: U.S. census grids 2010 (summary file 1) (20180719 2017), https://doi.org/10.7927/H40Z716C
[5] Dmowska, A., Stepinski, T.F.: High resolution dasymetric model of U.S. demographics with application to spatial distribution of racial diversity. Applied Geography 53, 417–426 (2014). https://doi.org/10.1016/j.apgeog.2014.07.003
[6] Doupe, P., Bruzelius, E., Faghmous, J., Ruchman, S.G.: Equitable development through deep learning: The case of sub-national population density estimation. In: Proceedings of the 7th Annual Symposium on Computing for Development. pp. 6:1–6:10. ACM DEV '16, ACM, New York, NY, USA (2016). https://doi.org/10.1145/3001913.3001921
[7] Eicher, C.L., Brewer, C.A.: Dasymetric mapping and areal interpolation: Implementation and evaluation. Cartography and Geographic Information Science 28(2), 125–138 (2001)
[8] Flowerdew, R., Green, M.: Developments in areal interpolation methods and GIS. In: Geographic Information Systems, Spatial Modelling and Policy Evaluation, pp. 73–84. Springer (1993)
[9] Gallego, F.J.: A population density grid of the European Union. Population and Environment 31(6), 460–473 (Jul 2010). https://doi.org/10.1007/s11111-010-0108-y
[10] Goodchild, M.F., Anselin, L., Deichmann, U.: A framework for the areal interpolation of socioeconomic data. Environment and Planning A 25(3), 383–397 (1993)
[11] Hahnloser, R.H., Sarpeshkar, R., Mahowald, M.A., Douglas, R.J., Seung, H.S.: Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit. Nature 405(6789), 947 (2000)
[12] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. CoRR abs/1412.6980 (2014), http://arxiv.org/abs/1412.6980
[13] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998)
[14] Mennis, J.: Generating surface models of population using dasymetric mapping. The Professional Geographer 55(1), 31–42 (2003)
[15] Monmonier, M.S., Schnell, G.A.: Land use and land cover data and the mapping of population density. The International Yearbook of Cartography 24(115), e121 (1984)
[16] Musicant, D.R., Christensen, J.M., Olson, J.F.: Supervised learning by training on aggregate outputs. In: Seventh IEEE International Conference on Data Mining (ICDM 2007). pp. 252–261. IEEE (2007)
[17] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in PyTorch (2017)
[18] Robinson, C., Hohman, F., Dilkina, B.: A deep learning approach for population estimation from satellite imagery. In: Proceedings of the 1st ACM SIGSPATIAL Workshop on Geospatial Humanities. pp. 47–54. ACM (2017)
[19] Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
[20] Stevens, F.R., Gaughan, A.E., Linard, C., Tatem, A.J.: Disaggregating census data for population mapping using random forests with remotely-sensed and ancillary data. PLOS ONE 10(2), 1–22 (02 2015). https://doi.org/10.1371/journal.pone.0107042
[21] Tian, Y., Yue, T., Zhu, L., Clinton, N.: Modeling population density using land cover data. Ecological Modelling 189(1-2), 72–88 (2005)
[22] Tobler, W.R.: Smooth pycnophylactic interpolation for geographical regions. Journal of the American Statistical Association 74(367), 519–530 (1979)
[23] UN Economic and Social Council: Resolution adopted by the Economic and Social Council on 10 June 2015 (2020 world population and housing census programme) (August 2015), http://www.un.org/ga/search/view_doc.asp?symbol=E/RES/2015/10
[24] Wright, J.K.: A method of mapping densities of population: With Cape Cod as an example. Geographical Review 26(1), 103–110 (1936)
