The Course of News Events: A Comparison of Bottom-Up and Top-Down Approaches for Collecting Text-Based Data about Disasters
Pith reviewed 2026-07-02 13:25 UTC · model grok-4.3
The pith
The choice between querying news databases with an existing disaster list or clustering articles by time and location changes which events enter the data sample.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Using a dataset of German news about landslides worldwide, the authors compare top-down querying of news databases with the aid of an existing disaster inventory against bottom-up NLP clustering of news texts based on temporal and spatial features, and they document variations in event coverage that follow from the choice of method.
What carries the argument
The direct side-by-side comparison of top-down inventory-guided search versus bottom-up temporal-spatial text clustering on the same German landslide news corpus.
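To make the bottom-up side of that comparison concrete, here is a minimal sketch of grouping articles by temporal and spatial features. It is not the authors' pipeline: the function name `cluster_articles`, the `max_gap_days` threshold, the assumption that each article already carries one resolved place name, and the toy records are all hypothetical.

```python
from datetime import date, timedelta


def cluster_articles(articles, max_gap_days=3):
    """Group (publication_date, place) records into candidate events.

    Assumption: two articles belong to the same event when they name the
    same place and are published no more than max_gap_days apart.
    """
    clusters = []
    last_by_place = {}  # place -> index of the most recent cluster for it
    for pub_date, place in sorted(articles):
        idx = last_by_place.get(place)
        gap_ok = (
            idx is not None
            and pub_date - clusters[idx]["end"] <= timedelta(days=max_gap_days)
        )
        if gap_ok:
            # Extend the running cluster for this place.
            clusters[idx]["end"] = pub_date
            clusters[idx]["n_articles"] += 1
        else:
            # Start a new candidate event.
            clusters.append(
                {"place": place, "start": pub_date, "end": pub_date, "n_articles": 1}
            )
            last_by_place[place] = len(clusters) - 1
    return clusters


# Toy records, invented for illustration only.
articles = [
    (date(2024, 5, 1), "Place A"),
    (date(2024, 5, 2), "Place A"),
    (date(2024, 5, 20), "Place A"),
    (date(2024, 5, 2), "Place B"),
]
clusters = cluster_articles(articles)
```

With these toy records the sketch yields three candidate events, because the third "Place A" article falls outside the assumed three-day window. The threshold alone already decides how many events enter the sample.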
If this is right
- Different selection methods produce different distributions of covered events.
- Studies of inequality in media attention to disasters become sensitive to the upstream sampling choice.
- Disaster monitoring and inventory enrichment projects inherit whatever coverage gaps the chosen method introduces.
- Researchers must document and justify the selection route before interpreting patterns in the collected news.
Where Pith is reading between the lines
- Future work could test whether the same divergence appears for other hazard types such as floods or earthquakes.
- One practical step would be to run both methods in parallel on new corpora and measure overlap before choosing one.
- The observed differences may also affect how well news-derived data can be merged with satellite or official loss records.
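The parallel-run idea above can be sketched as a simple set comparison. This assumes the events found by each method have already been matched to shared identifiers, which is itself a non-trivial step; the identifiers and the helper name `jaccard` are hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap of two event-ID collections (1.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy event IDs, invented for illustration only.
top_down = {"E1", "E2", "E3", "E4"}
bottom_up = {"E3", "E4", "E5"}

overlap = jaccard(top_down, bottom_up)   # shared events / all events
only_top_down = top_down - bottom_up     # events the clustering missed
only_bottom_up = bottom_up - top_down    # events absent from the inventory
```

A low overlap on its own does not say which method is wrong; inspecting the two difference sets is what shows where the samples diverge.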
Load-bearing premise
The bottom-up clustering method can be treated as producing a sample that is comparable in coverage and representativeness to the top-down inventory method.
What would settle it
A systematic count showing that one method consistently includes or excludes whole classes of landslide events (for example, small rural slides versus large urban ones) that the other method captures at different rates.
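Such a count could look like the following sketch, which computes the share of events in each class that a method captures. It assumes a reference list of events with class labels exists, which the source does not confirm; the class labels, event IDs, and the function name `capture_rates` are hypothetical.

```python
from collections import Counter


def capture_rates(events, captured_ids):
    """Return, per event class, the fraction of events found in captured_ids."""
    totals, hits = Counter(), Counter()
    for event_id, event_class in events:
        totals[event_class] += 1
        if event_id in captured_ids:
            hits[event_class] += 1
    return {cls: hits[cls] / totals[cls] for cls in totals}


# Toy reference events, invented for illustration only.
events = [
    ("E1", "small rural"),
    ("E2", "small rural"),
    ("E3", "large urban"),
    ("E4", "large urban"),
]
rates_top_down = capture_rates(events, {"E3", "E4"})
rates_bottom_up = capture_rates(events, {"E1", "E3", "E4"})
```

A persistent gap between the two rate tables for one class would be the kind of evidence described above.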
Original abstract
News articles are an important source of information on disaster impacts and adaptation. A key methodological challenge in socio-environmental studies is how to select a representative data sample. Two approaches are common: querying news databases top-down with the aid of an existing disaster inventory or using NLP methods to cluster news texts bottom-up based on temporal and spatial features. Using a dataset of German news about landslides worldwide, we compare these approaches and discuss variations in event coverage. Such research design decisions can influence the resulting news sample, affecting its use in studies of inequality in media coverage, disaster monitoring and inventory enrichment.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper compares top-down (inventory-guided querying of news databases) and bottom-up (NLP-based clustering of news texts using temporal and spatial features) approaches for collecting data on disasters from German news articles about landslides. It finds variations in event coverage and concludes that the choice of approach can influence the news sample, impacting studies of media inequality, disaster monitoring, and inventory enrichment.
Significance. If the empirical comparison is robust, the result would be significant for methodological practice in computational social science and socio-environmental research, as it would demonstrate that data-collection decisions materially affect downstream analyses of coverage patterns. The work usefully flags implications for inequality studies and inventory enrichment.
Major comments (2)
- [Abstract] The description is high-level only and supplies no implementation details, metrics, statistical tests, or data-exclusion rules, preventing assessment of whether observed sample differences support the central claim.
- [Bottom-up method, wherever described] No external validation of clusters against known events (precision/recall, event-matching, or overlap with independent ground truth) is reported. This is load-bearing, because without it the claim that differences reflect genuine coverage variation rather than clustering artifacts cannot be evaluated.
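The validation requested in the second comment could be sketched as follows. This is an illustration under stated assumptions, not a procedure from the paper: a cluster counts as matching a ground-truth event when the place is identical and the dates differ by at most `max_days`, and the record layout, the function names, and the toy data are hypothetical.

```python
from datetime import date


def matches(cluster, event, max_days=7):
    """Assumed matching rule: same place, dates within max_days of each other."""
    same_place = cluster["place"] == event["place"]
    close_in_time = abs((cluster["date"] - event["date"]).days) <= max_days
    return same_place and close_in_time


def precision_recall(clusters, events, max_days=7):
    """Precision: share of clusters matching any event.
    Recall: share of events matched by any cluster."""
    matched_clusters = sum(
        any(matches(c, e, max_days) for e in events) for c in clusters
    )
    matched_events = sum(
        any(matches(c, e, max_days) for c in clusters) for e in events
    )
    precision = matched_clusters / len(clusters) if clusters else 0.0
    recall = matched_events / len(events) if events else 0.0
    return precision, recall


# Toy clusters and ground-truth events, invented for illustration only.
clusters = [
    {"place": "Place A", "date": date(2024, 3, 1)},
    {"place": "Place B", "date": date(2024, 3, 10)},
    {"place": "Place C", "date": date(2024, 4, 1)},
]
events = [
    {"place": "Place A", "date": date(2024, 3, 3)},
    {"place": "Place B", "date": date(2024, 5, 1)},
]
precision, recall = precision_recall(clusters, events)
```

Unmatched clusters are not necessarily errors. They may be real events missing from the ground truth, so low precision calls for manual inspection before it is read as a clustering artifact.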
Simulated Author's Rebuttal
We thank the referee for their thoughtful and constructive comments on our manuscript. We address each major comment below and outline the revisions we will make.
- Referee: [Abstract] The description is high-level only and supplies no implementation details, metrics, statistical tests, or data-exclusion rules, preventing assessment of whether observed sample differences support the central claim.
  Authors: We agree that the abstract would benefit from greater specificity to allow readers to evaluate the empirical support for our claims. In the revised version, we will expand the abstract to include key implementation details for both the top-down and bottom-up methods, the primary comparison metrics (e.g., event overlap rates), any statistical tests performed, and explicit data-exclusion rules. This change will directly address the concern while maintaining the abstract's brevity.
  Revision: yes
- Referee: [Bottom-up method, wherever described] No external validation of clusters against known events (precision/recall, event-matching, or overlap with independent ground truth) is reported. This is load-bearing, because without it the claim that differences reflect genuine coverage variation rather than clustering artifacts cannot be evaluated.
  Authors: We acknowledge that the manuscript does not report formal external validation metrics such as precision/recall against an independent ground truth for the bottom-up clusters. The comparison with the top-down inventory serves as an internal cross-check, but we agree this does not fully substitute for explicit validation. In revision, we will add a dedicated subsection describing any available overlap-based matching with the inventory events, manual inspection procedures used to assess cluster quality, and a limitations discussion on potential clustering artifacts. If additional independent ground truth becomes available, we will incorporate quantitative metrics; otherwise, we will clearly flag the reliance on comparative evidence.
  Revision: partial
Circularity Check
No circularity: empirical methodological comparison with no fitted predictions or self-referential derivations
The paper performs an empirical side-by-side comparison of two news-sampling strategies (top-down inventory queries vs. bottom-up NLP clustering on temporal/spatial features) using a German landslide news corpus. No equations, parameter fits, or 'predictions' are defined; the central claim is simply that the two methods produce measurably different samples. No self-citations are invoked to justify uniqueness or to close any derivation loop, and the work does not rename known results or smuggle ansatzes. The analysis is therefore self-contained against external benchmarks and receives the default non-circularity score.