pith.

arxiv: 2605.03414 · v1 · submitted 2026-05-05 · 💻 cs.CL

Geolocating News about Extreme Climate Events: A Comparative Analysis of Off-the-Shelf Tools for Toponym Identification in German

Pith reviewed 2026-05-07 16:48 UTC · model grok-4.3

classification 💻 cs.CL
keywords: named entity recognition · toponym resolution · climate change · German language · media analysis · disaster events · geolocation

The pith

Different NER tools extract different place names from German climate news, producing inconsistent pictures of affected countries.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The study compares three off-the-shelf NER tools on German news articles about extreme climate events and disasters. It quantifies how the tools differ in identifying toponyms and shows that these differences propagate into the methods used to determine the country where each event took place. A reader should care because such automated geolocation is common in climate research, and the choice of tool may influence findings about how prominently different countries feature in media coverage.

Core claim

The central claim is that differences between the NER tools Flair, spaCy, and Stanza in identifying toponyms in German news lead to distinct outcomes in downstream country assignment, which can alter conclusions about how prominent individual countries are in media reports on extreme climate events.

What carries the argument

The pipeline from NER toponym extraction to country-level geolocation decisions using three extrinsic assignment methods.
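To make the propagation argument concrete, here is a minimal Python sketch of such a pipeline. Everything in it is hypothetical: the toy gazetteer `GAZETTEER`, the helpers `resolve`, `most_frequent_country` and `geolocate`, and the two toponym pools, which stand in for what two different NER tools might return for the Brazil flood article in Figure 1. The paper's three assignment methods are not specified on this page, so a plain most-frequent-country rule is used as a stand-in.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional

# Hypothetical toy gazetteer: German toponym -> ISO 3166-1 alpha-2 country code.
# A real pipeline would query a full gazetteer and handle ambiguous names.
GAZETTEER: Dict[str, str] = {
    "Brasilien": "BR",
    "Rio Grande do Sul": "BR",
    "Porto Alegre": "BR",
    "Deutschland": "DE",
    "Berlin": "DE",
}


def resolve(toponyms: List[str]) -> List[str]:
    """Map toponyms to country codes, dropping names missing from the gazetteer."""
    return [GAZETTEER[t] for t in toponyms if t in GAZETTEER]


def most_frequent_country(countries: List[str]) -> Optional[str]:
    """Stand-in assignment rule: most frequent country (ties go to the first seen)."""
    if not countries:
        return None
    return Counter(countries).most_common(1)[0][0]


def geolocate(
    toponyms: List[str],
    assign: Callable[[List[str]], Optional[str]] = most_frequent_country,
) -> Optional[str]:
    """NER output -> country codes -> single document-level country label."""
    return assign(resolve(toponyms))


# Hypothetical toponym pools that two NER tools might return for the same article.
tool_a_pool = ["Brasilien", "Rio Grande do Sul", "Porto Alegre", "Berlin"]
tool_b_pool = ["Deutschland", "Berlin", "Porto Alegre"]

print(geolocate(tool_a_pool))  # BR
print(geolocate(tool_b_pool))  # DE
```

With the gazetteer and the assignment rule held fixed, the only thing that differs between the two calls is the toponym pool, which is the kind of upstream variance the paper traces into the country decision.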

Load-bearing premise

That the extrinsic country assignment methods accurately reflect the true event locations and that observed differences stem primarily from the NER tools rather than other factors.

What would settle it

Manual verification of event locations in a set of articles and comparison to the automated country assignments from each tool to see if they agree or diverge systematically.

Figures

Figures reproduced from arXiv: 2605.03414 by Andreas Niekler, Brielen Madureira, Mariana Madruga de Brito.

Figure 1: Excerpt from a German news article (translated by the authors) reporting on an extreme climate event, annotated with toponym candidates (detected by Stanza’s German NER tool) and their country. Highlighted entities must be correctly parsed to determine that Brazil, and not Germany, is the actual event’s location. The floods in South Brazil in May, 2024 were so disastrous that they found their way to inter…
Figure 2: Box plots of toponyms’ intersection-over-union values per document for each pair of tools. (Excerpt from Section 5.2, "Higher-level task: country prediction": "Moving on to the main task of determining the geographical focus of each document, we first compare NER tools bilaterally, without the gold standard…")
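The per-document agreement plotted in Figure 2 can be computed as the intersection over union (Jaccard index) of the toponym sets two tools return for the same article. A minimal sketch, assuming each tool's output is already reduced to a set of toponym strings per document and matched by exact string equality; whether the paper normalizes case or inflection first is not stated here. The function names and toy data are illustrative (the generic terms Stadt and Sonne echo the tool-specific detections mentioned in the Figure 3 excerpt), and scoring two empty sets as perfect agreement is an assumption of this sketch.

```python
from itertools import combinations
from statistics import median
from typing import Dict, List, Set, Tuple


def iou(a: Set[str], b: Set[str]) -> float:
    """Intersection over union of two toponym sets (1.0 if both are empty, by assumption)."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def pairwise_iou(outputs: Dict[str, List[Set[str]]]) -> Dict[Tuple[str, str], List[float]]:
    """Per-document IoU for every pair of tools.

    outputs maps a tool name to its toponym sets, one per document, in the same order.
    """
    return {
        (t1, t2): [iou(d1, d2) for d1, d2 in zip(outputs[t1], outputs[t2])]
        for t1, t2 in combinations(sorted(outputs), 2)
    }


# Toy outputs for two documents; exact string matching, no normalization.
outputs = {
    "flair": [{"Brasilien", "Porto Alegre"}, {"Berlin"}],
    "spacy": [{"Brasilien", "Stadt"}, {"Berlin"}],
    "stanza": [{"Brasilien", "Porto Alegre", "Sonne"}, set()],
}

for pair, scores in pairwise_iou(outputs).items():
    print(pair, [round(s, 2) for s in scores], "median:", round(median(scores), 2))
```

Each list of per-document scores is what a single box in a plot like Figure 2 would summarize.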
Figure 3: The top 14 countries with the highest frequency in the gold standard, ranked using each NER tool. (Excerpt from the surrounding text: "…identified only by Spacy, we see detached generic terms like Stadt (city), Altstadt (old town), Innenstadt (downtown), Flughafen (airport), Provinzen (provinces), Gemeinde (community) and Rathaus (town hall). Among the top unique toponyms identified by Stanza, apart from the word Sonne (sun) that appears very o…")
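Comparing country rankings like those in Figure 3 calls for a rank correlation. The sketch below implements Spearman's rho and Kendall's tau-a directly, assuming every tool ranks the same fixed set of countries with no tied positions; when frequencies tie, `scipy.stats.spearmanr` and `scipy.stats.kendalltau` (tau-b by default) handle ties properly. Whether the paper uses these exact coefficients is not stated on this page, and the country codes below are toy data, not the paper's results.

```python
from itertools import combinations
from typing import Dict, List


def _positions(ranking: List[str]) -> Dict[str, int]:
    return {item: i for i, item in enumerate(ranking)}


def _check(rank_a: List[str], rank_b: List[str]) -> None:
    if len(rank_a) < 2:
        raise ValueError("need at least two ranked items")
    if len(rank_a) != len(set(rank_a)) or sorted(rank_a) != sorted(rank_b):
        raise ValueError("rankings must be permutations of the same distinct items")


def spearman_rho(rank_a: List[str], rank_b: List[str]) -> float:
    """Spearman's rho for two tie-free rankings: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    _check(rank_a, rank_b)
    n = len(rank_a)
    pos_b = _positions(rank_b)
    d2 = sum((i - pos_b[item]) ** 2 for i, item in enumerate(rank_a))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


def kendall_tau(rank_a: List[str], rank_b: List[str]) -> float:
    """Kendall's tau-a: (concordant - discordant pairs) / total pairs, no ties."""
    _check(rank_a, rank_b)
    n = len(rank_a)
    pos_b = _positions(rank_b)
    concordant = discordant = 0
    # combinations preserves rank_a order, so x is always ranked above y in rank_a.
    for x, y in combinations(rank_a, 2):
        if pos_b[x] < pos_b[y]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Toy rankings: gold-standard order vs. a tool that swaps two adjacent countries.
gold = ["DE", "BR", "IT", "ES", "US"]
tool = ["DE", "IT", "BR", "ES", "US"]
print(spearman_rho(gold, tool), kendall_tau(gold, tool))
```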
Original abstract

Determining the geolocation of extreme climate events and disasters in texts is a common problem in climate impact and adaptation research. Named-entity recognition (NER) tools are typically used to identify a pool of toponyms that serve as candidate event locations. In this study, we conduct a comparative analysis of three off-the-shelf NER tools, namely Flair, Spacy and Stanza. We describe and quantify differences between their outputs for German news articles and evaluate them extrinsically based on three methods to determine the country where events took place. We show how their contrasts are propagated into downstream tasks and can yield distinct decisions about a document's geographical focus, which, in turn, can impact conclusions about countries' prominence in German media.
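The first step the abstract describes, collecting a pool of toponym candidates from each tool, might look like the following sketch. It assumes default German models (spaCy's `de_core_news_sm`, Stanza's German pipeline, and Flair's `de-ner` tagger) and a CoNLL-style label set in which locations are tagged `LOC`; which models, versions, and labels the authors actually used is not stated on this page. The library imports sit inside the adapter functions so that the hypothetical helper `toponym_pool` works without any of the three packages installed.

```python
from typing import Iterable, List, Tuple

# Labels treated as toponym candidates. Keeping only LOC is an assumption;
# check each model's label set before relying on it.
LOCATION_LABELS = {"LOC"}


def toponym_pool(entities: Iterable[Tuple[str, str]]) -> List[str]:
    """Keep location entities, deduplicated in order of first mention."""
    seen = set()
    pool = []
    for text, label in entities:
        if label in LOCATION_LABELS and text not in seen:
            seen.add(text)
            pool.append(text)
    return pool


# The adapters below load a model on every call to keep the sketch short;
# in practice, load each model once and reuse it.
def spacy_entities(text: str, model: str = "de_core_news_sm") -> List[Tuple[str, str]]:
    import spacy  # assumes: python -m spacy download de_core_news_sm
    nlp = spacy.load(model)
    return [(ent.text, ent.label_) for ent in nlp(text).ents]


def stanza_entities(text: str) -> List[Tuple[str, str]]:
    import stanza  # assumes German models are available, e.g. via stanza.download("de")
    nlp = stanza.Pipeline(lang="de", processors="tokenize,ner")
    return [(ent.text, ent.type) for ent in nlp(text).ents]


def flair_entities(text: str, model: str = "de-ner") -> List[Tuple[str, str]]:
    from flair.data import Sentence
    from flair.models import SequenceTagger
    tagger = SequenceTagger.load(model)
    # Tagging a whole article as one Sentence is a simplification;
    # long texts are usually split into sentences first.
    sentence = Sentence(text)
    tagger.predict(sentence)
    return [(span.text, span.get_label("ner").value) for span in sentence.get_spans("ner")]
```

For example, `toponym_pool(spacy_entities(article))` and `toponym_pool(stanza_entities(article))` would yield the per-tool pools that the comparisons above operate on.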

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 3 minor

Summary. The manuscript conducts a comparative analysis of three off-the-shelf NER tools (Flair, spaCy, and Stanza) for toponym identification in German news articles about extreme climate events. It quantifies differences in their toponym outputs and evaluates them extrinsically via three country-assignment methods, showing that tool variance propagates to alter document-level geographical focus and downstream statistics on countries' prominence in media coverage.

Significance. If the reported differences hold under scrutiny, the work is significant for climate informatics and applied NLP, as it illustrates the sensitivity of geolocation pipelines to upstream NER choices in a domain-specific setting. The direct side-by-side comparison on a shared corpus and the explicit tracing of effects into downstream country-prominence metrics constitute a practical strength, providing evidence that is independent of any claim about the absolute accuracy of the assignment heuristics.

major comments (1)
  1. [extrinsic evaluation] The extrinsic evaluation section does not report the total number of articles in the corpus, the distribution of toponyms per tool, or any statistical tests (e.g., McNemar or chi-squared) on the frequency of differing country assignments. These omissions make it impossible to assess whether the observed propagation to distinct geographical-focus decisions is frequent enough to materially affect conclusions about country prominence.
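The McNemar test suggested here fits the paired design: every article is labelled by every tool, so two tools can be compared on the articles where exactly one of them recovers the gold-standard country. Without a gold standard, the same test applies to any binary per-document outcome, such as "assigned to Germany or not". A minimal standard-library sketch with the usual continuity correction (the chi-squared survival function with one degree of freedom reduces to `math.erfc`); the function name and the counts in the example are hypothetical, not figures from the paper.

```python
import math
from typing import Sequence, Tuple


def mcnemar(correct_a: Sequence[bool], correct_b: Sequence[bool]) -> Tuple[float, float]:
    """Continuity-corrected McNemar test for two tools labelling the same documents.

    correct_a[i] / correct_b[i]: whether tool A / tool B got document i right.
    Returns (chi-squared statistic, p-value) with 1 degree of freedom.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both tools must label the same documents")
    # Only discordant pairs carry information about a difference between the tools.
    b = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)
    c = sum(1 for x, y in zip(correct_a, correct_b) if y and not x)
    if b + c == 0:
        return 0.0, 1.0
    # Clip at zero so that b == c yields a statistic of exactly 0.
    stat = max(0, abs(b - c) - 1) ** 2 / (b + c)
    # Chi-squared survival function with 1 df: P(X >= stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: 10 articles only tool A gets right, 2 only tool B gets right,
# 30 both get right, 5 both get wrong.
correct_a = [True] * 10 + [False] * 2 + [True] * 30 + [False] * 5
correct_b = [False] * 10 + [True] * 2 + [True] * 30 + [False] * 5
stat, p = mcnemar(correct_a, correct_b)
print(f"chi2 = {stat:.3f}, p = {p:.4f}")
```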
minor comments (3)
  1. [methods] The three country-assignment methods are referenced but not fully formalized (e.g., tie-breaking rules when multiple toponyms map to different countries); adding pseudocode or explicit decision trees would improve reproducibility.
  2. [results] No error analysis or example articles are provided showing concrete cases where the three tools produce divergent country labels; including 2-3 such cases would strengthen the propagation claim.
  3. [abstract] The abstract states that differences 'can yield distinct decisions' but does not preview any quantitative measure of divergence; a single sentence with effect size would better orient readers.
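The request for explicit tie-breaking rules can be illustrated with a majority-vote assigner that makes the rule a parameter. This is a hypothetical sketch, not one of the paper's three methods, which are not formalized on this page: `assign_country` takes the resolved country codes in order of mention and settles ties either by earliest mention or by abstaining.

```python
from collections import Counter
from typing import List, Optional


def assign_country(countries: List[str], tie_break: str = "first_mention") -> Optional[str]:
    """Majority-vote country assignment with an explicit tie-breaking rule.

    countries: country codes of resolved toponyms, in order of mention.
    tie_break: "first_mention" picks the tied country mentioned earliest;
               "abstain" returns None when the top count is shared.
    """
    if not countries:
        return None
    counts = Counter(countries)
    top = max(counts.values())
    # Counter iterates in first-insertion order, i.e. order of first mention.
    tied = [c for c in counts if counts[c] == top]
    if len(tied) == 1:
        return tied[0]
    if tie_break == "first_mention":
        return tied[0]
    if tie_break == "abstain":
        return None
    raise ValueError(f"unknown tie_break rule: {tie_break!r}")


mentions = ["DE", "BR", "BR", "DE"]  # toy article split evenly between two countries
print(assign_country(mentions), assign_country(mentions, "abstain"))
```

The two rules disagree on an article whose mentions split evenly between Germany and Brazil, which is the kind of case a formal specification of each method should pin down.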

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed and constructive review. We address the single major comment below and will incorporate the suggested additions into a revised manuscript.

Point-by-point responses
  1. Referee: [extrinsic evaluation] The extrinsic evaluation section does not report the total number of articles in the corpus, the distribution of toponyms per tool, or any statistical tests (e.g., McNemar or chi-squared) on the frequency of differing country assignments. These omissions make it impossible to assess whether the observed propagation to distinct geographical-focus decisions is frequent enough to materially affect conclusions about country prominence.

    Authors: We agree that these details are necessary for a complete assessment of the practical impact of tool variance. In the revised manuscript we will (1) state the total number of articles in the corpus, (2) add a table or figure reporting the number and distribution of toponyms returned by each tool, and (3) include the results of appropriate statistical tests (chi-squared or McNemar) comparing the frequency of differing country assignments. These additions will allow readers to judge how often the observed propagation materially affects downstream country-prominence statistics. revision: yes

Circularity Check

0 steps flagged

No significant circularity; purely empirical comparison

Full rationale

The paper conducts a direct empirical comparison of three off-the-shelf NER tools (Flair, spaCy, Stanza) on German news articles, quantifying output differences in toponym pools and propagating them through three fixed extrinsic country-assignment methods. No derivations, equations, fitted parameters, predictions, or self-citations appear in the load-bearing steps; all contrasts are measured against the same corpus and the same assignment rules, so divergences are attributable to tool variance by construction. The central claim requires no internal reduction to its inputs and is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the assumption that off-the-shelf NER models are applicable to German news without domain adaptation and that the three country-assignment heuristics are valid extrinsic evaluators. No free parameters or invented entities are introduced.

axioms (2)
  • Domain assumption: Off-the-shelf NER models trained on general German text can reliably identify toponyms in climate-related news articles.
    Invoked when the authors apply Flair, spaCy, and Stanza directly without fine-tuning or error analysis.
  • Domain assumption: The three methods for mapping a set of toponyms to a single country label produce meaningful proxies for event location.
    Used in the extrinsic evaluation step described in the abstract.

pith-pipeline@v0.9.0 · 5432 in / 1475 out tokens · 61235 ms · 2026-05-07T16:48:03.000959+00:00 · methodology
