Geolocating News about Extreme Climate Events: A Comparative Analysis of Off-the-Shelf Tools for Toponym Identification in German

Andreas Niekler; Brielen Madureira; Mariana Madruga de Brito

arxiv: 2605.03414 · v1 · submitted 2026-05-05 · 💻 cs.CL

Pith reviewed 2026-05-07 16:48 UTC · model grok-4.3

classification 💻 cs.CL

keywords named entity recognition · toponym resolution · climate change · German language · media analysis · disaster events · geolocation

The pith

Different NER tools extract different place names from German climate news, producing inconsistent pictures of affected countries.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The study compares three off-the-shelf NER tools on German news articles about extreme climate events and disasters. It quantifies how the tools differ in identifying toponyms and shows that these differences affect methods used to determine the country of the event. A reader should care because such automated geolocation is common in climate research, and tool choice may influence findings on media coverage of different nations.

Core claim

The central claim is that the NER tools Flair, spaCy, and Stanza identify different toponyms in German news, and that these differences lead to distinct outcomes in downstream country assignment, which can alter conclusions about countries' prominence in media reports on extreme climate events.

What carries the argument

The pipeline from NER toponym extraction to country-level geolocation decisions using three extrinsic assignment methods.

Load-bearing premise

That the extrinsic country assignment methods accurately reflect the true event locations and that observed differences stem primarily from the NER tools rather than other factors.

What would settle it

Manual verification of event locations in a set of articles and comparison to the automated country assignments from each tool to see if they agree or diverge systematically.
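As a rough illustration of that check, the sketch below scores each tool's country labels against manually verified labels and lists the documents on which the tools disagree. The labels, tool outputs, and function names are hypothetical placeholders, not data or code from the paper.

```python
def accuracy(gold, predicted):
    """Share of documents whose predicted country equals the manual label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("at least one document is required")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def divergent_documents(predictions):
    """Indices of documents on which the tools do not all agree."""
    if not predictions:
        return []
    n_docs = len(next(iter(predictions.values())))
    return [
        i for i in range(n_docs)
        if len({labels[i] for labels in predictions.values()}) > 1
    ]


# Hypothetical manual labels and per-tool country assignments (ISO codes).
gold = ["BR", "DE", "IT", "BR"]
predictions = {
    "flair": ["BR", "DE", "IT", "BR"],
    "spacy": ["DE", "DE", "IT", "BR"],
    "stanza": ["BR", "DE", "DE", "BR"],
}

per_tool = {tool: accuracy(gold, labels) for tool, labels in predictions.items()}
diverging = divergent_documents(predictions)
print(per_tool, diverging)
```

Inspecting the documents in `diverging` by hand would show whether the disagreements are scattered or concentrated on particular countries.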

Figures

Figures reproduced from arXiv: 2605.03414 by Andreas Niekler, Brielen Madureira, Mariana Madruga de Brito.

**Figure 1.** Excerpt from a German news article (translated by the authors) reporting on an extreme climate event, annotated with toponym candidates (detected by Stanza’s German NER tool) and their country. Highlighted entities must be correctly parsed to determine that Brazil, and not Germany, is the actual event’s location. The floods in South Brazil in May, 2024 were so disastrous that they found their way to inter…

**Figure 2.** Box plots of toponyms’ intersection-over-union values per document for each pair of tools.

5.2. Higher-level task: country prediction. Moving on to the main task of determining the geographical focus of each document, we first compare NER tools bilaterally, without the gold standard…

**Figure 3.** The top 14 countries with the highest frequency in the gold standard ranked using each NER tool.

… identified only by Spacy, we see detached generic terms like Stadt (city), Altstadt (old town), Innenstadt (downtown), Flughafen (airport), Provinzen (provinces), Gemeinde (community) and Rathaus (town hall). Among the top unique toponyms identified by Stanza, apart from the word Sonne (sun) that appears very o…
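The per-document intersection-over-union shown in Figure 2 can be illustrated with a minimal sketch. It assumes each tool's output for a document is reduced to a set of toponym strings. The example outputs are invented for illustration, and treating two empty outputs as full agreement is an assumption of this sketch rather than something the paper states.

```python
from itertools import combinations


def toponym_iou(a, b):
    """Intersection-over-union (Jaccard index) of two toponym collections."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Assumption: two empty outputs count as full agreement.
        return 1.0
    return len(set_a & set_b) / len(union)


def pairwise_iou(outputs):
    """Map each unordered pair of tool names to its IoU for one document."""
    return {
        (t1, t2): toponym_iou(outputs[t1], outputs[t2])
        for t1, t2 in combinations(sorted(outputs), 2)
    }


# Hypothetical outputs for a single document (not taken from the paper).
doc_outputs = {
    "flair": ["Porto Alegre", "Brasilien"],
    "spacy": ["Porto Alegre", "Brasilien", "Stadt"],
    "stanza": ["Brasilien", "Sonne"],
}
scores = pairwise_iou(doc_outputs)
print(scores)
```

Collecting these scores over all documents would give one distribution per tool pair, which is the kind of data a box plot summarizes.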

Original abstract

Determining the geolocation of extreme climate events and disasters in texts is a common problem in climate impact and adaptation research. Named-entity recognition (NER) tools are typically used to identify a pool of toponyms that serve as candidate event locations. In this study, we conduct a comparative analysis of three off-the-shelf NER tools, namely Flair, Spacy and Stanza. We describe and quantify differences between their outputs for German news articles and evaluate them extrinsically based on three methods to determine the country where events took place. We show how their contrasts are propagated into downstream tasks and can yield distinct decisions about a document's geographical focus, which, in turn, can impact conclusions about countries' prominence in German media.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper shows that NER tool differences on German climate news produce different country assignments under standard rules, a practical but narrow empirical observation.

The main point is that three off-the-shelf NER tools give different sets of place names when applied to German news on extreme climate events, and those differences can change which country gets assigned to a document depending on the aggregation rule used. The authors run Flair, spaCy, and Stanza on the same articles, pull the toponyms, and then test three simple country-selection methods. They report that the tools disagree enough to alter the inferred geographical focus in some cases, which could shift conclusions about country prominence in the media coverage.

This is new only in its narrow application to German-language climate reporting and in tracking the effect through to the downstream aggregation step. The work does this cleanly by holding the articles and the assignment rules fixed, so the observed divergence is attributable to the NER outputs by design. That makes the central claim hold without needing ground-truth locations. The extrinsic evaluation is the useful part here, because it moves beyond isolated precision scores to show real pipeline impact.

The soft spots are the missing details on corpus size, how the articles were selected, inter-annotator checks if any, and any statistical assessment of the differences. The abstract also does not compare the tools against a gold standard or discuss whether one is more reliable overall. These gaps make it hard to judge the size or robustness of the effect, but they do not undermine the variance demonstration itself.

The paper is aimed at people who build text-mining pipelines for climate impact studies and need to know that tool choice is not neutral. A reader in that niche would get a concrete warning about variability. It is not a deep methodological paper and I would not cite it in my own work, but the design is straightforward and the finding is reproducible from the description, so it deserves peer review rather than a desk reject.

Referee Report

1 major / 3 minor

Summary. The manuscript conducts a comparative analysis of three off-the-shelf NER tools (Flair, spaCy, and Stanza) for toponym identification in German news articles about extreme climate events. It quantifies differences in their toponym outputs and evaluates them extrinsically via three country-assignment methods, showing that tool variance propagates to alter document-level geographical focus and downstream statistics on countries' prominence in media coverage.

Significance. If the reported differences hold under scrutiny, the work is significant for climate informatics and applied NLP, as it illustrates the sensitivity of geolocation pipelines to upstream NER choices in a domain-specific setting. The direct side-by-side comparison on a shared corpus and the explicit tracing of effects into downstream country-prominence metrics constitute a practical strength, providing evidence that is independent of any claim about the absolute accuracy of the assignment heuristics.

major comments (1)

[extrinsic evaluation] The extrinsic evaluation section does not report the total number of articles in the corpus, the distribution of toponyms per tool, or any statistical tests (e.g., McNemar or chi-squared) on the frequency of differing country assignments. These omissions make it impossible to assess whether the observed propagation to distinct geographical-focus decisions is frequent enough to materially affect conclusions about country prominence.
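To make the requested test concrete, here is a minimal sketch of an exact two-sided McNemar test on paired country assignments. It assumes a manually verified country label exists for every document, so that each tool's assignment can be scored as correct or incorrect. The labels and the function name `mcnemar_exact` are hypothetical.

```python
from math import comb


def mcnemar_exact(gold, pred_a, pred_b):
    """Exact two-sided McNemar test on the discordant pairs of two tools.

    b counts documents where only tool A matches the gold label,
    c counts documents where only tool B matches it.
    """
    if not (len(gold) == len(pred_a) == len(pred_b)):
        raise ValueError("all label lists must have the same length")
    b = sum(1 for g, a, p in zip(gold, pred_a, pred_b) if a == g and p != g)
    c = sum(1 for g, a, p in zip(gold, pred_a, pred_b) if a != g and p == g)
    n = b + c
    if n == 0:
        # No discordant pairs: the tools are indistinguishable on this sample.
        return b, c, 1.0
    # Under the null hypothesis, b follows Binomial(n, 0.5).
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Hypothetical gold labels and per-tool country assignments (ISO codes).
gold = ["BR", "DE", "BR", "IT", "DE", "BR"]
tool_a = ["BR", "DE", "BR", "IT", "DE", "DE"]
tool_b = ["DE", "DE", "DE", "IT", "BR", "BR"]

b, c, p_value = mcnemar_exact(gold, tool_a, tool_b)
print(b, c, p_value)
```

With so few discordant documents the test has little power. On a real corpus the same counts would be computed over all annotated articles.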

minor comments (3)

[methods] The three country-assignment methods are referenced but not fully formalized (e.g., tie-breaking rules when multiple toponyms map to different countries); adding pseudocode or explicit decision trees would improve reproducibility.
[results] No error analysis or example articles are provided showing concrete cases where the three tools produce divergent country labels; including 2-3 such cases would strengthen the propagation claim.
[abstract] The abstract states that differences 'can yield distinct decisions' but does not preview any quantitative measure of divergence; a single sentence with effect size would better orient readers.
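The kind of formalization requested in the first minor comment could look like the sketch below. It is not one of the paper's three methods, which are not specified here. It shows a hypothetical majority-vote rule with an explicit tie-break (earliest mention wins) and a toy gazetteer standing in for a real toponym resolver.

```python
from collections import Counter

# Hypothetical toy gazetteer; a real pipeline would query a geocoder.
GAZETTEER = {
    "Porto Alegre": "BR",
    "Brasilien": "BR",
    "Berlin": "DE",
    "Deutschland": "DE",
}


def assign_country(toponyms, gazetteer=GAZETTEER):
    """Majority vote over resolved toponyms; ties go to the earliest mention."""
    countries = [gazetteer[t] for t in toponyms if t in gazetteer]
    if not countries:
        # No toponym could be resolved, so no country is assigned.
        return None
    counts = Counter(countries)
    top = max(counts.values())
    # First country in document order whose count equals the maximum.
    return next(c for c in countries if counts[c] == top)


# The same document, as seen through two hypothetical NER outputs.
label_full = assign_country(["Berlin", "Porto Alegre", "Brasilien"])
label_partial = assign_country(["Berlin", "Brasilien"])
print(label_full, label_partial)
```

In this toy case, missing a single toponym turns a clear majority into a tie, and the tie-break flips the document's label from Brazil to Germany.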

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed and constructive review. We address the single major comment below and will incorporate the suggested additions into a revised manuscript.

Referee: [extrinsic evaluation] The extrinsic evaluation section does not report the total number of articles in the corpus, the distribution of toponyms per tool, or any statistical tests (e.g., McNemar or chi-squared) on the frequency of differing country assignments. These omissions make it impossible to assess whether the observed propagation to distinct geographical-focus decisions is frequent enough to materially affect conclusions about country prominence.

Authors: We agree that these details are necessary for a complete assessment of the practical impact of tool variance. In the revised manuscript we will (1) state the total number of articles in the corpus, (2) add a table or figure reporting the number and distribution of toponyms returned by each tool, and (3) include the results of appropriate statistical tests (chi-squared or McNemar) comparing the frequency of differing country assignments. These additions will allow readers to judge how often the observed propagation materially affects downstream country-prominence statistics.
revision: yes

Circularity Check

0 steps flagged

No significant circularity; purely empirical comparison

The paper conducts a direct empirical comparison of three off-the-shelf NER tools (Flair, spaCy, Stanza) on German news articles, quantifying output differences in toponym pools and propagating them through three fixed extrinsic country-assignment methods. No derivations, equations, fitted parameters, predictions, or self-citations appear in the load-bearing steps; all contrasts are measured against the same corpus and independent assignment rules, so divergences are attributable to tool variance by construction. The central claim requires no internal reduction to inputs and is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the assumption that off-the-shelf NER models are applicable to German news without domain adaptation and that the three country-assignment heuristics are valid extrinsic evaluators. No free parameters or invented entities are introduced.

axioms (2)

domain assumption: Off-the-shelf NER models trained on general German text can reliably identify toponyms in climate-related news articles.
Invoked when the authors apply Flair, spaCy, and Stanza directly without fine-tuning or error analysis.
domain assumption: The three methods for mapping a set of toponyms to a single country label produce meaningful proxies for event location.
Used in the extrinsic evaluation step described in the abstract.

Reference graph

Works this paper leans on

40 extracted references · 31 canonical work pages

[1] [1]

Language Resources and Evaluation , author =

M. Gritta, M. T. Pilehvar, N. Collier, A pragmatic guide to geoparsing evaluation: Toponyms, Named Entity Recognition and pragmatics, Language Resources and Evaluation 54 (2020) 683–712. URL: http://link.springer.com/10.1007/s10579-019-09475-3. doi:10.1007/s10579-019-09475-3

work page doi:10.1007/s10579-019-09475-3 2020

[2] [2]

X. Hu, Z. Zhou, H. Li, Y. Hu, F. Gu, J. Kersten, H. Fan, F. Klan, Location Reference Recognition from Texts: A Survey and Comparison, ACM Computing Surveys 56 (2024) 1–37. URL: https: //dl.acm.org/doi/10.1145/3625819. doi:10.1145/3625819

work page doi:10.1145/3625819 2024

[3] [3]

D. Otto, M. Pfeiffer, M. M. de Brito, M. Gross, Fixed Amidst Change: 20 Years of Media Coverage on Carbon Capture and Storage in Germany, Sustainability 14 (2022). URL: https://www.mdpi. com/2071-1050/14/12/7342. doi:10.3390/su14127342

work page doi:10.3390/su14127342 2022

[4] [4]

Sodoge, C

J. Sodoge, C. Kuhlicke, M. M. d. Brito, Automatized spatio-temporal detection of drought impacts from newspaper articles using natural language processing and machine learning, Weather and Climate Extremes 41 (2023) 100574. URL: https://www.sciencedirect.com/science/article/pii/ S2212094723000270. doi:https://doi.org/10.1016/j.wace.2023.100574

work page doi:10.1016/j.wace.2023.100574 2023

[5] [5]

J. H. Lochner, A. Stechemesser, L. Wenz, Climate summits and protests have a strong impact on climate change media coverage in Germany, Communications Earth & Environment 5 (2024) 279. URL: https://doi.org/10.1038/s43247-024-01434-3. doi:10.1038/s43247-024-01434-3

work page doi:10.1038/s43247-024-01434-3 2024

[6] [6]

P. H. L. Alencar, J. Sodoge, E. Nora Paton, M. Madruga De Brito, Flash droughts and their impacts—using newspaper articles to assess the perceived consequences of rapidly emerging droughts, Environmental Research Letters 19 (2024) 074048. URL: https://iopscience.iop.org/ article/10.1088/1748-9326/ad58fa. doi:10.1088/1748-9326/ad58fa

work page doi:10.1088/1748-9326/ad58fa 2024

[7] [7]

I. Kong, R. S. Purves, Analyzing Geographic Bias of Newspaper Articles Reporting Global Climate Disasters, Annals of the American Association of Geographers (2025) 1–19. URL: https://www.tandfonline.com/doi/full/10.1080/24694452.2025.2564220. doi:10.1080/24694452. 2025.2564220

work page doi:10.1080/24694452.2025.2564220 2025

[8] [8]

Amitay, N

E. Amitay, N. Har’El, R. Sivan, A. Soffer, Web-a-where: geotagging web content, in: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, ACM, Sheffield United Kingdom, 2004, pp. 273–280. URL: https://dl.acm.org/doi/10.1145/ 1008992.1009040. doi:10.1145/1008992.1009040

work page doi:10.1145/1008992.1009040 2004

[9] [9]

Andogah, G

G. Andogah, G. Bouma, J. Nerbonne, Every document has a geographical scope, Data & Knowledge Engineering 81-82 (2012) 1–20. URL: https://linkinghub.elsevier.com/retrieve/pii/ S0169023X12000687. doi:10.1016/j.datak.2012.07.002

work page doi:10.1016/j.datak.2012.07.002 2012

[10] [10]

B. R. Monteiro, C. A. Davis, F. Fonseca, A survey on the geographic scope of textual documents, Computers & Geosciences 96 (2016) 23–34. URL: https://linkinghub.elsevier.com/retrieve/pii/ S0098300416301972. doi:10.1016/j.cageo.2016.07.017

work page doi:10.1016/j.cageo.2016.07.017 2016

[11] [11]

F. Melo, B. Martins, Automated Geocoding of Textual Documents: A Survey of Current Approaches, Transactions in GIS 21 (2017) 3–38. URL: https://onlinelibrary.wiley.com/doi/10.1111/tgis.12212. doi:10.1111/tgis.12212

work page doi:10.1111/tgis.12212 2017

[12] [12]

W. Zong, D. Wu, A. Sun, E.-P. Lim, D. H.-L. Goh, On assigning place names to geography related web pages, in: Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries, ACM, Denver CO USA, 2005, pp. 354–362. URL: https://dl.acm.org/doi/10.1145/1065385.1065464. doi:10.1145/1065385.1065464

work page doi:10.1145/1065385.1065464 2005

[13] [13]

S. J. Lee, H. Liu, M. D. Ward, Lost in Space: Geolocation in Event Data, Political Science Research and Methods 7 (2019) 871–888. URL: https://www.cambridge.org/core/product/identifier/ S2049847018000237/type/journal_article. doi:10.1017/psrm.2018.23

work page doi:10.1017/psrm.2018.23 2019

[14] [14]

Benikova, C

D. Benikova, C. Biemann, M. Reznicek, NoSta-D named entity annotation for German: Guidelines and dataset, in: N. Calzolari, K. Choukri, T. Declerck, H. Loftsson, B. Maegaard, J. Mariani, A. Moreno, J. Odijk, S. Piperidis (Eds.), Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), European Language Resources As...

2014

[15] [15]

Riedl, S

M. Riedl, S. Padó, A named entity recognition shootout for German, in: I. Gurevych, Y. Miyao (Eds.), Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Association for Computational Linguistics, Melbourne, Australia, 2018, pp. 120–125. URL: https://aclanthology.org/P18-2020/. doi:10.18653/v1/P18-2020

work page doi:10.18653/v1/p18-2020 2018

[16] [16]

Labusch, C

K. Labusch, C. Neudecker, D. Zellhöfer, Bert for named entity recognition in contemporary and historic german, in: Proceedings of the 15th Conference on Natural Language Processing (KONVENS 2019): Long Papers, German Society for Computational Linguistics & Language Technology, Erlangen, Germany, 2019, pp. 1–9. URL: https://konvens.org/proceedings/2019/pap...

2019

[17] [17]

Leitner, G

E. Leitner, G. Rehm, J. Moreno-Schneider, A dataset of German legal documents for named entity recognition, in: N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, S. Piperidis (Eds.), Proceedings of the Twelfth Language Resources and Evaluation Conference, Eur...

2020

[18] [18]

Ortmann, A

K. Ortmann, A. Roussel, S. Dipper, Evaluating Off-the-Shelf NLP Tools for German, in: Proceedings of the 15th Conference on Natural Language Processing (KONVENS 2019): Long Papers, German Society for Computational Linguistics & Language Technology, Erlangen, Germany, 2019, pp. 212–

2019

[19] [19]

URL: https://sfb1102.uni-saarland.de/sfbunisb/uploads/2020/10/KONVENS2019_paper_55.pdf

2020

[20] [20]

Scheible, R

S. Scheible, R. J. Whitt, M. Durrell, P. Bennett, Evaluating an ‘off-the-shelf’ POS-tagger on early Modern German text, in: K. Zervanou, P. Lendvai (Eds.), Proceedings of the 5th ACL- HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, Association for Computational Linguistics, Portland, OR, USA, 2011, pp. 19–23. UR...

[21]

R. Laarmann-Quante, L. Prepens, T. Zesch, Evaluating automatic spelling correction tools on German primary school children’s misspellings, in: D. Alfter, E. Volodina, T. François, P. Desmet, F. Cornillie, A. Jönsson, E. Rennes (Eds.), Proceedings of the 11th Workshop on NLP for Computer Assisted Language Learning, LiU Electronic Press, Louvain-la-Neuve, B...

[22]

M. Gritta, M. T. Pilehvar, N. Limsopatham, N. Collier, What’s missing in geographical parsing?, Language Resources and Evaluation 52 (2018) 603–623. URL: http://link.springer.com/10.1007/s10579-017-9385-8. doi:10.1007/s10579-017-9385-8

[23]

J. Wang, Y. Hu, Enhancing spatial and textual analysis with EUPEG: An extensible and unified platform for evaluating geoparsers, Transactions in GIS 23 (2019) 1393–1419. URL: https://onlinelibrary.wiley.com/doi/10.1111/tgis.12579. doi:10.1111/tgis.12579

[24]

Z. Liu, K. Janowicz, L. Cai, R. Zhu, G. Mai, M. Shi, Geoparsing: Solved or Biased? An Evaluation of Geographic Biases in Geoparsing, AGILE: GIScience Series 3 (2022) 1–13. URL: https://agile-giss.copernicus.org/articles/3/9/2022/. doi:10.5194/agile-giss-3-9-2022

[25]

N. Doms, T. Schlachter, L. Hahn-Woernle, A Geo-Parser for German Documents with Environmental Context, in: V. Wohlgemuth, H. Kandil, A. Ramzy (Eds.), Advances and New Trends in Environmental Informatics, Springer Nature Switzerland, Cham, 2025, pp. 21–33. URL: https://link.springer.com/10.1007/978-3-031-85284-8_2. doi:10.1007/978-3-031-85284-8_2, series...

[26]

M. Won, P. Murrieta-Flores, B. Martins, Ensemble Named Entity Recognition (NER): Evaluating NER Tools in the Identification of Place Names in Historical Corpora, Frontiers in Digital Humanities 5 (2018) 2. URL: http://journal.frontiersin.org/article/10.3389/fdigh.2018.00002/full. doi:10.3389/fdigh.2018.00002

[27]

L. Kriesch, S. Losacker, A geolocated dataset of German news articles, Scientific Data 12 (2025) 1128. URL: https://www.nature.com/articles/s41597-025-05422-w. doi:10.1038/s41597-025-05422-w

[28]

J. L. Leidner, G. Sinclair, B. Webber, Grounding spatial named entities for information extraction and question answering, in: Proceedings of the HLT-NAACL 2003 Workshop on Analysis of Geographic References, 2003, pp. 31–38. URL: https://aclanthology.org/W03-0105/

[29]

M. Badieh Habib Morgan, M. van Keulen, Named entity extraction and disambiguation: the missing link, ESAIR ’13, Association for Computing Machinery, New York, NY, USA, 2013, pp. 37–40. URL: https://doi.org/10.1145/2513204.2513217. doi:10.1145/2513204.2513217

[30]

A. Akbik, T. Bergmann, D. Blythe, K. Rasul, S. Schweter, R. Vollgraf, FLAIR: An easy-to-use framework for state-of-the-art NLP, in: W. Ammar, A. Louis, N. Mostafazadeh (Eds.), Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Association for Computational Linguistics, Minnea...

[31]

M. Honnibal, I. Montani, S. Van Landeghem, A. Boyd, spaCy: Industrial-strength Natural Language Processing in Python (2020). URL: https://spacy.io/. doi:10.5281/zenodo.1212303

[32]

P. Qi, Y. Zhang, Y. Zhang, J. Bolton, C. D. Manning, Stanza: A python natural language processing toolkit for many human languages, in: A. Celikyilmaz, T.-H. Wen (Eds.), Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Association for Computational Linguistics, Online, 2020, pp. 101–108. URL: ...

[33]

D. A. Smith, G. Crane, Disambiguating Geographic Names in a Historical Digital Library, in: G. Goos, J. Hartmanis, J. Van Leeuwen, P. Constantopoulos, I. T. Sølvberg (Eds.), Research and Advanced Technology for Digital Libraries, volume 2163, Springer Berlin Heidelberg, Berlin, Heidelberg, 2001, pp. 127–136. URL: http://link.springer.com/10.1007/3-540-447...

[34]

R. C. Pasley, P. D. Clough, M. Sanderson, Geo-tagging for imprecise regions of different sizes, in: Proceedings of the 4th ACM workshop on Geographical information retrieval, ACM, Lisbon, Portugal, 2007, pp. 77–82. URL: https://dl.acm.org/doi/10.1145/1316948.1316969. doi:10.1145/1316948.1316969

[35]

M. A. Radke, N. Gautam, A. Tambi, U. A. Deshpande, Z. Syed, Geotagging Text Data on the Web—A Geometrical Approach, IEEE Access 6 (2018) 30086–30099. URL: https://ieeexplore.ieee.org/document/8371593/. doi:10.1109/ACCESS.2018.2843814

[36]

C. Spearman, The Proof and Measurement of Association between Two Things, The American Journal of Psychology 15 (1904) 72. URL: https://www.jstor.org/stable/1412159?origin=crossref. doi:10.2307/1412159

[37]

M. G. Kendall, A new measure of rank correlation, Biometrika 30 (1938) 81–93. URL: https://doi.org/10.1093/biomet/30.1-2.81

[38]

N. Li, S. Zahra, M. Brito, C. Flynn, O. Görnerup, K. Worou, M. Kurfali, C. Meng, W. Thiery, J. Zscheischler, G. Messori, J. Nivre, Using LLMs to build a database of climate extreme impacts, in: D. Stammbach, J. Ni, T. Schimanski, K. Dutia, A. Singh, J. Bingler, C. Christiaen, N. Kushwaha, V. Muccione, S. A. Vaghefi, M. Leippold (Eds.), Proceedings of the ...

[39]

M. Madruga de Brito, J. Sodoge, H. Kreibich, C. Kuhlicke, Comprehensive assessment of flood socioeconomic impacts through text-mining, Water Resources Research 61 (2025) e2024WR037813. URL: https://agupubs.onlinelibrary.wiley.com/doi/abs/10.1029/2024WR037813. doi:10.1029/2024WR037813

[40]

T. M. N. Carvalho, A. Niekler, C. Kuhlicke, J. Zscheischler, M. M. de Brito, Global synthesis of peer-reviewed articles reveals blind spots in climate impacts research (2025). URL: http://dx.doi.org/10.21203/rs.3.rs-6095740/v1. doi:10.21203/rs.3.rs-6095740/v1
