Retrieving Floods without Floodlights: Topic Models as Binary Classifiers for Extreme Climate Events in German News
Pith reviewed 2026-05-07 16:52 UTC · model grok-4.3
The pith
Topic models can serve as binary classifiers to retrieve relevant German news on specific extreme climate events by using their standard posterior probabilities and keyword guidance without any retraining.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that standard topic models can be employed as binary classifiers for refining the retrieval of news about seven types of extreme climate events in German media by relying on their estimated posterior distributions over topics to select relevant documents. This is done without any modification to the training procedure. Evaluated on an annotated sample, the probabilities assigned to the keywords originally used to query the news databases prove informative for choosing relevant topics and thereby improve the precision of the retrieved sample. Results vary across different hazards, arguing against treating climate events as a uniform category in natural language processing tasks.
What carries the argument
The posterior probability distributions that topic models assign to each document, which measure association with selected topics and serve to decide relevance for retrieval.
If this is right
- Keyword probabilities from the initial database queries can be used to select better topics and raise precision in the retrieved sample.
- Classifier performance depends on the specific type of climate hazard, so results for one event do not necessarily apply to others.
- Standard topic models without modification provide an interpretable alternative to fine-tuned embedding classifiers when labeled data is scarce.
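- An open-weight large language model had the lowest precision of the compared classifiers, although it and the fine-tuned embedding classifier generally reached higher F1 scores, so choosing between approaches involves trade-offs.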
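Because the reported gains depend on the hazard, precision is most informative when it is computed per hazard rather than pooled. The following is a minimal sketch. It assumes that annotations are stored as hypothetical (hazard, predicted, gold) triples with 0/1 labels. The toy records are invented and are not the paper's data.

```python
from collections import defaultdict

def precision_by_hazard(records):
    """Precision of the 'relevant' predictions, computed separately for each hazard.

    records: iterable of (hazard, predicted, gold) triples with 0/1 labels
    (hypothetical layout). A hazard with no positive predictions maps to None
    instead of raising a division by zero.
    """
    true_pos = defaultdict(int)
    pred_pos = defaultdict(int)
    hazards = set()
    for hazard, predicted, gold in records:
        hazards.add(hazard)
        if predicted == 1:
            pred_pos[hazard] += 1
            true_pos[hazard] += gold
    return {
        h: (true_pos[h] / pred_pos[h] if pred_pos[h] else None)
        for h in sorted(hazards)
    }

# Toy annotations, invented for illustration only.
sample = [
    ("flood", 1, 1), ("flood", 1, 0), ("flood", 0, 0),
    ("drought", 1, 1), ("drought", 1, 1), ("drought", 0, 1),
    ("landslide", 0, 1),
]
print(precision_by_hazard(sample))
# {'drought': 1.0, 'flood': 0.5, 'landslide': None}
```

Pooling these records would hide the fact that one hazard is filtered well and another not at all.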
Where Pith is reading between the lines
- The hazard-specific pattern implies that media monitoring systems should build separate filters for each climate event type rather than one general model.
- The method's reliance on small annotated samples suggests it could extend to other retrieval tasks where full supervision is unavailable, such as political or health reporting.
- Topic interpretability offers a way to inspect which themes drive coverage differences across hazards, which could inform future studies of media framing.
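Inspecting topics usually means reading the most probable words of each topic-word distribution. This is a minimal sketch. It assumes the topic-word matrix `phi` has already been estimated, with one row per topic and each row summing to 1. The vocabulary and probabilities are invented for illustration.

```python
import numpy as np

def top_words(phi, vocab, n=3):
    """Return the n most probable words of each topic.

    phi: array of shape (K, V); row k is the topic-word distribution P(w | topic k).
    vocab: list of V tokens aligned with the columns of phi.
    """
    order = np.argsort(-phi, axis=1)[:, :n]  # column indices by decreasing probability
    return [[vocab[i] for i in row] for row in order]

vocab = ["dürre", "trockenheit", "niedrigwasser", "fußball", "emotionen", "stadion"]
# Two toy topics (values invented): one drought-like, one about football.
phi = np.array([
    [0.40, 0.30, 0.20, 0.04, 0.03, 0.03],
    [0.02, 0.03, 0.05, 0.45, 0.25, 0.20],
])
for k, words in enumerate(top_words(phi, vocab)):
    print(k, words)
```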
Load-bearing premise
That the probabilities from standard topic models can reliably separate relevant from irrelevant news documents for individual climate hazards using only guidance from keyword probabilities and a small annotated sample.
What would settle it
If a fresh annotated collection of German news articles shows that documents selected by topic posteriors have no higher precision than those returned by the original keyword queries alone, the utility of the method would be refuted.
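The refutation test above comes down to comparing two precision figures on the same annotated sample. The sketch assumes that every annotated document was returned by the original keyword query. Under that assumption, the keyword-only baseline keeps everything, and its precision is simply the share of relevant documents. `evaluate_filter` and the toy labels are hypothetical.

```python
def evaluate_filter(gold, kept):
    """Compare a keyword-only baseline with a topic-posterior filter.

    gold: 0/1 relevance labels for an annotated sample of keyword-retrieved documents.
    kept: 0/1 flags, 1 if the filter keeps the document.
    Returns (baseline precision, filter precision, filter recall, filter F1).
    """
    relevant = sum(gold)
    baseline_precision = relevant / len(gold)  # the query alone keeps every document
    tp = sum(1 for g, k in zip(gold, kept) if g and k)
    n_kept = sum(kept)
    precision = tp / n_kept if n_kept else 0.0
    recall = tp / relevant if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return baseline_precision, precision, recall, f1

# Invented labels: 3 of 8 retrieved documents are relevant; the filter keeps 3.
gold = [1, 1, 0, 0, 1, 0, 0, 0]
kept = [1, 1, 0, 1, 0, 0, 0, 0]
print(evaluate_filter(gold, kept))
```

If the filter's precision is not higher than the baseline precision on fresh data, the test described above would count against the method.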
Original abstract
In studies of media coverage of extreme climate events, NLP methods have become indispensable for identifying relevant texts in large news databases. Still, enough annotated data to train accurate deep learning-based classifiers from scratch is often not available. Topic Models have the advantage of being both unsupervised and interpretable, but are typically used only for exploratory analysis or data characterisation. In this study, we investigate how to employ Topic Models as binary classifiers for refining the retrieval of relevant news about seven types of extreme climate events in the German media. Our method relies on the posterior distributions estimated by Topic Models to select relevant documents, without modifying their training procedure. Using an annotated sample to guide the evaluation, we show that the probabilities assigned to keywords used to query news databases can also be informative for selecting relevant topics and improve sample precision. We compare our results to a fine-tuned text embedding classifier and an open-weight LLM, discussing observed trade-offs, e.g. the LLM's lowest precision. Moreover, we show that results are hazard-dependent, which speaks against considering climate events as a single category in NLP tasks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes repurposing unmodified topic models (e.g., LDA) as binary classifiers to refine retrieval of German news articles on seven specific extreme climate hazards. Topics are ranked by the posterior probability mass assigned to the original keyword queries within each topic-word distribution; documents are then assigned to the top-ranked topics to form a relevance classifier. Using an annotated sample both to select topics and to measure performance, the authors report precision gains over keyword search, compare against fine-tuned embedding models and an open LLM, and show that effectiveness varies substantially by hazard type.
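The topic-ranking step summarised here can be sketched as follows. The sketch assumes that a topic's keyword score is the summed probability its topic-word distribution assigns to the seed keywords. The paper's exact aggregation may differ. The German keywords and matrix values are hypothetical.

```python
import numpy as np

def rank_topics_by_keywords(phi, vocab, keywords):
    """Rank topics by the probability mass their topic-word distribution puts on seed keywords.

    phi: (K, V) array, row k = P(w | topic k); vocab: list of V tokens;
    keywords: query terms (keywords missing from the vocabulary are ignored).
    Returns topic indices from highest to lowest keyword mass, and the masses.
    """
    index = {w: i for i, w in enumerate(vocab)}
    cols = [index[w] for w in keywords if w in index]
    mass = phi[:, cols].sum(axis=1)
    return np.argsort(-mass, kind="stable"), mass

vocab = ["hochwasser", "überschwemmung", "pegel", "fußball", "emotionen"]
phi = np.array([
    [0.05, 0.05, 0.10, 0.50, 0.30],  # sports-like topic
    [0.40, 0.30, 0.20, 0.05, 0.05],  # flood-like topic
    [0.10, 0.05, 0.05, 0.10, 0.70],  # emotion-heavy topic
])
order, mass = rank_topics_by_keywords(phi, vocab, ["hochwasser", "überschwemmung"])
print(order.tolist(), mass)
```

Only the unlabeled topic-word matrix enters this step. Whether labels influence which topics are finally kept is exactly what the first major comment below questions.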
Significance. If the evaluation protocol is corrected, the work supplies a low-resource, fully unsupervised and interpretable baseline that can be applied to any keyword-seeded news corpus. The hazard-dependency result is a substantive finding for climate-communication NLP, arguing against treating all extreme events as a single category. The direct comparison to contemporary embedding and LLM baselines, together with the emphasis on reproducibility via unmodified topic models, strengthens the contribution.
major comments (2)
- [§4 / Evaluation] The central evaluation (§4, abstract) uses the same annotated sample both to rank topics by keyword posterior probability and to compute the reported precision improvements. Because topic selection is guided by the evaluation labels, any lexical idiosyncrasies in the sample can inflate measured gains; no held-out validation set or cross-validation protocol for the selection step is described. This is load-bearing for the claim that the method “improve[s] sample precision.”
- [§3 / Methods] The precise mapping from document-topic posteriors to binary labels (e.g., threshold on the sum of selected topics, top-k rule, or probabilistic cutoff) is not stated explicitly enough to reproduce the classifier. This detail is required to interpret the hazard-dependent results and the comparisons in Tables 2–3.
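One way to address the first major comment is to make every label-dependent choice on one split of the annotated sample and to report precision only on a disjoint split. The following sketch works under that assumption. The candidate configurations (for example, how many top-ranked topics to keep) and all function names are hypothetical.

```python
import random

def split_indices(n, test_fraction=0.5, seed=0):
    """Shuffle document indices and split them into a selection part and a held-out test part."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(n * (1 - test_fraction))
    return idx[:cut], idx[cut:]

def precision(pred, gold, idx):
    """Share of gold-relevant documents among those predicted relevant, restricted to idx."""
    kept = [i for i in idx if pred[i]]
    return sum(gold[i] for i in kept) / len(kept) if kept else 0.0

def choose_then_evaluate(candidates, gold, sel_idx, test_idx):
    """Choose the configuration with the best precision on the selection split,
    then report its precision on the untouched test split.

    candidates: dict mapping a configuration name (hypothetical) to
    0/1 predictions for every annotated document.
    """
    best = max(candidates, key=lambda c: precision(candidates[c], gold, sel_idx))
    return best, precision(candidates[best], gold, test_idx)

gold = [1, 0, 1, 0, 1, 0]
candidates = {
    "top-1 topic": [1, 0, 0, 0, 1, 0],
    "top-2 topics": [1, 1, 1, 1, 1, 0],
}
sel_idx, test_idx = split_indices(len(gold), seed=42)
print(choose_then_evaluate(candidates, gold, sel_idx, test_idx))
```

Repeating the split over several seeds (or folds) would give a rough sense of how much the selected configuration and its precision vary.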
minor comments (2)
- [Figure 1] Figure 1 caption and axis labels should clarify whether the plotted probabilities are topic-word or document-topic posteriors.
- [§3.2] The number of topics K and the hyper-parameter settings for each hazard-specific model are not tabulated; a single supplementary table would improve reproducibility.
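The distinction the Figure 1 comment asks for is between LDA's two estimated distributions. The sketch assumes a collapsed Gibbs sampler with symmetric Dirichlet priors `alpha` and `beta`. The paper's tooling and hyper-parameters are not stated here. Under those assumptions, the usual point estimates from the sampler's count matrices are as follows.

```python
import numpy as np

def lda_point_estimates(n_kw, n_dk, alpha=0.1, beta=0.01):
    """Usual point estimates of LDA's two distributions from collapsed Gibbs counts.

    n_kw: (K, V) counts of word type w assigned to topic k.
    n_dk: (D, K) counts of tokens in document d assigned to topic k.
    alpha, beta: symmetric Dirichlet priors (illustrative values).
    """
    K, V = n_kw.shape
    # Topic-word: phi[k, w] = (n_kw + beta) / (n_k + V * beta); each row is P(w | k).
    phi = (n_kw + beta) / (n_kw.sum(axis=1, keepdims=True) + V * beta)
    # Document-topic: theta[d, k] = (n_dk + alpha) / (n_d + K * alpha); each row is P(k | d).
    theta = (n_dk + alpha) / (n_dk.sum(axis=1, keepdims=True) + K * alpha)
    return phi, theta

# Toy counts for 2 topics, 3 word types and 3 documents (20 tokens in total).
n_kw = np.array([[8, 2, 0],
                 [0, 1, 9]])
n_dk = np.array([[5, 0],
                 [3, 4],
                 [2, 6]])
phi, theta = lda_point_estimates(n_kw, n_dk)
print(phi.shape, theta.shape)  # (2, 3) (3, 2)
```

Keyword probabilities used for ranking topics come from rows of `phi`. Relevance decisions for documents come from rows of `theta`. A plot axis should name which of the two it shows.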
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive report. The two major comments highlight important points for clarity and rigor. We respond to each below and have revised the manuscript to address them where appropriate.
Referee: [§4 / Evaluation] The central evaluation (§4, abstract) uses the same annotated sample both to rank topics by keyword posterior probability and to compute the reported precision improvements. Because topic selection is guided by the evaluation labels, any lexical idiosyncrasies in the sample can inflate measured gains; no held-out validation set or cross-validation protocol for the selection step is described. This is load-bearing for the claim that the method “improve[s] sample precision.”
Authors: We appreciate the referee raising this potential issue. However, topic ranking is performed solely by computing the posterior probability mass assigned to the fixed seed keywords within each topic-word distribution; these distributions are estimated from the full unlabeled corpus via standard LDA. The annotated sample is used exclusively after topic selection to evaluate document-level precision. No labels enter the ranking or selection step itself. We will revise §4 to state this separation explicitly, add a sentence confirming that topic selection remains fully unsupervised, and include a brief sensitivity check (e.g., varying the number of top topics) to demonstrate robustness. If the referee has a different reading of the selection procedure, we would welcome clarification. revision: partial
Referee: [§3 / Methods] The precise mapping from document-topic posteriors to binary labels (e.g., threshold on the sum of selected topics, top-k rule, or probabilistic cutoff) is not stated explicitly enough to reproduce the classifier. This detail is required to interpret the hazard-dependent results and the comparisons in Tables 2–3.
Authors: We agree that the exact decision rule for converting document-topic posteriors into binary relevance labels must be stated unambiguously. In the revised manuscript we will expand §3 with a dedicated paragraph (and pseudocode) specifying the rule used: a document is labeled relevant if the sum of its posterior probabilities over the top-ranked topics exceeds a fixed threshold (chosen once on the training corpus and held constant across hazards). We will also report the precise threshold value and confirm that the same rule is applied in every topic-model experiment whose results are compared with the embedding and LLM baselines. revision: yes
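The decision rule described in this response can be sketched as below. The text does not give the threshold value, so 0.5 is only a placeholder. The function name and the posterior values are hypothetical.

```python
import numpy as np

def classify(theta, selected_topics, threshold):
    """Label a document relevant (1) when its posterior mass on the selected topics exceeds threshold.

    theta: (D, K) document-topic posteriors; selected_topics: indices of the top-ranked topics.
    Returns the 0/1 labels and the per-document scores.
    """
    score = theta[:, selected_topics].sum(axis=1)
    return (score > threshold).astype(int), score

theta = np.array([
    [0.70, 0.20, 0.10],
    [0.10, 0.10, 0.80],
    [0.30, 0.30, 0.40],
])
labels, score = classify(theta, selected_topics=[0, 1], threshold=0.5)
print(labels)  # [1 0 1]
```

With a fixed threshold shared across hazards, the only hazard-specific inputs are the fitted model and the topics selected for it.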
Circularity Check
No circularity: empirical application with external annotations
full rationale
The paper applies unmodified topic models (LDA-style) to German news data and uses keyword posterior probabilities within topic-word distributions to select relevant topics for binary classification of climate hazards. An external annotated sample guides evaluation and demonstrates precision improvements, with results shown to be hazard-dependent. No mathematical derivations, fitted parameters renamed as predictions, self-citations of uniqueness theorems, or ansatzes smuggled via prior work are present. The central claims rest on empirical comparison to baselines (embedding classifier, LLM) rather than any self-referential reduction of outputs to inputs by construction. The method is self-contained against the provided annotations and does not exhibit any of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: topic models produce posterior distributions over topics that can be used directly for document selection without retraining or modification.
Forward citations
Cited by 1 Pith paper
The Newsworthiness of Brazilian Distress: A Peak Analysis on Time Series of International Media Attention to Disasters in Brazil
The study applies time series peak detection to German news on Brazilian disasters to assess temporal alignment with actual disaster events in national and global databases.
Reference graph
Works this paper leans on
[1]
Introduction. Assume we are gathering news about flood events to study collective attention in the media. Simply querying a news database to retrieve documents containing the string “flood” would not only match news reporting on actual floods, but also many false positives. Consider this (obviously constructed) example: “Soccer fans experienced a flood of emotions witnes...
[2]
Related Literature. Retrieval of environment-related documents: Document retrieval is a ubiquitous step in creating corpora for socio-environmental research. To name a few recent large-scale approaches, Leippold and Varini (2020) implemented a graph-based heuristic on Wikipedia metadata of entries on climate topics, Kong and Purves (2026) relied on climate-rel...
[3]
Methods. This section formalises the task and explains how topic models are applied for binary classification. Then, it describes the two deep learning strategies used for comparison. 3.1. Task Formalisation: Let D be a set of documents d, each belonging to a binary class C = {0, 1}, and V be the set of all tokens w that appear in D. Class 1 represents relevant d...
[4]
Data. The data for this study derives from an ongoing project on the collective attention to extreme climate events in the German media. Seven types of hazards were selected (cold waves, droughts, floods, heat waves, landslides, storms and wildfires). The wiso-net news aggregation database was queried using a pre-defined list of hazard-related keywords, simi...
[5]
Experiments. The varying estimated proportions of relevant documents for each hazard sample suggest that these phenomena manifest differently not only in their nature but also in their coverage and linguistic features. Therefore, each classification strategy was conducted for each type of extreme climate event separately. The annotated sample was rando...
[6]
Results. Aggregated results: We first examine results aggregated over the whole test split (n = 700), i.e. including all extreme climate events. Table 2 shows precision, recall and F1 score for all classifiers. The rightmost column shows the number of news articles of type main that were correctly identified as relevant. All classifiers succeeded in considerab...
[7]
Analysis. In this section, we explore TM’s interpretability by providing more details on the tm-b models’ behaviour. In our non-exhaustive hyperparameter...
[Extracted table of top topic words, only partly recoverable; visible German terms include dürre (drought), trockenheit (dryness), niedrigwasser (low water), waldbrandgefahr (wildfire danger), ernteausfall (crop failure) and fischsterben (fish kill).]
[8]
General Discussion. This work was primarily motivated by the lack of a comprehensive global database of extreme climate disasters. Existing disaster databases, for instance the EM-DAT (Delforge et al., 2025), are shaped by reporting practices and inclusion thresholds (e.g. at least 10 fatalities), which have been widely discussed for their biased cover...
[9]
Conclusion. We have presented a comparative analysis of three binary classifiers for refining collections of news articles on extreme climate events retrieved via keyword-based approaches. Although the LLM and the fine-tuned text embeddings had a higher F1 score in general, the drop in comparison to TMs was 0.148 in the worst case (drought) but also only 0...
Bibliographical References
Pedro Henrique Lima Alencar, Jan Sodoge, Eva Nora Paton, and Mariana Madruga De Brito. Flash droughts and their impacts—using newspaper articles to assess the perceived consequences of rapidly emerging droughts. Environmental Research Letters, 19(7):074048.
Aditya Anantharaman, Arpit Jadiya, Chandana Tulasi Sai Siri, Bharath NVS Adikar, and Biju Mohan. Performance evaluation of topic modeling algorithms for text classification. In 2019 3rd International Conference on Trends in Electronics and Informatics (ICOEI), pages 704–708.
Dimo Angelov and Diana Inkpen. 2024. Topic modeling: Contextual token embeddings are all you need. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages...
... Analyzing the online communication of environmental movement organizations: NLP approaches to topics, sentiment, and emotions. In Proceedings of the 1st Workshop on Ecology, Environment, and Natural Language Processing (NLP4Ecology2025), pages 68–76, Tallinn, Estonia. University of Tartu Library.
Valentina Tretti Beckles and Adrian Vergara Heidke. Thematic categorization on pineapple production in Costa Rica: An exploratory analysis through topic modeling. In Proceedings of the 1st Workshop on Ecology, Environment, and Natural Language Processing (NLP4Ecology2025), pages 44–55, Tallinn, Estonia. University of Tartu Library.
David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent dirichlet allocation. The Journal of Machine Learning Research, 3:993–1022.
Erica Cai, Xi Chen, Reagan Grey Keeney, Ethan Zuckerman, Brendan O’Connor, and Przemyslaw A. Grabowicz. 2025. Identifying and investigating global news coverage of critical events such as disasters and terrorist attacks. Proceedings of the International AAAI Conference...
... Keyword-assisted topic models. American Journal of Political Science, 68(2):730–750.
Melanie Gall, Kevin A. Borden, and Susan L. Cutter. When do losses count?: Six fallacies of natural hazards loss data. Bulletin of the American Meteorological Society, 90(6):799–810.
Ryan J. Gallagher, Kyle Reing, David Kale, and Greg Ver Steeg. 2017. Anchored correlation explanation: Topic modeling with minimal domain knowledge. Transactions of the Association for Computational Linguistics, 5:529–542.
... Climatext: A dataset for climate change topic detection. In NeurIPS 2020 Workshop on Tackling Climate Change with Machine Learning.
Alexandra Lesnikowski, Ella Belfer, Emma Rodman, Julie Smith, Robbert Biesbroek, John D. Wilkerson, James D. Ford, and Lea Berrang-Ford. Frontiers in data analytics for adaptation research: Topic modeling. WIREs Climate Change, 10(3):e576.
Chenliang Li, Jian Xing, Aixin Sun, and Zongyang Ma. 2016a. Effective document labeling with very few seed words: A topic model approach. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, CIKM ’16, page 85–94, New...
... Routledge.
Jon Mcauliffe and David Blei. 2007. Supervised topic models. In Advances in Neural Information Processing Systems, volume 20. Curran Associates, Inc.
Timothy Miller, Dmitriy Dligach, and Guergana Savova. 2016. Unsupervised document classification with informed topic models. In Proceedings of the 15th Workshop on Biomedical Natural Language...
... Unveiling water allocation dynamics: a text analysis of 25 years of stakeholder meetings. Environmental Research Letters, 19(4):044066.
Aytuğ Onan, Serdar Korukoğlu, and Hasan Bulut. Ensemble of keyword extraction methods and classifiers in text classification. Expert Systems with Applications, 57:232–247.
Telma Peura, Attila Krizsán, Salla-Riikka Kuusalu, and Veronika Laippala. 2025. Perspectives on forests and forestry in Finnish online discussions - a topic modeling approach to suomi24. In Proceedings of the 1st Workshop on Ecolo...
... Topic modeling based classification of clinical reports. In 51st Annual Meeting of the Association for Computational Linguistics Proceedings of the Student Research Workshop, pages 67–73, Sofia, Bulgaria. Association for Computational Linguistics.
Christopher Schröder, Lydia Müller, Andreas Niekler, and Martin Potthast. 2023. Small-text: Active learnin...
... Framing climate change in nature and science editorials: applications of supervised and unsupervised text categorization. Journal of Computational Social Science, 6(2):485–513.
Lewis Tunstall, Nils Reimers, Unso Eun Seo Jo, Luke Bates, Daniel Korat, Moshe Wasserblat, and Oren Pereg. 2022. Efficient few-shot learning without prompts. ArXiv preprint: 2209.11055. ...