Danish Stance Classification and Rumour Resolution

Anders Edelbo Lillie; Emil Refsgaard Middelboe

arxiv: 1907.01304 · v1 · pith:MI56OE2Qnew · submitted 2019-07-02 · 💻 cs.CL · cs.SI

Pith reviewed 2026-05-25 11:17 UTC · model grok-4.3

keywords: stance classification, rumour verification, Danish language processing, hidden Markov model, support vector machine, social media analysis, veracity prediction, Reddit dataset

The pith

A linear SVM classifies Danish stances at 76 percent accuracy and feeds an HMM that predicts rumour veracity at 83 percent accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper creates a new stance-annotated dataset from Danish Reddit posts about rumours. It tests several models and finds that a linear support vector machine performs best at assigning stance labels. These labels are then used as input to a hidden Markov model that estimates whether a rumour is true or false. The system achieves strong results on Danish data and shows only a small loss when using automatically generated stance labels instead of manual ones. A sympathetic reader would care because this extends rumour verification methods to a new language and platform while demonstrating that the approach can run with less human effort.

Core claim

The paper generates a stance-annotated Danish Reddit dataset and shows that a Linear Support Vector Machine achieves the best stance classification results with an accuracy of 0.76 and macro F1 score of 0.42. It further shows that stance labels fed into a Hidden Markov Model can predict the veracity of rumours, reaching an accuracy of 0.83 and F1 of 0.68 when trained and tested on the Danish dataset alone. The model also works across languages and platforms, and using automatic stance labels causes only a small drop in performance.
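
A gap between accuracy (0.76) and macro F1 (0.42) is typical when one stance class dominates, because macro F1 weights every class equally regardless of size. The following is a minimal sketch on synthetic labels, not the paper's data; the four label names, the class counts, and the toy predictions are all assumptions chosen for illustration.

```python
def accuracy(gold, pred):
    """Fraction of items whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # A class that is never predicted and never correct scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical, heavily imbalanced gold labels (assumed, not from the paper).
gold = ["comment"] * 80 + ["support"] * 10 + ["deny"] * 6 + ["query"] * 4
# Hypothetical classifier that mostly falls back to the majority class.
pred = ["comment"] * 80 + ["support"] * 3 + ["comment"] * 17

print(f"accuracy = {accuracy(gold, pred):.2f}")
print(f"macro F1 = {macro_f1(gold, pred):.2f}")
```

In this toy setting the majority class alone carries the accuracy, while the two classes that are never predicted pull the macro F1 down sharply.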

What carries the argument

The linear support vector machine for stance classification combined with a hidden Markov model that treats sequences of stance labels as observations to infer rumour veracity.
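
One common way to use an HMM as a sequence classifier is to keep one model per veracity class and choose the class whose model assigns the observed stance sequence the higher likelihood. The sketch below assumes that setup; the two hidden states, the four stance labels, and every probability are hand-set, hypothetical values, not parameters from the paper.

```python
import math

STANCES = ("support", "deny", "query", "comment")  # assumed label set


def forward_log_likelihood(obs, start, trans, emit):
    """Log P(obs) under a discrete HMM, using the scaled forward algorithm."""
    if not obs:
        raise ValueError("need at least one stance label")
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step through the transitions, then emit obs[t].
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][obs[t]]
                for s in range(n)
            ]
        scale = sum(alpha)
        log_lik += math.log(scale)           # accumulate log of scaling factors
        alpha = [a / scale for a in alpha]   # renormalise to avoid underflow
    return log_lik


# Hypothetical models as (start, transition, emission); each row sums to 1.
START = [0.5, 0.5]
TRANS = [[0.7, 0.3], [0.3, 0.7]]
TRUE_HMM = (START, TRANS, [
    {"support": 0.50, "deny": 0.05, "query": 0.10, "comment": 0.35},
    {"support": 0.20, "deny": 0.10, "query": 0.10, "comment": 0.60},
])
FALSE_HMM = (START, TRANS, [
    {"support": 0.05, "deny": 0.50, "query": 0.15, "comment": 0.30},
    {"support": 0.10, "deny": 0.20, "query": 0.20, "comment": 0.50},
])


def classify_veracity(stance_sequence):
    """Return 'true' or 'false' depending on which model fits better."""
    ll_true = forward_log_likelihood(stance_sequence, *TRUE_HMM)
    ll_false = forward_log_likelihood(stance_sequence, *FALSE_HMM)
    return "true" if ll_true > ll_false else "false"


print(classify_veracity(["support", "support", "comment", "support"]))
print(classify_veracity(["deny", "query", "deny", "comment"]))
```

In practice the start, transition, and emission probabilities would be estimated from labelled rumour threads rather than set by hand; the stance sequence could come from either manual annotation or a stance classifier's output.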

If this is right

Stance-based veracity prediction transfers reasonably well from English Twitter data to Danish Reddit posts.
Rumour veracity can be estimated from stance labels alone using an HMM without needing other features.
Automatic stance classification is close enough to manual labels to support practical veracity prediction systems.
Performance improves slightly (0.83 vs 0.82 accuracy, 0.68 vs 0.67 F1) when the HMM relies only on the Danish dataset rather than on cross-lingual data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar stance-to-veracity pipelines could be built for other languages by creating small annotated datasets and reusing the HMM structure.
The fact that cross-platform transfer works suggests that stance patterns are somewhat universal across social media.
Future work could test whether adding temporal features or user metadata to the HMM would raise the veracity scores further.

Load-bearing premise

The stance annotations collected for the new Danish Reddit dataset are sufficiently accurate and representative that downstream HMM veracity predictions reflect genuine signal rather than annotation artifacts or domain mismatch.

What would settle it

Manually re-annotating a sample of the Danish Reddit posts and finding that the original stance labels disagree with the new annotations at a high rate would indicate that the reported HMM veracity accuracies may be inflated by annotation errors.
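
A re-annotation check of this kind is usually summarised with a chance-corrected agreement statistic such as Cohen's kappa. The sketch below is one way to compute it for two label lists; the example labels are invented for illustration and do not come from the Danish dataset.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance, given each annotator's label frequencies.
    expected = sum(
        counts_a[label] * counts_b[label]
        for label in set(counts_a) | set(counts_b)
    ) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical original labels and a hypothetical second pass over the same posts.
original = ["comment"] * 6 + ["support"] * 2 + ["deny"] * 2
second_pass = ["comment"] * 5 + ["support"] * 3 + ["deny"] + ["comment"]

print(f"kappa = {cohen_kappa(original, second_pass):.3f}")
```

A low kappa on a re-annotated sample would point toward label noise; a high one would make class imbalance the more likely explanation for the modest macro F1.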

Original abstract

The Internet is rife with flourishing rumours that spread through microblogs and social media. Recent work has shown that analysing the stance of the crowd towards a rumour is a good indicator for its veracity. One state-of-the-art system uses an LSTM neural network to automatically classify stance for posts on Twitter by considering the context of a whole branch, while another, more simple Decision Tree classifier, performs at least as well by performing careful feature engineering. One approach to predict the veracity of a rumour is to use stance as the only feature for a Hidden Markov Model (HMM). This thesis generates a stance-annotated Reddit dataset for the Danish language, and implements various models for stance classification. Out of these, a Linear Support Vector Machine provides the best results with an accuracy of 0.76 and macro F1 score of 0.42. Furthermore, experiments show that stance labels can be used across languages and platforms with a HMM to predict the veracity of rumours, achieving an accuracy of 0.82 and F1 score of 0.67. Even higher scores are achieved by relying only on the Danish dataset. In this case veracity prediction scores an accuracy of 0.83 and an F1 of 0.68. Finally, when using automatic stance labels for the HMM, only a small drop in performance is observed, showing that the implemented system can have practical applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

New Danish Reddit stance dataset plus HMM veracity results, but annotation quality and evaluation details are missing.

The paper's main contribution is a new stance-annotated Danish Reddit corpus. The authors use it to train a Linear SVM for stance classification (0.76 accuracy, 0.42 macro F1), then feed those labels into an HMM for rumour veracity prediction, reaching 0.83 accuracy and 0.68 F1 on the Danish data. They also test cross-language transfer from English Twitter data and show that automatic stance labels cause only a small drop in HMM performance.

The dataset itself is the clearest addition, since Danish has little prior work in this area and Reddit is a different platform from the usual Twitter focus. The experiments follow existing pipelines without new algorithms, which keeps the claims grounded. The cross-language HMM results are a reasonable check on whether stance carries over.

The main gap is that the abstract provides no inter-annotator agreement, annotation guidelines, label counts, or train/test split information. The modest macro F1 on stance classification could reflect class imbalance or label noise, which would weaken the downstream veracity numbers. No external baselines beyond the authors' own model variants are mentioned either. If the full paper fills in those details and releases the data, the work becomes more usable.

This is for people building multilingual or low-resource rumour detection systems who need non-English data points. The dataset could be worth citing if released cleanly. It deserves peer review because the new corpus is a concrete step forward even if the evaluation needs more transparency.

Referee Report

3 major / 2 minor

Summary. The paper introduces a new stance-annotated Danish Reddit dataset for rumours, evaluates multiple stance classification models (with Linear SVM achieving best results of 0.76 accuracy and 0.42 macro F1), and shows that gold or automatic stance labels can be fed into an HMM to predict rumour veracity, reaching 0.83 accuracy and 0.68 F1 on the Danish data (with slightly lower cross-lingual/platform results of 0.82/0.67).

Significance. If the stance annotations prove reliable, the work would be significant for demonstrating that stance-based veracity prediction via HMM transfers across languages (English to Danish) and platforms (Twitter to Reddit), and that automatic stance labels incur only a small performance drop, supporting practical deployment. The concrete performance numbers and the use of a simple, interpretable HMM are strengths.

major comments (3)

[Dataset creation section] No inter-annotator agreement, annotation guidelines, or label distribution statistics are reported for the new Danish Reddit stance dataset. This is load-bearing for the central claim because the HMM veracity accuracies (0.83/0.68) depend directly on the quality of these stance labels; the modest macro F1 of 0.42 on the SVM stance classifier is consistent with either severe imbalance or label noise that could make downstream results reflect annotation artifacts.
[Experimental results section] No train/test split details, error bars, or statistical significance tests are provided for the reported accuracy and F1 scores on either stance classification or HMM veracity prediction. This undermines confidence in the headline numbers (0.76/0.42 for stance; 0.83/0.68 for veracity) and the claim that automatic stance yields only a small drop.
[Stance classification experiments] Only internal model variants are compared; no external baselines from prior stance detection literature are included, making it impossible to assess whether the Linear SVM result represents a meaningful advance for Danish data.
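
For the error bars requested in the second major comment, one option on a fixed test set is a percentile bootstrap over test items. The sketch below assumes a hypothetical list of per-item correctness flags; the counts are invented, and the resample count and seed are arbitrary choices rather than anything the paper specifies.

```python
import random


def bootstrap_accuracy_ci(correct, n_resamples=2000, alpha=0.05, seed=0):
    """Point accuracy plus a percentile bootstrap interval over test items."""
    if not correct:
        raise ValueError("need at least one test item")
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(correct)
    stats = []
    for _ in range(n_resamples):
        # Resample the test items with replacement and recompute accuracy.
        sample = [correct[rng.randrange(n)] for _ in range(n)]
        stats.append(sum(sample) / n)
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(correct) / n, lower, upper


# Hypothetical outcome: 83 correct and 17 wrong predictions (assumed counts).
flags = [1] * 83 + [0] * 17
point, lower, upper = bootstrap_accuracy_ci(flags)
print(f"accuracy = {point:.2f}, 95% interval = [{lower:.2f}, {upper:.2f}]")
```

With a small test set the interval can be wide, which is exactly the information the headline numbers currently omit.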

minor comments (2)

[Abstract and results] The abstract and results text would benefit from explicit statements of the number of rumours/posts in the Danish dataset and the class distribution to contextualize the macro F1 scores.
[HMM veracity section] Notation for the HMM states and transition probabilities is introduced without a clear diagram or equation reference, making the veracity prediction pipeline harder to follow.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each of the major comments below and indicate the revisions we plan to make.

Referee: Dataset creation section: No inter-annotator agreement, annotation guidelines, or label distribution statistics are reported for the new Danish Reddit stance dataset. This is load-bearing for the central claim because the HMM veracity accuracies (0.83/0.68) depend directly on the quality of these stance labels; the modest macro F1 of 0.42 on the SVM stance classifier is consistent with either severe imbalance or label noise that could make downstream results reflect annotation artifacts.

Authors: We agree that reporting inter-annotator agreement, annotation guidelines, and label distributions is essential. The dataset was annotated following adapted guidelines from English rumour stance datasets, but due to the scope of the thesis work, IAA was not calculated. We will add the label distribution and a summary of the guidelines to the dataset creation section in the revision. The high veracity prediction accuracy with gold labels supports that the annotations are of sufficient quality for this task, though we acknowledge the modest stance F1 may indicate class imbalance.

Revision: partial

Referee: Experimental results section: No train/test split details, error bars, or statistical significance tests are provided for the reported accuracy and F1 scores on either stance classification or HMM veracity prediction. This undermines confidence in the headline numbers (0.76/0.42 for stance; 0.83/0.68 for veracity) and the claim that automatic stance yields only a small drop.

Authors: We will revise the experimental results section to include explicit details on the train/test splits used. For error bars and significance tests, we will perform additional experiments with multiple random seeds to provide these statistics in the revised manuscript.

Revision: yes

Referee: Stance classification experiments: Only internal model variants are compared; no external baselines from prior stance detection literature are included, making it impossible to assess whether the Linear SVM result represents a meaningful advance for Danish data.

Authors: The primary goal was to establish baseline performance for the new Danish dataset by comparing several standard models internally. We will expand the related work section to reference prior stance detection approaches and discuss why direct comparison is challenging across languages, while noting that the SVM outperforms the other models tested on this data.

Revision: partial

Circularity Check

0 steps flagged

No circularity; empirical results on held-out data

The paper reports standard machine-learning performance numbers (SVM stance classification accuracy 0.76 / macro F1 0.42; HMM veracity prediction accuracy 0.83 / F1 0.68) obtained by training on one portion of the newly collected Danish Reddit dataset and evaluating on held-out data. No equations, fitted parameters, or self-citations reduce these test-set metrics to quantities defined on the same test instances. The stance labels serve as input features for the HMM; the reported veracity scores are not forced by construction from the stance-classifier training objective. No self-definitional, fitted-input-called-prediction, or self-citation-load-bearing steps appear.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claims rest on the existence and quality of a newly collected Danish stance dataset and on the assumption that standard SVM and HMM implementations can be applied without additional domain-specific modelling choices.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: Linear Support Vector Machine provides the best results with an accuracy of 0.76 and macro F1 score of 0.42 on Danish stance classification; stance labels fed to an HMM yield veracity prediction accuracy 0.83 and F1 0.68

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: Hidden Markov Model (HMM) ... stance as the only feature

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.