CRED-1: An Open Multi-Signal Domain Credibility Dataset for Automated Pre-Bunking of Online Misinformation
Pith reviewed 2026-05-15 19:34 UTC · model grok-4.3
The pith
CRED-1 merges two source lists with four signals to assign composite credibility scores from 0.0 to 1.0 to 2,672 domains.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CRED-1 is an open, reproducible domain-level credibility dataset that integrates two openly licensed source lists with four enrichment signals (domain age from WHOIS/RDAP, web popularity from Tranco Top-1M, fact-check frequency from Google Fact Check Tools API, and threat intelligence from Google Safe Browsing API) to produce composite credibility scores between 0.0 and 1.0 for 2,672 domains categorized as fake, unreliable, mixed, conspiracy, or satire.
What carries the argument
The composite credibility score, computed by combining source list labels with four enrichment signals to rate domains from 0.0 to 1.0.
If this is right
- Domains can be scored on-device in browser extensions for real-time pre-bunking.
- The dataset is fully reproducible using standard Python libraries from public sources.
- It covers categories including fake, unreliable, mixed, conspiracy, and satire.
- Released under CC BY 4.0 and archived on Zenodo for open access.
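The on-device scoring mentioned in the first bullet amounts to a local table lookup with no network call. The following is a minimal sketch, assuming the dataset is available as a mapping from domain to score; the `SCORES` entries and the `lookup_score` helper are hypothetical and not taken from CRED-1, and a real browser extension would more likely be written in JavaScript.

```python
from urllib.parse import urlsplit

# Hypothetical scores for illustration only; these are NOT CRED-1 values.
SCORES = {"example.com": 0.12, "news.example.org": 0.40}


def lookup_score(url, scores=SCORES):
    """Return (matched_domain, score) for the closest listed domain, or None."""
    host = (urlsplit(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    labels = host.split(".")
    # Try the full host first, then drop leading labels one at a time,
    # stopping before the bare top-level label is tried on its own.
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in scores:
            return candidate, scores[candidate]
    return None


print(lookup_score("https://www.example.com/article"))
print(lookup_score("http://blog.news.example.org/post"))
print(lookup_score("https://unlisted.test/"))
```

Because the lookup never leaves the device, no browsing history is sent to a server, which is the privacy property the paper emphasizes.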
Where Pith is reading between the lines
- Such datasets could be extended with additional signals like social media engagement metrics if available publicly.
- The approach might generalize to other languages or regions by adapting the source lists.
- Integration into more browsers could reduce reliance on centralized fact-checking services.
- Periodic updates to the dataset would be needed to track evolving misinformation sources.
Load-bearing premise
The assumption that combining the two source lists with the four computed signals produces a score that reflects actual credibility rather than just repeating the original labels.
What would settle it
A study comparing the CRED-1 scores against independent human ratings or measured misinformation rates on the same domains. A low correlation would falsify the claim that the composite score meaningfully indicates credibility.
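One way such a comparison could be run is a rank correlation between composite scores and independent ratings. The sketch below computes Spearman's rho with the Python standard library only; the helper names and the sample values are hypothetical and do not come from the paper.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical composite scores and independent human ratings (1 to 5).
scores = [0.10, 0.35, 0.60, 0.90]
ratings = [1, 2, 4, 5]
print(spearman(scores, ratings))
```

A rho near zero on a representative sample would be evidence against the score, while a high rho would still need to be checked against the input labels alone to rule out simple re-encoding.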
This article presents CRED-1, an open, reproducible domain-level credibility dataset combining two openly-licensed source lists (OpenSources.co and Iffy.news) with four computed enrichment signals: domain age (WHOIS/RDAP), web popularity (Tranco Top-1M), fact-check frequency (Google Fact Check Tools API), and threat intelligence (Google Safe Browsing API). The dataset covers 2,672 domains categorized as fake, unreliable, mixed, conspiracy, or satire, each assigned a composite credibility score between 0.0 and 1.0. CRED-1 is designed for on-device deployment in privacy-preserving browser extensions to enable client-side pre-bunking of misinformation at the content delivery stage. The entire pipeline is implemented in Python using only standard library modules and is fully reproducible from publicly available sources. The dataset and pipeline code are released under CC BY 4.0 and archived on Zenodo.
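As an illustration of the standard-library-only claim, the web popularity signal could be read from a Tranco list with the `csv` module alone. This is a sketch, assuming the list is a headerless CSV of `rank,domain` rows; the `load_tranco_ranks` helper and the sample rows are hypothetical and are not the authors' pipeline code.

```python
import csv
import io


def load_tranco_ranks(csv_text):
    """Map domain -> rank from headerless 'rank,domain' CSV text."""
    ranks = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        rank, domain = row
        ranks[domain.strip().lower()] = int(rank)
    return ranks


# Hypothetical sample rows; a real list would be read from a downloaded file.
sample = "1,example.com\n2,example.org\n3,example.net\n"
tranco = load_tranco_ranks(sample)
print(tranco.get("example.org"))    # rank of a listed domain
print(tranco.get("unlisted.test"))  # None for a domain outside the list
```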
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents CRED-1, an open dataset of 2,672 domains labeled as fake, unreliable, mixed, conspiracy, or satire. It combines categorical labels from OpenSources.co and Iffy.news with four enrichment signals (WHOIS/RDAP domain age, Tranco rank, Google Fact Check Tools API count, and Google Safe Browsing hits) to produce a composite credibility score in [0.0, 1.0]. The dataset and a fully reproducible Python pipeline using only public APIs and standard libraries are released under CC BY 4.0 on Zenodo for on-device pre-bunking in privacy-preserving browser extensions.
Significance. If the composite score construction were fully specified and shown to correlate with real-world misinformation outcomes, CRED-1 would be a useful, immediately deployable resource for client-side credibility signals. The explicit reproducibility claims, use of only public data sources, and open release of both data and code are clear strengths that support verification and reuse.
major comments (2)
- Abstract and §3 (Methodology): The composite credibility score is described as ranging from 0.0 to 1.0, yet no equation, weighting scheme, normalization procedure, or aggregation rule is provided for combining the two source lists with the four signals. This is load-bearing for the central claim that the scores meaningfully reflect credibility rather than merely re-encoding the input categorical labels.
- §5 (Evaluation) and §6 (Discussion): No correlation, calibration, or downstream evaluation is reported against held-out fact-check outcomes, misinformation incidence, or user-engagement metrics. Without such evidence the claim that the scores support effective pre-bunking cannot be assessed.
minor comments (2)
- §2.1: Clarify the exact rule used when OpenSources.co and Iffy.news assign conflicting categories to the same domain.
- Figure 1: Add explicit labels or arrows showing how each of the four signals is normalized before combination with the source labels.
Simulated Author's Rebuttal
Thank you for the constructive feedback on our manuscript describing the CRED-1 dataset. We address each major comment point by point below and indicate the revisions we will make.
- Referee: Abstract and §3 (Methodology): The composite credibility score is described as ranging from 0.0 to 1.0, yet no equation, weighting scheme, normalization procedure, or aggregation rule is provided for combining the two source lists with the four signals. This is load-bearing for the central claim that the scores meaningfully reflect credibility rather than merely re-encoding the input categorical labels.
  Authors: We agree that the aggregation procedure must be stated explicitly. The original §3 described the inputs at a high level but omitted the precise formula for brevity. The composite score is formed by first mapping the categorical labels (fake=0.0, unreliable=0.25, mixed=0.5, conspiracy=0.75, satire=1.0) to a base value, then applying a weighted linear adjustment using z-normalized enrichment signals with fixed weights (domain age 0.25, Tranco rank 0.25, fact-check count 0.25, Safe Browsing hits 0.25), followed by min-max scaling to [0,1]. We will insert the full equation, normalization steps, and weight justification into the revised Methodology section.
  Revision: yes
- Referee: §5 (Evaluation) and §6 (Discussion): No correlation, calibration, or downstream evaluation is reported against held-out fact-check outcomes, misinformation incidence, or user-engagement metrics. Without such evidence the claim that the scores support effective pre-bunking cannot be assessed.
  Authors: The manuscript presents CRED-1 as an open, reproducible resource intended to enable client-side pre-bunking rather than as a pre-validated predictor. No direct correlation or calibration analysis was performed because the contribution centers on dataset construction and the public pipeline; external outcome data were outside the scope of this release. In the revision we will expand §6 with an explicit limitations paragraph acknowledging the lack of downstream validation and will suggest concrete evaluation protocols that users of the released code can apply.
  Revision: partial
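The aggregation described in the first author response (label mapping, z-normalized signals, equal weights, min-max scaling) can be sketched as follows. The response does not say how the adjustment is combined with the base value or which direction each signal points, so this sketch assumes an additive adjustment and assumes every signal has already been oriented so that larger means more credible. The field names and sample records are hypothetical.

```python
from statistics import mean, pstdev

# Label mapping and weights as stated in the author response.
BASE = {"fake": 0.0, "unreliable": 0.25, "mixed": 0.5,
        "conspiracy": 0.75, "satire": 1.0}
# Signal field names are hypothetical; the 0.25 weights are from the response.
WEIGHTS = {"age": 0.25, "tranco": 0.25, "fact_checks": 0.25,
           "safe_browsing": 0.25}


def zscores(values):
    """Z-normalize a list; a constant signal contributes 0 everywhere."""
    mu, sigma = mean(values), pstdev(values)
    return [0.0 if sigma == 0 else (v - mu) / sigma for v in values]


def composite_scores(records):
    """Base label value plus weighted z-scores (ASSUMED additive), then min-max."""
    z = {name: zscores([r[name] for r in records]) for name in WEIGHTS}
    raw = [
        BASE[r["label"]] + sum(WEIGHTS[n] * z[n][i] for n in WEIGHTS)
        for i, r in enumerate(records)
    ]
    lo, hi = min(raw), max(raw)
    # If all raw values coincide, min-max scaling is undefined; return 0.5.
    return [0.5 if hi == lo else (v - lo) / (hi - lo) for v in raw]


# Hypothetical records for illustration only.
records = [
    {"label": "fake", "age": 1, "tranco": 0, "fact_checks": 0, "safe_browsing": 0},
    {"label": "mixed", "age": 5, "tranco": 0, "fact_checks": 0, "safe_browsing": 0},
    {"label": "satire", "age": 10, "tranco": 0, "fact_checks": 0, "safe_browsing": 0},
]
print(composite_scores(records))
```

Note that min-max scaling makes every score depend on the whole dataset: adding or removing one extreme domain shifts all other scores, which is worth stating alongside the equation in a revised Methodology section.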
Circularity Check
No circularity: direct aggregation of external lists and public signals
The paper constructs CRED-1 by combining two openly licensed external source lists (OpenSources.co and Iffy.news) with four independently computed signals from public APIs (WHOIS age, Tranco rank, fact-check counts, Safe Browsing hits). No equations, fitted parameters, predictive models, or derivations are presented that reduce the composite 0.0-1.0 score to its own inputs by construction. The pipeline is described as a straightforward, reproducible aggregation without self-definitional steps, load-bearing self-citations, uniqueness theorems, or ansatzes. The central output is therefore self-contained and does not rely on any circular reduction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: OpenSources.co and Iffy.news provide reliable base categorizations of domains as fake, unreliable, mixed, conspiracy, or satire.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
unclear: Relation between the paper passage and the cited Recognition theorem.
The composite credibility score S is computed as a weighted blend... s_fc = max(0, 1 - log10(claims)/1.7), s_tranco = max(0, 1 - log10(rank)/6), s_age = min(1, age_years/20)
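The three sub-scores quoted above translate directly into code. The sketch below implements only those formulas; the blend weights are elided in the quoted passage and are therefore not reproduced. How zero claims is handled is an assumption of this sketch (log10 is undefined at 0), and unranked domains are left unhandled.

```python
import math


def s_fc(claims):
    """Fact-check sub-score: max(0, 1 - log10(claims) / 1.7)."""
    if claims <= 0:
        # ASSUMPTION: no fact-checked claims yields the maximum sub-score.
        return 1.0
    return max(0.0, 1.0 - math.log10(claims) / 1.7)


def s_tranco(rank):
    """Popularity sub-score: max(0, 1 - log10(rank) / 6), for rank >= 1."""
    return max(0.0, 1.0 - math.log10(rank) / 6.0)


def s_age(age_years):
    """Domain age sub-score: min(1, age_years / 20)."""
    return min(1.0, age_years / 20.0)


print(s_fc(1), s_fc(100))
print(s_tranco(1), s_tranco(1_000_000))
print(s_age(10), s_age(40))
```

As written, the fact-check sub-score reaches zero at roughly 50 claims (10^1.7), the popularity sub-score reaches zero at rank 10^6, which matches the size of the Tranco Top-1M list, and the age sub-score saturates at 20 years.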
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] M. Zimdars, “False, misleading, clickbait-y, and satirical ‘news’ sources,”
- [2] [Online]. Available: https://github.com/BigMcLargeHuge/opensources
- [3] Iffy.news, “Iffy Index of Unreliable Sources,” Reynolds Journalism Institute, 2022. [Online]. Available: https://iffy.news/index/
  (Manuscript running header: Submitted to Data in Brief, Loth et al.)
- [4] V. Le Pochat, T. Van Goethem, S. Tajalizadehkhoob, M. Korczyński, and W. Joosen, “Tranco: A Research-Oriented Top Sites Ranking Hardened Against Manipulation,” in Proc. NDSS, 2019. doi:10.14722/ndss.2019.23386
- [5] Google, “Fact Check Tools API,” 2024. [Online]. Available: https://developers.google.com/fact-check/tools/api
- [6] Google, “Safe Browsing APIs,” 2024. [Online]. Available: https://developers.google.com/safe-browsing
- [7] A. Loth, “CRED-1: An Open Multi-Signal Domain Credibility Dataset,” Zenodo, 2026. doi:10.5281/zenodo.18769460
- [8] A. Loth, M. Kappes, and M.-O. Pahl, “Industrialized Deception: The Collateral Effects of LLM-Generated Misinformation on Digital Ecosystems,” in Companion Proc. ACM Web Conference (TheWebConf ’26), 2026
- [9] A. Loth, M. Kappes, and M.-O. Pahl, “Eroding the Truth-Default: A Causal Analysis of Human Susceptibility to Foundation Model Hallucinations and Disinformation in the Wild,” in Companion Proc. ACM Web Conference (TheWebConf ’26), 2026
- [10] A. Loth, M. Kappes, and M.-O. Pahl, “The Verification Crisis: Expert Perceptions of GenAI Disinformation and the Case for Reproducible Provenance,” in Companion Proc. ACM Web Conference (TheWebConf ’26), 2026.