{"paper":{"title":"TeraGram: A Structured Longitudinal Dataset of the Telegram Messenger","license":"http://creativecommons.org/licenses/by/4.0/","headline":"A dataset of 5.9 billion Telegram messages collected from 2015 to 2025 supplies raw data for examining social networks free of algorithmic curation.","cross_cats":[],"primary_cat":"physics.soc-ph","authors_text":"Anastasia Golovin, Andreas C. Schneider, Arne I. Gottwald, Joao Pinheiro Neto, Sebastian B. Mohr, Srushhti Trivedi, Ulrik Hvid, Viola Priesemann","submitted_at":"2026-05-15T13:50:07Z","abstract_excerpt":"Here we present a massive longitudinal dataset of public Telegram content, comprising over 5.9 billion messages dating from 2015 to 2025, collected from 712 thousand channels and groups, enriched with metadata on forwards, reactions, and polls. The dataset spans multiple languages including Russian and Farsi, representing countries where Telegram shows mainstream adoption, as well as Western languages where Telegram is used in specific sub-communities. The dataset has several advantages. 
First, when restricted by language, it provides a versatile example of an algorithm-free platform, contrary"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"The dataset offers a foundation for studying engagement patterns, network evolution, and community formation in the absence of algorithmic curation.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the collected public messages, when restricted by language, provide a representative and unbiased view of user behavior on an algorithm-free platform, as stated in the abstract's description of advantages.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"A large-scale longitudinal dataset of public Telegram content is introduced to enable studies of engagement patterns and network evolution without algorithmic curation.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"A dataset of 5.9 billion Telegram messages collected from 2015 to 2025 supplies raw data for examining social networks free of algorithmic curation.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"4ec2e08bf2a79bf1e554ceb3799ab02223eb7f28e79a53a1b7a4094f0b0210a0"},"source":{"id":"2605.15956","kind":"arxiv","version":1},"verdict":{"id":"b4f71ab7-d42d-4c75-9856-97473433073a","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-19T19:14:56.919731Z","strongest_claim":"The dataset offers a foundation for studying engagement patterns, network evolution, and community formation in the absence of algorithmic curation.","one_line_summary":"A large-scale longitudinal dataset of public Telegram content is introduced to enable studies of engagement patterns and network evolution 
without algorithmic curation.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the collected public messages, when restricted by language, provide a representative and unbiased view of user behavior on an algorithm-free platform, as stated in the abstract's description of advantages.","pith_extraction_headline":"A dataset of 5.9 billion Telegram messages collected from 2015 to 2025 supplies raw data for examining social networks free of algorithmic curation."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2605.15956/integrity.json","findings":[],"available":true,"detectors_run":[{"name":"doi_title_agreement","ran_at":"2026-05-19T19:31:19.033936Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_compliance","ran_at":"2026-05-19T19:20:59.431974Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"ai_meta_artifact","ran_at":"2026-05-19T17:33:44.877773Z","status":"skipped","version":"1.0.0","findings_count":0},{"name":"claim_evidence","ran_at":"2026-05-19T17:01:55.707184Z","status":"completed","version":"1.0.0","findings_count":0}],"snapshot_sha256":"1697b4857563bb2c1cc8e75d9142de2a98e08465cae7a6550b75d1da43ec467c"},"references":{"count":36,"sample":[{"doi":"","year":1961,"title":"Snowball sampling","work_id":"71fab8b0-1e13-45fe-99e2-4b41a3e41b78","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2016,"title":"Bag of Tricks for Efficient Text Classification","work_id":"ebeb3969-9fdd-485a-b758-4a5fa65d1ba8","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2016,"title":"FastText.zip: Compressing text classification models","work_id":"cd1495d8-ba03-4095-b4d3-ea0d52c24350","ref_index":3,"cited_arxiv_id":"1612.03651","is_internal_anchor":true},
{"doi":"","year":2016,"title":"The FAIR Guiding Principles for scientific data management and stewardship","work_id":"0edc5fcd-b08f-486d-8d18-17557baec701","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2018,"title":"Examining Telegram Users’ Motivations, Technical Characteristics, Trust, Attitudes, and Positive Word-of-Mouth: Evidence from Iran","work_id":"5268e0df-49b6-4d4e-b068-7f2a6c250d4b","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":36,"snapshot_sha256":"0c6593021c07cbf4c643483228cf4a640d67fa4aa808bed3581ba44c460679e1","internal_anchors":2},"formal_canon":{"evidence_count":2,"snapshot_sha256":"97a6dc222d2414a0ed69eecfd2aeff319df4d58ebc9c87cde8ac9cf75ccab3aa"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}