{"state_type":"pith_open_graph_state","state_version":"1.0","pith_number":"pith:2026:7IQ5ROQXRFVW6A5AMGZSLKSC22","merge_version":"pith-open-graph-merge-v1","event_count":2,"valid_event_count":2,"invalid_event_count":0,"equivocation_count":0,"current":{"canonical_record":{"metadata":{"abstract_canon_sha256":"efec67830e03f908c7a4897f44fc9795632c2e6b3be7c3444d7fbc8a37f19c96","cross_cats_sorted":["cs.LG"],"license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","primary_cat":"cond-mat.stat-mech","submitted_at":"2026-05-13T13:37:31Z","title_canon_sha256":"8e1a924b6e7a85906999aa77cc2050a2cd2ed21cc268a2da32e34da389b87c9a"},"schema_version":"1.0","source":{"id":"2605.13520","kind":"arxiv","version":1}},"source_aliases":[{"alias_kind":"arxiv","alias_value":"2605.13520","created_at":"2026-05-18T02:44:24Z"},{"alias_kind":"arxiv_version","alias_value":"2605.13520v1","created_at":"2026-05-18T02:44:24Z"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2605.13520","created_at":"2026-05-18T02:44:24Z"},{"alias_kind":"pith_short_12","alias_value":"7IQ5ROQXRFVW","created_at":"2026-05-18T12:33:37Z"},{"alias_kind":"pith_short_16","alias_value":"7IQ5ROQXRFVW6A5A","created_at":"2026-05-18T12:33:37Z"},{"alias_kind":"pith_short_8","alias_value":"7IQ5ROQX","created_at":"2026-05-18T12:33:37Z"}],"graph_snapshots":[{"event_id":"sha256:e529568246345b3a37c1beff0c20a67f1fc129b381331c503dc5e839cf6c334d","target":"graph","created_at":"2026-05-18T02:44:24Z","signer":{"key_id":"pith-v1-2026-05","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54","signer_id":"pith.science","signer_type":"pith_registry"},"payload":{"graph_snapshot":{"author_claims":{"count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","strong_count":0},"builder_version":"pith-number-builder-2026-05-17-v1","claims":{"count":4,"items":[{"attestation":"unclaimed","claim_id":"C1","kind":"strongest_claim","source":"verdict.strongest_claim","status":"machine_extracted","text":"our analysis based on t-SNE and persistent homology (PH) reveals a ring-like structure with no evident clustering and intrinsic dimensionality equal to one. We further propose a generative probabilistic-geometric model in which the data are sampled uniformly from a unit circle. Under this model, pairwise cosine distances follow an arcsine distribution, in qualitative agreement with the observed U-shaped distribution"},{"attestation":"unclaimed","claim_id":"C2","kind":"weakest_assumption","source":"verdict.weakest_assumption","status":"machine_extracted","text":"That the proposed generative model of uniform sampling from a unit circle is the appropriate description of the data-generating process and that qualitative agreement between the arcsine distribution and the observed distances constitutes independent support rather than post-hoc fitting."},{"attestation":"unclaimed","claim_id":"C3","kind":"one_line_summary","source":"verdict.one_line_summary","status":"machine_extracted","text":"PCA suggested clustering in fossil teeth data on a nonlinear manifold, but t-SNE and persistent homology show a ring structure with no clustering, supported by a unit-circle generative model whose arcsine distance distribution matches observations qualitatively."},{"attestation":"unclaimed","claim_id":"C4","kind":"headline","source":"verdict.pith_extraction.headline","status":"machine_extracted","text":"PCA scatterplots can falsely suggest clusters in data that actually form a simple ring with no clusters."}],"snapshot_sha256":"b53f6f1f565ce41ceb2bd4181808f1c54943853d84ad6ee93d3a7dc6bb9167ec"},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"paper":{"abstract_excerpt":"We address shortcomings of principal component analysis (PCA) for visualizing high-dimensional data lying on a nonlinear low-dimensional manifold via two-dimensional scatterplots, focusing on a fossil teeth dataset from the early mammalian insectivore Kuehneotherium. While the PCA scatterplot reported by Jolliffe and Cadima (Philosophical Transactions of the Royal Society A, 2016) shows clustering in the region where PC2 < 0, our analysis based on t-SNE and persistent homology (PH) reveals a ring-like structure with no evident clustering and intrinsic dimensionality equal to one. We further pr","authors_text":"Gionni Marchetti","cross_cats":["cs.LG"],"headline":"PCA scatterplots can falsely suggest clusters in data that actually form a simple ring with no clusters.","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","primary_cat":"cond-mat.stat-mech","submitted_at":"2026-05-13T13:37:31Z","title":"Beyond Explained Variance: A Cautionary Tale of PCA"},"references":{"count":57,"internal_anchors":1,"resolved_work":57,"sample":[{"cited_arxiv_id":"","doi":"","is_internal_anchor":false,"ref_index":1,"title":"To this end, various heuristics exist","work_id":"8b806f1c-6a30-4316-8ae8-adf7869b3b4a","year":2004},{"cited_arxiv_id":"","doi":"","is_internal_anchor":false,"ref_index":2,"title":"Pearson, Philosophical Magazine Series 12, 559 (1901)","work_id":"30623c58-6cb6-4505-88de-416996bbf166","year":1901},{"cited_arxiv_id":"","doi":"","is_internal_anchor":false,"ref_index":3,"title":"Hotelling, Journal of Educational Psychology24, 498 (1933)","work_id":"9861ba80-22fb-4ed6-8c56-6217bb027be9","year":1933},{"cited_arxiv_id":"","doi":"","is_internal_anchor":false,"ref_index":4,"title":"I. T. Jolliffe,Principal Component Analysis, Springer Series in Statistics (Springer, New York, NY, 2002), 2nd ed., ISBN 978-0-387-95442-4, springer Science+Business Media New York; eBook ISBN: 978-0-","work_id":"09d2b4ee-2769-4b2b-afcb-ca53878efc64","year":2002},{"cited_arxiv_id":"1404.1100","doi":"","is_internal_anchor":true,"ref_index":5,"title":"Shlens , title =","work_id":"81873ae0-02af-4fda-b7dd-b225d63dc0a7","year":2014}],"snapshot_sha256":"ad7d19f99beeb790a9fe8ee91414569f9fdb5b327357375d6bda0cdd8ad14ba0"},"source":{"id":"2605.13520","kind":"arxiv","version":1},"verdict":{"created_at":"2026-05-14T18:25:17.811249Z","id":"b2a0ab83-4793-471d-b778-5f6414630aa4","model_set":{"reader":"grok-4.3"},"one_line_summary":"PCA suggested clustering in fossil teeth data on a nonlinear manifold, but t-SNE and persistent homology show a ring structure with no clustering, supported by a unit-circle generative model whose arcsine distance distribution matches observations qualitatively.","pipeline_version":"pith-pipeline@v0.9.0","pith_extraction_headline":"PCA scatterplots can falsely suggest clusters in data that actually form a simple ring with no clusters.","strongest_claim":"our analysis based on t-SNE and persistent homology (PH) reveals a ring-like structure with no evident clustering and intrinsic dimensionality equal to one. We further propose a generative probabilistic-geometric model in which the data are sampled uniformly from a unit circle. Under this model, pairwise cosine distances follow an arcsine distribution, in qualitative agreement with the observed U-shaped distribution","weakest_assumption":"That the proposed generative model of uniform sampling from a unit circle is the appropriate description of the data-generating process and that qualitative agreement between the arcsine distribution and the observed distances constitutes independent support rather than post-hoc fitting."}},"verdict_id":"b2a0ab83-4793-471d-b778-5f6414630aa4"}}],"author_attestations":[],"timestamp_anchors":[],"storage_attestations":[],"citation_signatures":[],"replication_records":[],"corrections":[],"mirror_hints":[],"record_created":{"event_id":"sha256:e22ec1996637dc41c28e1aee49f55a33c98fbaeea704d318e5f0911d9008b162","target":"record","created_at":"2026-05-18T02:44:24Z","signer":{"key_id":"pith-v1-2026-05","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54","signer_id":"pith.science","signer_type":"pith_registry"},"payload":{"attestation_state":"computed","canonical_record":{"metadata":{"abstract_canon_sha256":"efec67830e03f908c7a4897f44fc9795632c2e6b3be7c3444d7fbc8a37f19c96","cross_cats_sorted":["cs.LG"],"license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","primary_cat":"cond-mat.stat-mech","submitted_at":"2026-05-13T13:37:31Z","title_canon_sha256":"8e1a924b6e7a85906999aa77cc2050a2cd2ed21cc268a2da32e34da389b87c9a"},"schema_version":"1.0","source":{"id":"2605.13520","kind":"arxiv","version":1}},"canonical_sha256":"fa21d8ba17896b6f03a061b325aa42d6828c1fdf930e00996c515cb0883d070f","receipt":{"algorithm":"ed25519","builder_version":"pith-number-builder-2026-05-17-v1","canonical_sha256":"fa21d8ba17896b6f03a061b325aa42d6828c1fdf930e00996c515cb0883d070f","first_computed_at":"2026-05-18T02:44:24.395181Z","key_id":"pith-v1-2026-05","kind":"pith_receipt","last_reissued_at":"2026-05-18T02:44:24.395181Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54","receipt_version":"0.3","signature_b64":"dTg4UYqQ0CGdJJ7V6y2GnCxzpNR/i8wdKobEd1X5WWJNyf0xO5HVgZuxX7y2G4QCyyvynzYgwQ7EMLbQKmDVDQ==","signature_status":"signed_v1","signed_at":"2026-05-18T02:44:24.395630Z","signed_message":"canonical_sha256_bytes"},"source_id":"2605.13520","source_kind":"arxiv","source_version":1}}},"equivocations":[],"invalid_events":[],"applied_event_ids":["sha256:e22ec1996637dc41c28e1aee49f55a33c98fbaeea704d318e5f0911d9008b162","sha256:e529568246345b3a37c1beff0c20a67f1fc129b381331c503dc5e839cf6c334d"],"state_sha256":"6b8a35c05347dcc90fbda556582101dc08b21f5d97356b12f356a13652d8a502"}