{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2026:IL4K7Q3LHKDUN2HRFHEULT5MHS","short_pith_number":"pith:IL4K7Q3L","schema_version":"1.0","canonical_sha256":"42f8afc36b3a8746e8f129c945cfac3c8313fa05e9db95db010e29cf57feb330","source":{"kind":"arxiv","id":"2605.15571","version":1},"attestation_state":"computed","paper":{"title":"MaxSketch: Robust Distinct Counting in Streams via Random Projections","license":"http://creativecommons.org/licenses/by/4.0/","headline":"A max-linear sketch over random Gaussian projections recovers distinct counts in noisy streams using only logarithmic memory when observations share geometric structure.","cross_cats":["cs.LG"],"primary_cat":"stat.ML","authors_text":"Christos Tzamos, Constantine Caramanis, Nikos Tsikouras","submitted_at":"2026-05-15T03:29:26Z","abstract_excerpt":"Estimating the number of distinct elements in a data stream is well understood when repeated elements are identical. In modern settings, however, observations are high-dimensional and noisy, so repeated instances of the same object are only approximately similar -- for example, different images of the same individual may vary significantly at the pixel level. Classical sketches such as HyperLogLog rely on consistent hash values for identical elements and break down in this regime. Recent work on robust distinct counting in general metric spaces achieves $\\widetilde{\\Theta}(\\sqrt{n})$ memory, w"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":false},"canonical_record":{"source":{"id":"2605.15571","kind":"arxiv","version":1},"metadata":{"license":"http://creativecommons.org/licenses/by/4.0/","primary_cat":"stat.ML","submitted_at":"2026-05-15T03:29:26Z","cross_cats_sorted":["cs.LG"],"title_canon_sha256":"b63376785046f6f7ce9bfa9f0e671ae93fc199a02c24b4baf9e52683117c4795","abstract_canon_sha256":"2483a899c2b823513a4580a0b51b4e125059fb09a6c0a0034e4af8c0cbc615fc"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-20T00:01:05.974939Z","signature_b64":"3Wj5XTC8CXQU/LiMd+rYbUwrt42xHq4l+NU/jae3KbKVTbyBzjpF6qX41+ztlBpIbXOGOzmLHFlI/T+xncDSDQ==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"42f8afc36b3a8746e8f129c945cfac3c8313fa05e9db95db010e29cf57feb330","last_reissued_at":"2026-05-20T00:01:05.974067Z","signature_status":"signed_v1","first_computed_at":"2026-05-20T00:01:05.974067Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"MaxSketch: Robust Distinct Counting in Streams via Random Projections","license":"http://creativecommons.org/licenses/by/4.0/","headline":"A max-linear sketch over random Gaussian projections recovers distinct counts in noisy streams using only logarithmic memory when observations share geometric structure.","cross_cats":["cs.LG"],"primary_cat":"stat.ML","authors_text":"Christos Tzamos, Constantine Caramanis, Nikos Tsikouras","submitted_at":"2026-05-15T03:29:26Z","abstract_excerpt":"Estimating the number of distinct elements in a data stream is well understood when repeated elements are identical. In modern settings, however, observations are high-dimensional and noisy, so repeated instances of the same object are only approximately similar -- for example, different images of the same individual may vary significantly at the pixel level. Classical sketches such as HyperLogLog rely on consistent hash values for identical elements and break down in this regime. Recent work on robust distinct counting in general metric spaces achieves $\\widetilde{\\Theta}(\\sqrt{n})$ memory, w"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"We show that under this assumption m = O~(log n / ε²) random projections (and hence O~(log n/ε²) memory) suffice to recover the true distinct count within a (1+ε) factor.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The input observations possess geometric structure common in learned representations that permits the max-linear sketch over random Gaussian projections to separate latent objects at the claimed memory cost.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"MaxSketch achieves O~(log n / ε²) memory for (1+ε)-approximate distinct counting in streams with geometric structure via max-linear random projections.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"A max-linear sketch over random Gaussian projections recovers distinct counts in noisy streams using only logarithmic memory when observations share geometric structure.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"aeed15970ad2cb0e410f55a5873e248b892b1f8a3be7575b6712339011cba5f1"},"source":{"id":"2605.15571","kind":"arxiv","version":1},"verdict":{"id":"02cf9ebf-d381-4d6a-9f3f-4d99f721bb07","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-19T20:04:45.435299Z","strongest_claim":"We show that under this assumption m = O~(log n / ε²) random projections (and hence O~(log n/ε²) memory) suffice to recover the true distinct count within a (1+ε) factor.","one_line_summary":"MaxSketch achieves O~(log n / ε²) memory for (1+ε)-approximate distinct counting in streams with geometric structure via max-linear random projections.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The input observations possess geometric structure common in learned representations that permits the max-linear sketch over random Gaussian projections to separate latent objects at the claimed memory cost.","pith_extraction_headline":"A max-linear sketch over random Gaussian projections recovers distinct counts in noisy streams using only logarithmic memory when observations share geometric structure."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2605.15571/integrity.json","findings":[],"available":true,"detectors_run":[{"name":"doi_title_agreement","ran_at":"2026-05-19T20:31:19.246651Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_compliance","ran_at":"2026-05-19T20:11:04.351058Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"ai_meta_artifact","ran_at":"2026-05-19T19:34:35.253955Z","status":"skipped","version":"1.0.0","findings_count":0},{"name":"claim_evidence","ran_at":"2026-05-19T17:41:56.078479Z","status":"completed","version":"1.0.0","findings_count":0}],"snapshot_sha256":"e44b4985a5c9bfef6d476cfeed896ea4a8454ef4bf4f9fb5ec62d8293c288eef"},"references":{"count":34,"sample":[{"doi":"","year":2003,"title":"Concentration inequalities","work_id":"ae8a68c8-e878-41c6-b84a-1fe2572b63c0","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2016,"title":"Streaming algorithms for robust distinct elements","work_id":"82349bc4-8955-4209-93e6-bfe723d7019e","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2018,"title":"Distinct sampling on streaming data with near-duplicates","work_id":"9a8e72af-2801-49bf-99e5-e70b9411ee34","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2020,"title":"A simple framework for contrastive learning of visual representations","work_id":"ef2113fc-d254-4c64-be2f-803754588eeb","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2003,"title":"Loglog counting of large cardinalities","work_id":"09cf4338-d192-4af8-9306-2ea48442b566","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":34,"snapshot_sha256":"3b300ea0791f020f0ed959f40ba80442b02591dc436b7bc457493c7fc5cb00f0","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2605.15571","created_at":"2026-05-20T00:01:05.974203+00:00"},{"alias_kind":"arxiv_version","alias_value":"2605.15571v1","created_at":"2026-05-20T00:01:05.974203+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2605.15571","created_at":"2026-05-20T00:01:05.974203+00:00"},{"alias_kind":"pith_short_12","alias_value":"IL4K7Q3LHKDU","created_at":"2026-05-20T00:01:05.974203+00:00"},{"alias_kind":"pith_short_16","alias_value":"IL4K7Q3LHKDUN2HR","created_at":"2026-05-20T00:01:05.974203+00:00"},{"alias_kind":"pith_short_8","alias_value":"IL4K7Q3L","created_at":"2026-05-20T00:01:05.974203+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":0,"internal_anchor_count":0,"sample":[]},"formal_canon":{"evidence_count":0,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/IL4K7Q3LHKDUN2HRFHEULT5MHS","json":"https://pith.science/pith/IL4K7Q3LHKDUN2HRFHEULT5MHS.json","graph_json":"https://pith.science/api/pith-number/IL4K7Q3LHKDUN2HRFHEULT5MHS/graph.json","events_json":"https://pith.science/api/pith-number/IL4K7Q3LHKDUN2HRFHEULT5MHS/events.json","paper":"https://pith.science/paper/IL4K7Q3L"},"agent_actions":{"view_html":"https://pith.science/pith/IL4K7Q3LHKDUN2HRFHEULT5MHS","download_json":"https://pith.science/pith/IL4K7Q3LHKDUN2HRFHEULT5MHS.json","view_paper":"https://pith.science/paper/IL4K7Q3L","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2605.15571&json=true","fetch_graph":"https://pith.science/api/pith-number/IL4K7Q3LHKDUN2HRFHEULT5MHS/graph.json","fetch_events":"https://pith.science/api/pith-number/IL4K7Q3LHKDUN2HRFHEULT5MHS/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/IL4K7Q3LHKDUN2HRFHEULT5MHS/action/timestamp_anchor","attest_storage":"https://pith.science/pith/IL4K7Q3LHKDUN2HRFHEULT5MHS/action/storage_attestation","attest_author":"https://pith.science/pith/IL4K7Q3LHKDUN2HRFHEULT5MHS/action/author_attestation","sign_citation":"https://pith.science/pith/IL4K7Q3LHKDUN2HRFHEULT5MHS/action/citation_signature","submit_replication":"https://pith.science/pith/IL4K7Q3LHKDUN2HRFHEULT5MHS/action/replication_record"}},"created_at":"2026-05-20T00:01:05.974203+00:00","updated_at":"2026-05-20T00:01:05.974203+00:00"}