{"paper":{"title":"Contrastive Learning under Noisy Temporal Self-Supervision for Colonoscopy Videos","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Temporal structure in colonoscopy videos supplies enough signal for a noise-aware contrastive loss to learn polyp representations that outperform prior methods on retrieval, re-identification, size estimation, and histology tasks.","cross_cats":[],"primary_cat":"cs.CV","authors_text":"Carlo Biffi, Lamberto Ballan, Loic Le Folgoc, Luca Parolari, Pietro Gori","submitted_at":"2026-05-12T16:04:42Z","abstract_excerpt":"Learning robust representations of polyp tracklets is key to enabling multiple AI-assisted colonoscopy applications, from polyp characterization to automated reporting and retrieval. Supervised contrastive learning is an effective approach for learning such representations, but it typically relies on correct positive and negative definitions. Collecting these labels requires linking tracklets that depict the same underlying polyp entity throughout the video, which is costly and demands specialized clinical expertise. In this work, we leverage the sequential workflow of colonoscopy procedures t"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Our method outperforms prior self-supervised and supervised baselines, and matches or exceeds recent foundation models across all tasks, using a lightweight encoder trained on only 27 videos.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the temporally derived associations, even when noisy, contain enough correct signal for the noise-aware loss to produce representations that generalize to the downstream clinical tasks; the abstract does not detail how noise levels were measured or controlled.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"A noise-aware contrastive loss built on temporal self-supervision learns polyp tracklet representations from 27 videos that outperform prior self-supervised and supervised baselines and match foundation models on retrieval, re-identification, size estimation, and histology classification.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Temporal structure in colonoscopy videos supplies enough signal for a noise-aware contrastive loss to learn polyp representations that outperform prior methods on retrieval, re-identification, size estimation, and histology tasks.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"ef08298ded14df7f70a37eb40d5930da0678fe14881c289bb641c48570020614"},"source":{"id":"2605.12320","kind":"arxiv","version":2},"verdict":{"id":"cdc80de5-8510-48e1-93c0-1180f129ab4e","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-13T05:19:37.622163Z","strongest_claim":"Our method outperforms prior self-supervised and supervised baselines, and matches or exceeds recent foundation models across all tasks, using a lightweight encoder trained on only 27 videos.","one_line_summary":"A noise-aware contrastive loss built on temporal self-supervision learns polyp tracklet representations from 27 videos that outperform prior self-supervised and supervised baselines and match foundation models on retrieval, re-identification, size estimation, and histology classification.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the temporally derived associations, even when noisy, contain enough correct signal for the noise-aware loss to produce representations that generalize to the downstream clinical tasks; the abstract does not detail how noise levels were measured or controlled.","pith_extraction_headline":"Temporal structure in colonoscopy videos supplies enough signal for a noise-aware contrastive loss to learn polyp representations that outperform prior methods on retrieval, re-identification, size estimation, and histology tasks."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2605.12320/integrity.json","findings":[],"available":true,"detectors_run":[{"name":"claim_evidence","ran_at":"2026-05-19T22:41:58.276270Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"ai_meta_artifact","ran_at":"2026-05-19T10:39:14.876235Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_title_agreement","ran_at":"2026-05-19T08:31:16.505954Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_compliance","ran_at":"2026-05-19T07:40:33.478780Z","status":"completed","version":"1.0.0","findings_count":0}],"snapshot_sha256":"8368c0eff03afcd3686e2bc55c9e0b3dc5382ee43bcb42146524ad4bb25f9383"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":2,"snapshot_sha256":"e42bc200a1c2efd4dd6b99ced88614871a647d8e7614a07c0ca2c0e7874851f2"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}