{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2026:A2CQIYS2XN45L2QU7VB6BF4IDK","short_pith_number":"pith:A2CQIYS2","schema_version":"1.0","canonical_sha256":"068504625abb79d5ea14fd43e097881aa71107d568124dc8bd4b852ab359651e","source":{"kind":"arxiv","id":"2605.13511","version":1},"attestation_state":"computed","paper":{"title":"Many-Shot CoT-ICL: Making In-Context Learning Truly Learn","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Many-shot chain-of-thought in-context learning behaves as test-time learning when demonstrations are ordered for smooth conceptual progression.","cross_cats":["cs.AI"],"primary_cat":"cs.CL","authors_text":"Dit-Yan Yeung, Lemao Liu, Mo Yu, Tsz Ting Chung","submitted_at":"2026-05-13T13:30:12Z","abstract_excerpt":"In-context learning (ICL) adapts large language models (LLMs) to new tasks by conditioning on demonstrations in the prompt without parameter updates. With long-context models, many-shot ICL can use dozens to hundreds of examples and achieve performance comparable to fine-tuning, yet current understanding of its scaling behavior is largely derived from non-reasoning tasks. We study many-shot chain-of-thought in-context learning (CoT-ICL) for reasoning and show that standard many-shot rules do not transfer. 
Across non-reasoning and reasoning-oriented LLMs and across non-reasoning and reasoning t"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":false},"canonical_record":{"source":{"id":"2605.13511","kind":"arxiv","version":1},"metadata":{"license":"http://creativecommons.org/licenses/by/4.0/","primary_cat":"cs.CL","submitted_at":"2026-05-13T13:30:12Z","cross_cats_sorted":["cs.AI"],"title_canon_sha256":"2283beec5815ca27fdd347c93e6e5ed8ee9085a0c0c5643a150bc5da89cebf07","abstract_canon_sha256":"7828fb637367750ebde2ded28567528f32c3e172fbc430ffd398bd3311476826"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-18T02:44:24.564121Z","signature_b64":"Nn/8GFr6glbZP5tOcT8AmoFpfRxXtWGMSkT2/nocD+BkTwyWbJaHXWHwjlSs4plMKfX9VXwn6MVnVx87v9qiDg==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"068504625abb79d5ea14fd43e097881aa71107d568124dc8bd4b852ab359651e","last_reissued_at":"2026-05-18T02:44:24.563722Z","signature_status":"signed_v1","first_computed_at":"2026-05-18T02:44:24.563722Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"Many-Shot CoT-ICL: Making In-Context Learning Truly Learn","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Many-shot chain-of-thought in-context learning behaves as test-time learning when demonstrations are ordered for smooth conceptual progression.","cross_cats":["cs.AI"],"primary_cat":"cs.CL","authors_text":"Dit-Yan Yeung, Lemao Liu, Mo Yu, Tsz Ting Chung","submitted_at":"2026-05-13T13:30:12Z","abstract_excerpt":"In-context learning (ICL) adapts large 
language models (LLMs) to new tasks by conditioning on demonstrations in the prompt without parameter updates. With long-context models, many-shot ICL can use dozens to hundreds of examples and achieve performance comparable to fine-tuning, yet current understanding of its scaling behavior is largely derived from non-reasoning tasks. We study many-shot chain-of-thought in-context learning (CoT-ICL) for reasoning and show that standard many-shot rules do not transfer. Across non-reasoning and reasoning-oriented LLMs and across non-reasoning and reasoning t"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"We interpret these behaviors by viewing many-shot CoT-ICL as in-context test-time learning rather than scaled pattern matching, and suggests two principles: (i) demonstrations should be easy for the target model to understand, and (ii) they should be ordered to support a smooth conceptual progression. Guided by the principle, we propose Curvilinear Demonstration Selection (CDS), a simple ordering method that yields up to a 5.42 percentage-point gain on geometry with 64 demonstrations.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the observed scaling effects and performance gains stem from the model performing test-time learning enabled by ordered demonstrations, rather than from other factors such as prompt length or specific model architectures, and that the two principles generalize beyond the tested models and tasks.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Many-shot CoT-ICL functions as test-time learning when demonstrations are ordered for smooth conceptual progression rather than similarity, enabling a new selection method that improves reasoning 
performance.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Many-shot chain-of-thought in-context learning behaves as test-time learning when demonstrations are ordered for smooth conceptual progression.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"52231ea3cb458b8daff123929ac66eb19e969d9691adf8033c70be6110a3edc5"},"source":{"id":"2605.13511","kind":"arxiv","version":1},"verdict":{"id":"e29b92ce-cd48-48b3-8538-03046e4c7c00","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-14T19:12:41.184249Z","strongest_claim":"We interpret these behaviors by viewing many-shot CoT-ICL as in-context test-time learning rather than scaled pattern matching, and suggests two principles: (i) demonstrations should be easy for the target model to understand, and (ii) they should be ordered to support a smooth conceptual progression. 
Guided by the principle, we propose Curvilinear Demonstration Selection (CDS), a simple ordering method that yields up to a 5.42 percentage-point gain on geometry with 64 demonstrations.","one_line_summary":"Many-shot CoT-ICL functions as test-time learning when demonstrations are ordered for smooth conceptual progression rather than similarity, enabling a new selection method that improves reasoning performance.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the observed scaling effects and performance gains stem from the model performing test-time learning enabled by ordered demonstrations, rather than from other factors such as prompt length or specific model architectures, and that the two principles generalize beyond the tested models and tasks.","pith_extraction_headline":"Many-shot chain-of-thought in-context learning behaves as test-time learning when demonstrations are ordered for smooth conceptual progression."},"references":{"count":58,"sample":[{"doi":"10.18653/v1/2024.findings-emnlp.646","year":2024,"title":"Selection-p: Self-Supervised Task-Agnostic Prompt Compression for Faithfulness and Transferability","work_id":"47267f47-9851-431f-8490-71956518971c","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"10.18653/v1/2025.findings-naacl.283","year":2025,"title":"Induction Heads as an Essential Mechanism for Pattern Matching in In-context Learning","work_id":"1ff98088-c6d4-4b10-9cb2-63f8ce9afd1f","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2022,"title":"In-context Learning and Induction Heads , author=. 
2022 , eprint=","work_id":"f6d72d45-0f97-4622-a9be-687431b1996a","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"10.18653/v1/2025.naacl-long.569","year":2025,"title":"The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding","work_id":"595396f7-7cb2-42c0-b738-74425c34b6b2","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"10.18653/v1/2025.findings-emnlp.47","year":2025,"title":"DivLogicEval: A Framework for Benchmarking Logical Reasoning Evaluation in Large Language Models","work_id":"0af2d860-1562-4282-b9c2-9c304fc8b1b9","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":58,"snapshot_sha256":"0e72d6c1cd9582b00a7bcaa7687a7b948090096a4b26cea964f3ffd674914a32","internal_anchors":2},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2605.13511","created_at":"2026-05-18T02:44:24.563792+00:00"},{"alias_kind":"arxiv_version","alias_value":"2605.13511v1","created_at":"2026-05-18T02:44:24.563792+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2605.13511","created_at":"2026-05-18T02:44:24.563792+00:00"},{"alias_kind":"pith_short_12","alias_value":"A2CQIYS2XN45","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_16","alias_value":"A2CQIYS2XN45L2QU","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_8","alias_value":"A2CQIYS2","created_at":"2026-05-18T12:33:37.589309+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":0,"internal_anchor_count":0,"sample":[]},"formal_canon":{"evidence_count":0,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/A2CQIYS2XN45
L2QU7VB6BF4IDK","json":"https://pith.science/pith/A2CQIYS2XN45L2QU7VB6BF4IDK.json","graph_json":"https://pith.science/api/pith-number/A2CQIYS2XN45L2QU7VB6BF4IDK/graph.json","events_json":"https://pith.science/api/pith-number/A2CQIYS2XN45L2QU7VB6BF4IDK/events.json","paper":"https://pith.science/paper/A2CQIYS2"},"agent_actions":{"view_html":"https://pith.science/pith/A2CQIYS2XN45L2QU7VB6BF4IDK","download_json":"https://pith.science/pith/A2CQIYS2XN45L2QU7VB6BF4IDK.json","view_paper":"https://pith.science/paper/A2CQIYS2","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2605.13511&json=true","fetch_graph":"https://pith.science/api/pith-number/A2CQIYS2XN45L2QU7VB6BF4IDK/graph.json","fetch_events":"https://pith.science/api/pith-number/A2CQIYS2XN45L2QU7VB6BF4IDK/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/A2CQIYS2XN45L2QU7VB6BF4IDK/action/timestamp_anchor","attest_storage":"https://pith.science/pith/A2CQIYS2XN45L2QU7VB6BF4IDK/action/storage_attestation","attest_author":"https://pith.science/pith/A2CQIYS2XN45L2QU7VB6BF4IDK/action/author_attestation","sign_citation":"https://pith.science/pith/A2CQIYS2XN45L2QU7VB6BF4IDK/action/citation_signature","submit_replication":"https://pith.science/pith/A2CQIYS2XN45L2QU7VB6BF4IDK/action/replication_record"}},"created_at":"2026-05-18T02:44:24.563792+00:00","updated_at":"2026-05-18T02:44:24.563792+00:00"}