{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2025:FBNZSVAADCMHIAINYGJFFYTTUH","short_pith_number":"pith:FBNZSVAA","schema_version":"1.0","canonical_sha256":"285b995400189874010dc19252e273a1c0d274ea513fd36e9abf8f9ffc69d84e","source":{"kind":"arxiv","id":"2510.14901","version":1},"attestation_state":"computed","paper":{"title":"Reasoning with Sampling: Your Base Model is Smarter Than You Think","license":"http://creativecommons.org/licenses/by/4.0/","headline":"A simple iterative sampling algorithm using only a base model's likelihoods can elicit reasoning performance that nearly matches or exceeds reinforcement learning on tasks like math and coding.","cross_cats":["cs.AI","cs.CL"],"primary_cat":"cs.LG","authors_text":"Aayush Karan, Yilun Du","submitted_at":"2025-10-16T17:18:11Z","abstract_excerpt":"Frontier reasoning models have exhibited incredible capabilities across a wide array of disciplines, driven by posttraining large language models (LLMs) with reinforcement learning (RL). However, despite the widespread success of this paradigm, much of the literature has been devoted to disentangling truly novel behaviors that emerge during RL but are not present in the base models. In our work, we approach this question from a different angle, instead asking whether comparable reasoning capabilites can be elicited from base models at inference time by pure sampling, without any additional tra"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":false,"formal_links_present":true},"canonical_record":{"source":{"id":"2510.14901","kind":"arxiv","version":1},"metadata":{"license":"http://creativecommons.org/licenses/by/4.0/","primary_cat":"cs.LG","submitted_at":"2025-10-16T17:18:11Z","cross_cats_sorted":["cs.AI","cs.CL"],"title_canon_sha256":"cea8936658e1fb416995205bc8324e264599d38a3883317244765200d3346c24","abstract_canon_sha256":"4ecb3b6cb04907b232cb999bb15cf3e1b01ea1ad94e7c536f7256b8d3dc1ee33"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:38:15.272477Z","signature_b64":"ogxGxWB92mPd9imEdFA+IoA5U0fKjjOukzqK6makh7B75AcTq4ELFx7htup1YGzm8d/awq/q8G1BlkNjE/hkAw==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"285b995400189874010dc19252e273a1c0d274ea513fd36e9abf8f9ffc69d84e","last_reissued_at":"2026-05-17T23:38:15.272016Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:38:15.272016Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"Reasoning with Sampling: Your Base Model is Smarter Than You Think","license":"http://creativecommons.org/licenses/by/4.0/","headline":"A simple iterative sampling algorithm using only a base model's likelihoods can elicit reasoning performance that nearly matches or exceeds reinforcement learning on tasks like math and coding.","cross_cats":["cs.AI","cs.CL"],"primary_cat":"cs.LG","authors_text":"Aayush Karan, Yilun Du","submitted_at":"2025-10-16T17:18:11Z","abstract_excerpt":"Frontier reasoning models have exhibited incredible capabilities across a wide array of disciplines, driven by posttraining large language models (LLMs) with reinforcement learning (RL). However, despite the widespread success of this paradigm, much of the literature has been devoted to disentangling truly novel behaviors that emerge during RL but are not present in the base models. In our work, we approach this question from a different angle, instead asking whether comparable reasoning capabilites can be elicited from base models at inference time by pure sampling, without any additional tra"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Over different base models, we show that our algorithm offers substantial boosts in reasoning that nearly match and even outperform those from RL on a wide variety of single-shot tasks, including MATH500, HumanEval, and GPQA.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the base model's likelihoods contain sufficient signal to be iteratively reshaped into higher-quality reasoning trajectories via a simple MCMC-style sampler without any training or external verifier.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"An MCMC-inspired iterative sampler applied to base LLMs elicits reasoning performance that nearly matches or exceeds RL-posttrained models on MATH500, HumanEval, and GPQA while preserving output diversity.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"A simple iterative sampling algorithm using only a base model's likelihoods can elicit reasoning performance that nearly matches or exceeds reinforcement learning on tasks like math and coding.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"4c3db4abc89fc35ad229d37fba8da9f3e8a70103c81671eb1642f3b40242dd30"},"source":{"id":"2510.14901","kind":"arxiv","version":1},"verdict":{"id":"d35122ec-9601-4e85-bcb1-a7c4cbc56e1c","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-17T03:11:43.647449Z","strongest_claim":"Over different base models, we show that our algorithm offers substantial boosts in reasoning that nearly match and even outperform those from RL on a wide variety of single-shot tasks, including MATH500, HumanEval, and GPQA.","one_line_summary":"An MCMC-inspired iterative sampler applied to base LLMs elicits reasoning performance that nearly matches or exceeds RL-posttrained models on MATH500, HumanEval, and GPQA while preserving output diversity.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the base model's likelihoods contain sufficient signal to be iteratively reshaped into higher-quality reasoning trajectories via a simple MCMC-style sampler without any training or external verifier.","pith_extraction_headline":"A simple iterative sampling algorithm using only a base model's likelihoods can elicit reasoning performance that nearly matches or exceeds reinforcement learning on tasks like math and coding."},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":2,"snapshot_sha256":"7c0d59d9ed0e8b1a16e6e404271f51c1969b8b7b9b3aca1c17cee0d361c2ea9f"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2510.14901","created_at":"2026-05-17T23:38:15.272086+00:00"},{"alias_kind":"arxiv_version","alias_value":"2510.14901v1","created_at":"2026-05-17T23:38:15.272086+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2510.14901","created_at":"2026-05-17T23:38:15.272086+00:00"},{"alias_kind":"pith_short_12","alias_value":"FBNZSVAADCMH","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_16","alias_value":"FBNZSVAADCMHIAIN","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_8","alias_value":"FBNZSVAA","created_at":"2026-05-18T12:33:37.589309+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":24,"internal_anchor_count":24,"sample":[{"citing_arxiv_id":"2605.21654","citing_title":"Value-Gradient Hypothesis of RL for LLMs","ref_index":1,"is_internal_anchor":true},{"citing_arxiv_id":"2601.18577","citing_title":"Self-Refining Video Sampling","ref_index":6,"is_internal_anchor":true},{"citing_arxiv_id":"2601.21484","citing_title":"ETS: Energy-Guided Test-Time Scaling for Training-Free RL Alignment","ref_index":17,"is_internal_anchor":true},{"citing_arxiv_id":"2605.20189","citing_title":"SOLAR: A Self-Optimizing Open-Ended Autonomous Agent for Lifelong Learning and Continual Adaptation","ref_index":64,"is_internal_anchor":true},{"citing_arxiv_id":"2605.19461","citing_title":"Beyond Mode Collapse: Distribution Matching for Diverse Reasoning","ref_index":17,"is_internal_anchor":true},{"citing_arxiv_id":"2512.15567","citing_title":"Evaluating Large Language Models in Scientific Discovery","ref_index":52,"is_internal_anchor":true},{"citing_arxiv_id":"2601.18832","citing_title":"The Geometric Reasoner: Manifold-Informed Latent Foresight Search for Long-Context Reasoning","ref_index":14,"is_internal_anchor":true},{"citing_arxiv_id":"2601.21484","citing_title":"ETS: Energy-Guided Test-Time Scaling for Training-Free RL Alignment","ref_index":17,"is_internal_anchor":true},{"citing_arxiv_id":"2603.23607","citing_title":"LongTail Driving Scenarios with Reasoning Traces: The KITScenes LongTail Dataset","ref_index":35,"is_internal_anchor":true},{"citing_arxiv_id":"2604.26326","citing_title":"Addressing Performance Saturation for LLM RL via Precise Entropy Curve Control","ref_index":11,"is_internal_anchor":true},{"citing_arxiv_id":"2605.06241","citing_title":"Rethinking RL for LLM Reasoning: It's Sparse Policy Selection, Not Capability Learning","ref_index":10,"is_internal_anchor":true},{"citing_arxiv_id":"2604.26326","citing_title":"Addressing Performance Saturation for LLM RL via Precise Entropy Curve Control","ref_index":11,"is_internal_anchor":true},{"citing_arxiv_id":"2605.08817","citing_title":"How You Begin is How You Reason: Driving Exploration in RLVR via Prefix-Tuned Priors","ref_index":19,"is_internal_anchor":true},{"citing_arxiv_id":"2605.06116","citing_title":"Policy-Guided Stepwise Model Routing for Cost-Effective Reasoning","ref_index":9,"is_internal_anchor":true},{"citing_arxiv_id":"2605.06241","citing_title":"Rethinking RL for LLM Reasoning: It's Sparse Policy Selection, Not Capability Learning","ref_index":10,"is_internal_anchor":true},{"citing_arxiv_id":"2605.00742","citing_title":"Position: agentic AI orchestration should be Bayes-consistent","ref_index":11,"is_internal_anchor":true},{"citing_arxiv_id":"2605.00365","citing_title":"Uniform-Correct Policy Optimization: Breaking RLVR's Indifference to Diversity","ref_index":15,"is_internal_anchor":true},{"citing_arxiv_id":"2604.18530","citing_title":"OGER: A Robust Offline-Guided Exploration Reward for Hybrid Reinforcement Learning","ref_index":41,"is_internal_anchor":true},{"citing_arxiv_id":"2604.16453","citing_title":"Sampling for Quality: Training-Free Reward-Guided LLM Decoding via Sequential Monte Carlo","ref_index":9,"is_internal_anchor":true},{"citing_arxiv_id":"2605.02913","citing_title":"Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning","ref_index":58,"is_internal_anchor":true},{"citing_arxiv_id":"2604.04855","citing_title":"The Role of Generator Access in Autoregressive Post-Training","ref_index":11,"is_internal_anchor":true},{"citing_arxiv_id":"2604.17928","citing_title":"HEALing Entropy Collapse: Enhancing Exploration in Few-Shot RLVR via Hybrid-Domain Entropy Dynamics Alignment","ref_index":23,"is_internal_anchor":true},{"citing_arxiv_id":"2604.16259","citing_title":"Beyond Distribution Sharpening: The Importance of Task Rewards","ref_index":15,"is_internal_anchor":true},{"citing_arxiv_id":"2605.05851","citing_title":"Hypothesis generation and updating in large language models","ref_index":18,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":2,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/FBNZSVAADCMHIAINYGJFFYTTUH","json":"https://pith.science/pith/FBNZSVAADCMHIAINYGJFFYTTUH.json","graph_json":"https://pith.science/api/pith-number/FBNZSVAADCMHIAINYGJFFYTTUH/graph.json","events_json":"https://pith.science/api/pith-number/FBNZSVAADCMHIAINYGJFFYTTUH/events.json","paper":"https://pith.science/paper/FBNZSVAA"},"agent_actions":{"view_html":"https://pith.science/pith/FBNZSVAADCMHIAINYGJFFYTTUH","download_json":"https://pith.science/pith/FBNZSVAADCMHIAINYGJFFYTTUH.json","view_paper":"https://pith.science/paper/FBNZSVAA","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2510.14901&json=true","fetch_graph":"https://pith.science/api/pith-number/FBNZSVAADCMHIAINYGJFFYTTUH/graph.json","fetch_events":"https://pith.science/api/pith-number/FBNZSVAADCMHIAINYGJFFYTTUH/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/FBNZSVAADCMHIAINYGJFFYTTUH/action/timestamp_anchor","attest_storage":"https://pith.science/pith/FBNZSVAADCMHIAINYGJFFYTTUH/action/storage_attestation","attest_author":"https://pith.science/pith/FBNZSVAADCMHIAINYGJFFYTTUH/action/author_attestation","sign_citation":"https://pith.science/pith/FBNZSVAADCMHIAINYGJFFYTTUH/action/citation_signature","submit_replication":"https://pith.science/pith/FBNZSVAADCMHIAINYGJFFYTTUH/action/replication_record"}},"created_at":"2026-05-17T23:38:15.272086+00:00","updated_at":"2026-05-17T23:38:15.272086+00:00"}