{"paper":{"title":"Polaris: A G\\\"odel Agent Framework for Small Language Models through Experience-Abstracted Policy Repair","license":"http://creativecommons.org/licenses/by-nc-sa/4.0/","headline":"A 7B model improves its policy on unseen reasoning tasks by abstracting failures into compact reusable code patches.","cross_cats":[],"primary_cat":"cs.LG","authors_text":"Aditya Kakade, Shirish Karande, Vivek Srivastava","submitted_at":"2026-03-24T12:25:32Z","abstract_excerpt":"G\\\"odel agents realize recursive self-improvement: an agent inspects its own policy and traces and then modifies that policy in a tested loop. We introduce Polaris, a G\\\"odel agent for compact models that performs policy repair via experience abstraction, turning failures into policy updates through a structured cycle of analysis, strategy formation, abstraction, and minimal code patch repair with conservative checks. Unlike response level self correction or parameter tuning, Polaris makes policy level changes with small, auditable patches that persist in the policy and are reused on unseen in"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"On MGSM, DROP, GPQA, and LitBench, a 7-billion-parameter model equipped with Polaris achieves consistent gains over the base policy and competitive baselines.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That experience abstraction reliably produces compact strategies that transfer to unseen instances and that the minimal code patches improve performance without introducing regressions on other tasks.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Polaris enables small LLMs to achieve recursive self-improvement by abstracting failure experiences into reusable policy patches that transfer across benchmark 
instances.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"A 7B model improves its policy on unseen reasoning tasks by abstracting failures into compact reusable code patches.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"eff86409365092468c0c802b6d19df9fd18dcf20815ad5b0b9ef2e8e1e3f1a10"},"source":{"id":"2603.23129","kind":"arxiv","version":2},"verdict":{"id":"ec245c5b-0a79-49de-99e4-76f343933e47","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T07:40:43.835059Z","strongest_claim":"On MGSM, DROP, GPQA, and LitBench, a 7-billion-parameter model equipped with Polaris achieves consistent gains over the base policy and competitive baselines.","one_line_summary":"Polaris enables small LLMs to achieve recursive self-improvement by abstracting failure experiences into reusable policy patches that transfer across benchmark instances.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That experience abstraction reliably produces compact strategies that transfer to unseen instances and that the minimal code patches improve performance without introducing regressions on other tasks.","pith_extraction_headline":"A 7B model improves its policy on unseen reasoning tasks by abstracting failures into compact reusable code patches."},"references":{"count":13,"sample":[{"doi":"","year":null,"title":"Examine how the policy’s logic or structure caused the error","work_id":"a33d50f3-fdf4-4e99-b2d4-9e06effa7d6c","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Step-by-step suggestions on how the policy could be revised to solve the task","work_id":"dbff98c1-584f-4555-805d-0dcc1246d14f","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2018,"title":"‘python <code patch 
here>","work_id":"4f765951-8efc-4591-9c24-07ac6ce9e76a","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":1999,"title":"role\": \"user","work_id":"2c7cc932-5102-41c9-9034-0bfb4185e4c2","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Continue to interact with the environment by executing actions based on the current analysis","work_id":"42f8bb27-123f-441c-bf5c-ec3682926d4a","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":13,"snapshot_sha256":"b49b8ae01abbf4714a03ff27077382cec9f4937f133a9df7a12d04b2cbbe5386","internal_anchors":0},"formal_canon":{"evidence_count":2,"snapshot_sha256":"397cbde06cbd9a8be912240353b06215fa2caceee13020f8faf87a381d215d4a"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}