{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2025:ATSZCHBVBU5N5BQBECGSBOTGEJ","short_pith_number":"pith:ATSZCHBV","schema_version":"1.0","canonical_sha256":"04e5911c350d3ade8601208d20ba66227b7e87667749c5f06ea002e67a1a6fb7","source":{"kind":"arxiv","id":"2510.08558","version":3},"attestation_state":"computed","paper":{"title":"Agent Learning via Early Experience","license":"http://creativecommons.org/licenses/by/4.0/","headline":"","cross_cats":["cs.CL","cs.IR","cs.LG"],"primary_cat":"cs.AI","authors_text":"Ashish Shah, Bo Liu, Boyu Gou, Dat Huynh, Hengduo Li, Huan Sun, Jason Weston, Jiacheng Zhu, Jianwei Yang, Jian Xie, Kai Zhang, Lawrence Jang, Ning Zhang, Qi Qi, Sara Cao, Shuyan Zhou, Tianci Xue, Xiangchao Chen, Xian Li, Xiaohan Fu, Xiyao Wang, Yifan Wu, Yu Su, Yuting Ning, Yuxuan Sun, Zeyi Liao, Zhaorun Chen, Zhihan Liu, Zihang Meng, Zi Yang","submitted_at":"2025-10-09T17:59:17Z","abstract_excerpt":"A long-term goal of language agents is to learn and improve through their own experience, ultimately outperforming humans in complex, real-world tasks. However, training agents from experience data with reinforcement learning remains difficult in many environments, which either lack verifiable rewards (e.g., websites) or require inefficient long-horizon rollouts (e.g., multi-turn tool use). As a result, most current agents rely on supervised fine-tuning on expert data, which is challenging to scale and generalizes poorly. This limitation stems from the nature of expert demonstrations: they cap"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":false,"formal_links_present":false},"canonical_record":{"source":{"id":"2510.08558","kind":"arxiv","version":3},"metadata":{"license":"http://creativecommons.org/licenses/by/4.0/","primary_cat":"cs.AI","submitted_at":"2025-10-09T17:59:17Z","cross_cats_sorted":["cs.CL","cs.IR","cs.LG"],"title_canon_sha256":"8522a5493521db03af60add728d52592abf69f4bfb9a1dadda2a04a91ffde055","abstract_canon_sha256":"b8d22a991687452f8586bcb2dc0cbf565ff6b4cfdefd487891d294548a8d0fd9"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-26T02:03:57.704852Z","signature_b64":"HBQ8qy3MRRUb7fQabK31Wytt+G/Bc1gSTSAEw3YpvdNvLpF3HMfCkWE6gjdslyQwXTbWi0YAWKoyWFwC2LLZDQ==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"04e5911c350d3ade8601208d20ba66227b7e87667749c5f06ea002e67a1a6fb7","last_reissued_at":"2026-05-26T02:03:57.703847Z","signature_status":"signed_v1","first_computed_at":"2026-05-26T02:03:57.703847Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"Agent Learning via Early Experience","license":"http://creativecommons.org/licenses/by/4.0/","headline":"","cross_cats":["cs.CL","cs.IR","cs.LG"],"primary_cat":"cs.AI","authors_text":"Ashish Shah, Bo Liu, Boyu Gou, Dat Huynh, Hengduo Li, Huan Sun, Jason Weston, Jiacheng Zhu, Jianwei Yang, Jian Xie, Kai Zhang, Lawrence Jang, Ning Zhang, Qi Qi, Sara Cao, Shuyan Zhou, Tianci Xue, Xiangchao Chen, Xian Li, Xiaohan Fu, Xiyao Wang, Yifan Wu, Yu Su, Yuting Ning, Yuxuan Sun, Zeyi Liao, Zhaorun Chen, Zhihan Liu, Zihang Meng, Zi Yang","submitted_at":"2025-10-09T17:59:17Z","abstract_excerpt":"A long-term goal of language agents is to learn and improve through their own experience, ultimately outperforming humans in complex, real-world tasks. However, training agents from experience data with reinforcement learning remains difficult in many environments, which either lack verifiable rewards (e.g., websites) or require inefficient long-horizon rollouts (e.g., multi-turn tool use). As a result, most current agents rely on supervised fine-tuning on expert data, which is challenging to scale and generalizes poorly. This limitation stems from the nature of expert demonstrations: they cap"},"claims":{"count":0,"items":[],"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"source":{"id":"2510.08558","kind":"arxiv","version":3},"verdict":{"id":null,"model_set":{},"created_at":null,"strongest_claim":"","one_line_summary":"","pipeline_version":null,"weakest_assumption":"","pith_extraction_headline":""},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2510.08558/integrity.json","findings":[],"available":true,"detectors_run":[],"snapshot_sha256":"c28c3603d3b5d939e8dc4c7e95fa8dfce3d595e45f758748cecf8e644a296938"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2510.08558","created_at":"2026-05-26T02:03:57.703994+00:00"},{"alias_kind":"arxiv_version","alias_value":"2510.08558v3","created_at":"2026-05-26T02:03:57.703994+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2510.08558","created_at":"2026-05-26T02:03:57.703994+00:00"},{"alias_kind":"pith_short_12","alias_value":"ATSZCHBVBU5N","created_at":"2026-05-26T02:03:57.703994+00:00"},{"alias_kind":"pith_short_16","alias_value":"ATSZCHBVBU5N5BQB","created_at":"2026-05-26T02:03:57.703994+00:00"},{"alias_kind":"pith_short_8","alias_value":"ATSZCHBV","created_at":"2026-05-26T02:03:57.703994+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":15,"internal_anchor_count":15,"sample":[{"citing_arxiv_id":"2605.15706","citing_title":"Differentiable Mixture-of-Agents Incentivizes Swarm Intelligence of Large Language Models","ref_index":52,"is_internal_anchor":true},{"citing_arxiv_id":"2605.17721","citing_title":"EXG: Self-Evolving Agents with Experience Graphs","ref_index":48,"is_internal_anchor":true},{"citing_arxiv_id":"2511.21678","citing_title":"Agentic Learner with Grow-and-Refine Multimodal Semantic Memory","ref_index":48,"is_internal_anchor":true},{"citing_arxiv_id":"2604.02345","citing_title":"UI-Oceanus: Scaling GUI Agents with Synthetic Environmental Dynamics","ref_index":55,"is_internal_anchor":true},{"citing_arxiv_id":"2605.13037","citing_title":"MAP: A Map-then-Act Paradigm for Long-Horizon Interactive Agent Reasoning","ref_index":47,"is_internal_anchor":true},{"citing_arxiv_id":"2605.03308","citing_title":"Revisiting the Travel Planning Capabilities of Large Language Models","ref_index":20,"is_internal_anchor":true},{"citing_arxiv_id":"2605.05413","citing_title":"From History to State: Constant-Context Skill Learning for LLM Agents","ref_index":42,"is_internal_anchor":true},{"citing_arxiv_id":"2604.11544","citing_title":"Time is Not a Label: Continuous Phase Rotation for Temporal Knowledge Graphs and Agentic Memory","ref_index":4,"is_internal_anchor":true},{"citing_arxiv_id":"2604.08224","citing_title":"Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering","ref_index":187,"is_internal_anchor":true},{"citing_arxiv_id":"2605.07180","citing_title":"Learning Agent Routing From Early Experience","ref_index":11,"is_internal_anchor":true},{"citing_arxiv_id":"2605.06702","citing_title":"CASCADE: Case-Based Continual Adaptation for Large Language Models During Deployment","ref_index":58,"is_internal_anchor":true},{"citing_arxiv_id":"2604.18131","citing_title":"Training LLM Agents for Spontaneous, Reward-Free Self-Evolution via World Knowledge Exploration","ref_index":23,"is_internal_anchor":true},{"citing_arxiv_id":"2604.17928","citing_title":"HEALing Entropy Collapse: Enhancing Exploration in Few-Shot RLVR via Hybrid-Domain Entropy Dynamics Alignment","ref_index":26,"is_internal_anchor":true},{"citing_arxiv_id":"2604.20148","citing_title":"Meta-Tool: Efficient Few-Shot Tool Adaptation for Small Language Models","ref_index":56,"is_internal_anchor":true},{"citing_arxiv_id":"2605.02469","citing_title":"Reference-Sampled Boltzmann Projection for KL-Regularized RLVR: Target-Matched Weighted SFT, Finite One-Shot Gaps, and Policy Mirror Descent","ref_index":56,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":0,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/ATSZCHBVBU5N5BQBECGSBOTGEJ","json":"https://pith.science/pith/ATSZCHBVBU5N5BQBECGSBOTGEJ.json","graph_json":"https://pith.science/api/pith-number/ATSZCHBVBU5N5BQBECGSBOTGEJ/graph.json","events_json":"https://pith.science/api/pith-number/ATSZCHBVBU5N5BQBECGSBOTGEJ/events.json","paper":"https://pith.science/paper/ATSZCHBV"},"agent_actions":{"view_html":"https://pith.science/pith/ATSZCHBVBU5N5BQBECGSBOTGEJ","download_json":"https://pith.science/pith/ATSZCHBVBU5N5BQBECGSBOTGEJ.json","view_paper":"https://pith.science/paper/ATSZCHBV","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2510.08558&json=true","fetch_graph":"https://pith.science/api/pith-number/ATSZCHBVBU5N5BQBECGSBOTGEJ/graph.json","fetch_events":"https://pith.science/api/pith-number/ATSZCHBVBU5N5BQBECGSBOTGEJ/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/ATSZCHBVBU5N5BQBECGSBOTGEJ/action/timestamp_anchor","attest_storage":"https://pith.science/pith/ATSZCHBVBU5N5BQBECGSBOTGEJ/action/storage_attestation","attest_author":"https://pith.science/pith/ATSZCHBVBU5N5BQBECGSBOTGEJ/action/author_attestation","sign_citation":"https://pith.science/pith/ATSZCHBVBU5N5BQBECGSBOTGEJ/action/citation_signature","submit_replication":"https://pith.science/pith/ATSZCHBVBU5N5BQBECGSBOTGEJ/action/replication_record"}},"created_at":"2026-05-26T02:03:57.703994+00:00","updated_at":"2026-05-26T02:03:57.703994+00:00"}