{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2023:RLLGOF3MAXI4YOLZT6FU2RJATF","short_pith_number":"pith:RLLGOF3M","schema_version":"1.0","canonical_sha256":"8ad667176c05d1cc39799f8b4d4520996b553b405e8bdb44407e12a1c16786b9","source":{"kind":"arxiv","id":"2305.07922","version":2},"attestation_state":"computed","paper":{"title":"CodeT5+: Open Code Large Language Models for Code Understanding and Generation","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"","cross_cats":["cs.LG","cs.PL"],"primary_cat":"cs.CL","authors_text":"Akhilesh Deepak Gotmare, Hung Le, Junnan Li, Nghi D.Q. Bui, Steven C.H. Hoi, Yue Wang","submitted_at":"2023-05-13T14:23:07Z","abstract_excerpt":"Large language models (LLMs) pretrained on vast source code have achieved prominent progress in code intelligence. However, existing code LLMs have two main limitations in terms of architecture and pretraining tasks. First, they often adopt a specific architecture (encoder-only or decoder-only) or rely on a unified encoder-decoder network for different downstream tasks. The former paradigm is limited by inflexibility in applications while in the latter, the model is treated as a single system for all tasks, leading to suboptimal performance on a subset of tasks. Secondly, they often employ a l"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":false,"formal_links_present":false},"canonical_record":{"source":{"id":"2305.07922","kind":"arxiv","version":2},"metadata":{"license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","primary_cat":"cs.CL","submitted_at":"2023-05-13T14:23:07Z","cross_cats_sorted":["cs.LG","cs.PL"],"title_canon_sha256":"2d95024978242faa8c996fd31aa9134493ba76a7aaedd4ac97b9385e248c2ea1","abstract_canon_sha256":"f90e2387454ad38d02f38cd971cf19bb760bcaeb4e1c5794393e753922735222"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-19T05:19:47.341885Z","signature_b64":"cSMfFpJrTeAgn1g8CRiEGk1Vj5ve1+80Q3DbtAozSLTtjRFrDbD77QZZZoJnTWxby950emWS2HJ4eUiY9zzyBg==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"8ad667176c05d1cc39799f8b4d4520996b553b405e8bdb44407e12a1c16786b9","last_reissued_at":"2026-05-19T05:19:47.338773Z","signature_status":"signed_v1","first_computed_at":"2026-05-19T05:19:47.338773Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"CodeT5+: Open Code Large Language Models for Code Understanding and Generation","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"","cross_cats":["cs.LG","cs.PL"],"primary_cat":"cs.CL","authors_text":"Akhilesh Deepak Gotmare, Hung Le, Junnan Li, Nghi D.Q. Bui, Steven C.H. Hoi, Yue Wang","submitted_at":"2023-05-13T14:23:07Z","abstract_excerpt":"Large language models (LLMs) pretrained on vast source code have achieved prominent progress in code intelligence. However, existing code LLMs have two main limitations in terms of architecture and pretraining tasks. First, they often adopt a specific architecture (encoder-only or decoder-only) or rely on a unified encoder-decoder network for different downstream tasks. The former paradigm is limited by inflexibility in applications while in the latter, the model is treated as a single system for all tasks, leading to suboptimal performance on a subset of tasks. Secondly, they often employ a l"},"claims":{"count":0,"items":[],"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"source":{"id":"2305.07922","kind":"arxiv","version":2},"verdict":{"id":null,"model_set":{},"created_at":null,"strongest_claim":"","one_line_summary":"","pipeline_version":null,"weakest_assumption":"","pith_extraction_headline":""},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2305.07922","created_at":"2026-05-19T05:19:47.338942+00:00"},{"alias_kind":"arxiv_version","alias_value":"2305.07922v2","created_at":"2026-05-19T05:19:47.338942+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2305.07922","created_at":"2026-05-19T05:19:47.338942+00:00"},{"alias_kind":"pith_short_12","alias_value":"RLLGOF3MAXI4","created_at":"2026-05-19T05:19:47.338942+00:00"},{"alias_kind":"pith_short_16","alias_value":"RLLGOF3MAXI4YOLZ","created_at":"2026-05-19T05:19:47.338942+00:00"},{"alias_kind":"pith_short_8","alias_value":"RLLGOF3M","created_at":"2026-05-19T05:19:47.338942+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":20,"internal_anchor_count":20,"sample":[{"citing_arxiv_id":"2307.06435","citing_title":"A Comprehensive Overview of Large Language Models","ref_index":34,"is_internal_anchor":true},{"citing_arxiv_id":"2507.06803","citing_title":"Text to model via SysML: Automated generation of dynamical system computational models from unstructured natural language text via enhanced System Modeling Language diagrams","ref_index":9,"is_internal_anchor":true},{"citing_arxiv_id":"2507.21954","citing_title":"Fine-Tuning Code Language Models to Detect Cross-Language Bugs","ref_index":63,"is_internal_anchor":true},{"citing_arxiv_id":"2508.20086","citing_title":"Detecting Malicious Intents in Smart Contracts with Pre-trained Programming Language Models","ref_index":24,"is_internal_anchor":true},{"citing_arxiv_id":"2509.09192","citing_title":"ReDef: Do Code Language Models Truly Understand Code Changes for Just-in-Time Software Defect Prediction?","ref_index":57,"is_internal_anchor":true},{"citing_arxiv_id":"2309.05653","citing_title":"MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning","ref_index":56,"is_internal_anchor":true},{"citing_arxiv_id":"2512.18470","citing_title":"SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios","ref_index":52,"is_internal_anchor":true},{"citing_arxiv_id":"2507.03724","citing_title":"MemOS: A Memory OS for AI System","ref_index":3,"is_internal_anchor":true},{"citing_arxiv_id":"2312.13010","citing_title":"AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation","ref_index":36,"is_internal_anchor":true},{"citing_arxiv_id":"2604.16377","citing_title":"GoCoMA: Hyperbolic Multimodal Representation Fusion for Large Language Model-Generated Code Attribution","ref_index":29,"is_internal_anchor":true},{"citing_arxiv_id":"2305.01210","citing_title":"Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation","ref_index":64,"is_internal_anchor":true},{"citing_arxiv_id":"2604.02702","citing_title":"TypePro: Boosting LLM-Based Type Inference via Inter-Procedural Slicing","ref_index":32,"is_internal_anchor":true},{"citing_arxiv_id":"2306.11644","citing_title":"Textbooks Are All You Need","ref_index":29,"is_internal_anchor":true},{"citing_arxiv_id":"2604.27647","citing_title":"Tail-aware N-version Machine Learning Models for Reliable API Recommendation","ref_index":28,"is_internal_anchor":true},{"citing_arxiv_id":"2604.26667","citing_title":"Will It Break in Production? Metric-Driven Prediction of Residual Defects in Python Systems","ref_index":32,"is_internal_anchor":true},{"citing_arxiv_id":"2604.18907","citing_title":"Gradient-Based Program Synthesis with Neurally Interpreted Languages","ref_index":95,"is_internal_anchor":true},{"citing_arxiv_id":"2403.07974","citing_title":"LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code","ref_index":9,"is_internal_anchor":true},{"citing_arxiv_id":"2604.17184","citing_title":"SynthFix: Adaptive Neuro-Symbolic Code Vulnerability Repair","ref_index":25,"is_internal_anchor":true},{"citing_arxiv_id":"2604.20211","citing_title":"Towards Secure Logging: Characterizing and Benchmarking Logging Code Security Issues with LLMs","ref_index":52,"is_internal_anchor":true},{"citing_arxiv_id":"2504.15564","citing_title":"OpenClassGen: A Large-Scale Corpus of Real-World Python Classes for LLM Research","ref_index":53,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":0,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/RLLGOF3MAXI4YOLZT6FU2RJATF","json":"https://pith.science/pith/RLLGOF3MAXI4YOLZT6FU2RJATF.json","graph_json":"https://pith.science/api/pith-number/RLLGOF3MAXI4YOLZT6FU2RJATF/graph.json","events_json":"https://pith.science/api/pith-number/RLLGOF3MAXI4YOLZT6FU2RJATF/events.json","paper":"https://pith.science/paper/RLLGOF3M"},"agent_actions":{"view_html":"https://pith.science/pith/RLLGOF3MAXI4YOLZT6FU2RJATF","download_json":"https://pith.science/pith/RLLGOF3MAXI4YOLZT6FU2RJATF.json","view_paper":"https://pith.science/paper/RLLGOF3M","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2305.07922&json=true","fetch_graph":"https://pith.science/api/pith-number/RLLGOF3MAXI4YOLZT6FU2RJATF/graph.json","fetch_events":"https://pith.science/api/pith-number/RLLGOF3MAXI4YOLZT6FU2RJATF/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/RLLGOF3MAXI4YOLZT6FU2RJATF/action/timestamp_anchor","attest_storage":"https://pith.science/pith/RLLGOF3MAXI4YOLZT6FU2RJATF/action/storage_attestation","attest_author":"https://pith.science/pith/RLLGOF3MAXI4YOLZT6FU2RJATF/action/author_attestation","sign_citation":"https://pith.science/pith/RLLGOF3MAXI4YOLZT6FU2RJATF/action/citation_signature","submit_replication":"https://pith.science/pith/RLLGOF3MAXI4YOLZT6FU2RJATF/action/replication_record"}},"created_at":"2026-05-19T05:19:47.338942+00:00","updated_at":"2026-05-19T05:19:47.338942+00:00"}