{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2023:NO6CY7DAGK6FHI6SZCLGCUJYDR","short_pith_number":"pith:NO6CY7DA","schema_version":"1.0","canonical_sha256":"6bbc2c7c6032bc53a3d2c8966151381c64e83197a00b969fc960c1b941ae98f6","source":{"kind":"arxiv","id":"2306.14289","version":2},"attestation_state":"computed","paper":{"title":"Faster Segment Anything: Towards Lightweight SAM for Mobile Applications","license":"http://creativecommons.org/licenses/by-nc-sa/4.0/","headline":"Distilling SAM's heavy encoder into a lightweight one creates MobileSAM, over 60 times smaller with matching zero-shot segmentation performance.","cross_cats":[],"primary_cat":"cs.CV","authors_text":"Chaoning Zhang, Choong Seon Hong, Dongshen Han, Jung Uk Kim, Seungkyu Lee, Sung-Ho Bae, Yu Qiao","submitted_at":"2023-06-25T16:37:25Z","abstract_excerpt":"Segment Anything Model (SAM) has attracted significant attention due to its impressive zero-shot transfer performance and high versatility for numerous vision applications (like image editing with fine-grained control). Many of such applications need to be run on resource-constraint edge devices, like mobile phones. In this work, we aim to make SAM mobile-friendly by replacing the heavyweight image encoder with a lightweight one. A naive way to train such a new SAM as in the original SAM paper leads to unsatisfactory performance, especially when limited training sources are available. We find "},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":true},"canonical_record":{"source":{"id":"2306.14289","kind":"arxiv","version":2},"metadata":{"license":"http://creativecommons.org/licenses/by-nc-sa/4.0/","primary_cat":"cs.CV","submitted_at":"2023-06-25T16:37:25Z","cross_cats_sorted":[],"title_canon_sha256":"4becd44a08c66d2aaa9186af9b0864a26f2d60ff538c3b3d3185e17a643bc367","abstract_canon_sha256":"262fac928c7e47438491724ca07d6e870797dc053df61f2173e6c21a926a9a38"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:38:12.783281Z","signature_b64":"9aFLeEfXQN8Rv+UdiyeLnuAHLBYJiqD9++DWxIAZrFMb9Jk88y3M4OL1B0AQ5zDNrt4gZH7siuypqrs+BSwDBQ==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"6bbc2c7c6032bc53a3d2c8966151381c64e83197a00b969fc960c1b941ae98f6","last_reissued_at":"2026-05-17T23:38:12.782631Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:38:12.782631Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"Faster Segment Anything: Towards Lightweight SAM for Mobile Applications","license":"http://creativecommons.org/licenses/by-nc-sa/4.0/","headline":"Distilling SAM's heavy encoder into a lightweight one creates MobileSAM, over 60 times smaller with matching zero-shot segmentation performance.","cross_cats":[],"primary_cat":"cs.CV","authors_text":"Chaoning Zhang, Choong Seon Hong, Dongshen Han, Jung Uk Kim, Seungkyu Lee, Sung-Ho Bae, Yu Qiao","submitted_at":"2023-06-25T16:37:25Z","abstract_excerpt":"Segment Anything Model (SAM) has attracted significant attention due to its impressive zero-shot transfer performance and high versatility for numerous vision applications (like image editing with fine-grained control). Many of such applications need to be run on resource-constraint edge devices, like mobile phones. In this work, we aim to make SAM mobile-friendly by replacing the heavyweight image encoder with a lightweight one. A naive way to train such a new SAM as in the original SAM paper leads to unsatisfactory performance, especially when limited training sources are available. We find "},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"the resulting lightweight SAM is termed MobileSAM which is more than 60 times smaller yet performs on par with the original SAM","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"that a lightweight encoder distilled only from the frozen original encoder will remain compatible with the original mask decoder across diverse downstream tasks without further joint fine-tuning","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"MobileSAM is a 60x smaller distilled version of SAM that matches original performance and runs 5x faster than concurrent FastSAM while supporting CPU inference.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Distilling SAM's heavy encoder into a lightweight one creates MobileSAM, over 60 times smaller with matching zero-shot segmentation performance.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"56fb1bfec08cd2eae39b54bb9581db78395a38832c2c1f23c2ada5d67f8d3d86"},"source":{"id":"2306.14289","kind":"arxiv","version":2},"verdict":{"id":"9db7228b-9ee9-4bbc-84c1-da0bad73d9af","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-17T22:37:57.264741Z","strongest_claim":"the resulting lightweight SAM is termed MobileSAM which is more than 60 times smaller yet performs on par with the original SAM","one_line_summary":"MobileSAM is a 60x smaller distilled version of SAM that matches original performance and runs 5x faster than concurrent FastSAM while supporting CPU inference.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"that a lightweight encoder distilled only from the frozen original encoder will remain compatible with the original mask decoder across diverse downstream tasks without further joint fine-tuning","pith_extraction_headline":"Distilling SAM's heavy encoder into a lightweight one creates MobileSAM, over 60 times smaller with matching zero-shot segmentation performance."},"references":{"count":18,"sample":[{"doi":"","year":null,"title":"One small step for generative ai, one giant leap for agi: A complete survey on chatgpt in aigc era","work_id":"e0645322-bb71-4701-b9ff-134537e95fe8","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"On the Opportunities and Risks of Foundation Models","work_id":"a18039e9-928d-47c9-a836-32656a71bf71","ref_index":2,"cited_arxiv_id":"2108.07258","is_internal_anchor":true},{"doi":"","year":null,"title":"Mp-fedcl: Multi-prototype federated contrastive learning for edge intelligence","work_id":"c0a2ea68-d6a5-40cb-97a0-b9abfac75009","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Fast segment anything","work_id":"feed3d9f-cc9f-42db-90e6-e9ff051cef57","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Segment anything in medical images","work_id":"b64c0fe5-9896-4720-94a0-9f05513ee885","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":18,"snapshot_sha256":"2a62ff75b5da33cefb09495749782b1555582fb10f7511aa6c0deb8f0e9a575e","internal_anchors":4},"formal_canon":{"evidence_count":2,"snapshot_sha256":"8025b99c4461f43a4c3d613f9306938736a764a247dc06bde4265860b4459920"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2306.14289","created_at":"2026-05-17T23:38:12.782731+00:00"},{"alias_kind":"arxiv_version","alias_value":"2306.14289v2","created_at":"2026-05-17T23:38:12.782731+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2306.14289","created_at":"2026-05-17T23:38:12.782731+00:00"},{"alias_kind":"pith_short_12","alias_value":"NO6CY7DAGK6F","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_16","alias_value":"NO6CY7DAGK6FHI6S","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_8","alias_value":"NO6CY7DA","created_at":"2026-05-18T12:33:37.589309+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":28,"internal_anchor_count":28,"sample":[{"citing_arxiv_id":"2410.04960","citing_title":"On Efficient Variants of Segment Anything Model: A Survey","ref_index":43,"is_internal_anchor":true},{"citing_arxiv_id":"2502.13451","citing_title":"MapNav: A Novel Memory Representation via Annotated Semantic Maps for Vision-and-Language Navigation","ref_index":46,"is_internal_anchor":true},{"citing_arxiv_id":"2604.21363","citing_title":"A Deployable Embodied Vision-Language Navigation System with Hierarchical Cognition and Context-Aware Exploration","ref_index":34,"is_internal_anchor":true},{"citing_arxiv_id":"2605.18868","citing_title":"DarkLLM: Learning Language-Driven Adversarial Attacks with Large Language Models","ref_index":67,"is_internal_anchor":true},{"citing_arxiv_id":"2605.17633","citing_title":"SparseSAM: Structured Sparsification of Activations in Segment Anything Models","ref_index":33,"is_internal_anchor":true},{"citing_arxiv_id":"2605.18013","citing_title":"TinySAM 2: Extreme Memory Compression for Efficient Track Anything Model","ref_index":34,"is_internal_anchor":true},{"citing_arxiv_id":"2605.19206","citing_title":"CLUE: Adaptively Prioritized Contextual Cues by Leveraging a Unified Semantic Map for Effective Zero-Shot Object-Goal Navigation","ref_index":29,"is_internal_anchor":true},{"citing_arxiv_id":"2605.16901","citing_title":"CAR-SAM: Cross-Attention Reconstruction for Post-Training Quantization of the Segment Anything Model","ref_index":25,"is_internal_anchor":true},{"citing_arxiv_id":"2605.06223","citing_title":"ProCompNav: Proactive Instance Navigation with Comparative Judgment for Ambiguous User Queries","ref_index":32,"is_internal_anchor":true},{"citing_arxiv_id":"2509.16445","citing_title":"FiLM-Nav: Efficient and Generalizable Navigation via VLM Fine-tuning","ref_index":29,"is_internal_anchor":true},{"citing_arxiv_id":"2511.12878","citing_title":"Uni-Hand: Universal Hand Motion Forecasting in Egocentric Views","ref_index":65,"is_internal_anchor":true},{"citing_arxiv_id":"2601.02018","citing_title":"Towards Any-Quality Image Segmentation via Generative and Adaptive Latent Space Enhancement","ref_index":84,"is_internal_anchor":true},{"citing_arxiv_id":"2601.13895","citing_title":"OmniOVCD: Streamlining Open-Vocabulary Change Detection with SAM 3","ref_index":34,"is_internal_anchor":true},{"citing_arxiv_id":"2603.21887","citing_title":"IGV-RRT: Prior-Real-Time Observation Fusion for Active Object Search in Changing Environments","ref_index":18,"is_internal_anchor":true},{"citing_arxiv_id":"2605.11578","citing_title":"The Midas Touch for Metric Depth","ref_index":73,"is_internal_anchor":true},{"citing_arxiv_id":"2604.27128","citing_title":"Lightweight Distillation of SAM 3 and DINOv3 for Edge-Deployable Individual-Level Livestock Monitoring and Longitudinal Visual Analytics","ref_index":19,"is_internal_anchor":true},{"citing_arxiv_id":"2605.06223","citing_title":"ProCompNav: Proactive Instance Navigation with Comparative Judgment for Ambiguous User Queries","ref_index":32,"is_internal_anchor":true},{"citing_arxiv_id":"2604.21363","citing_title":"A Deployable Embodied Vision-Language Navigation System with Hierarchical Cognition and Context-Aware Exploration","ref_index":34,"is_internal_anchor":true},{"citing_arxiv_id":"2604.19257","citing_title":"Unposed-to-3D: Learning Simulation-Ready Vehicles from Real-World Images","ref_index":64,"is_internal_anchor":true},{"citing_arxiv_id":"2604.12113","citing_title":"PR-MaGIC: Prompt Refinement Via Mask Decoder Gradient Flow For In-Context Segmentation","ref_index":30,"is_internal_anchor":true},{"citing_arxiv_id":"2605.01736","citing_title":"Multi-Scale Gaussian-Language Map for Zero-shot Embodied Navigation and Reasoning","ref_index":47,"is_internal_anchor":true},{"citing_arxiv_id":"2605.01700","citing_title":"TrajRAG: Retrieving Geometric-Semantic Experience for Zero-Shot Object Navigation","ref_index":51,"is_internal_anchor":true},{"citing_arxiv_id":"2604.11162","citing_title":"Boxes2Pixels: Learning Defect Segmentation from Noisy SAM Masks","ref_index":39,"is_internal_anchor":true},{"citing_arxiv_id":"2401.14159","citing_title":"Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks","ref_index":76,"is_internal_anchor":true},{"citing_arxiv_id":"2605.06223","citing_title":"ProCompNav: Proactive Instance Navigation with Comparative Judgment for Ambiguous User Queries","ref_index":32,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":2,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/NO6CY7DAGK6FHI6SZCLGCUJYDR","json":"https://pith.science/pith/NO6CY7DAGK6FHI6SZCLGCUJYDR.json","graph_json":"https://pith.science/api/pith-number/NO6CY7DAGK6FHI6SZCLGCUJYDR/graph.json","events_json":"https://pith.science/api/pith-number/NO6CY7DAGK6FHI6SZCLGCUJYDR/events.json","paper":"https://pith.science/paper/NO6CY7DA"},"agent_actions":{"view_html":"https://pith.science/pith/NO6CY7DAGK6FHI6SZCLGCUJYDR","download_json":"https://pith.science/pith/NO6CY7DAGK6FHI6SZCLGCUJYDR.json","view_paper":"https://pith.science/paper/NO6CY7DA","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2306.14289&json=true","fetch_graph":"https://pith.science/api/pith-number/NO6CY7DAGK6FHI6SZCLGCUJYDR/graph.json","fetch_events":"https://pith.science/api/pith-number/NO6CY7DAGK6FHI6SZCLGCUJYDR/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/NO6CY7DAGK6FHI6SZCLGCUJYDR/action/timestamp_anchor","attest_storage":"https://pith.science/pith/NO6CY7DAGK6FHI6SZCLGCUJYDR/action/storage_attestation","attest_author":"https://pith.science/pith/NO6CY7DAGK6FHI6SZCLGCUJYDR/action/author_attestation","sign_citation":"https://pith.science/pith/NO6CY7DAGK6FHI6SZCLGCUJYDR/action/citation_signature","submit_replication":"https://pith.science/pith/NO6CY7DAGK6FHI6SZCLGCUJYDR/action/replication_record"}},"created_at":"2026-05-17T23:38:12.782731+00:00","updated_at":"2026-05-17T23:38:12.782731+00:00"}