{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2024:MFY5ZWIYVVJHU6CWKFJGRL2CHD","short_pith_number":"pith:MFY5ZWIY","schema_version":"1.0","canonical_sha256":"6171dcd918ad527a7856515268af4238ee61806a698f4fccb46dc582f104ca30","source":{"kind":"arxiv","id":"2401.07519","version":2},"attestation_state":"computed","paper":{"title":"InstantID: Zero-shot Identity-Preserving Generation in Seconds","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"InstantID generates high-fidelity personalized images from one face photo in seconds without fine-tuning.","cross_cats":["cs.AI"],"primary_cat":"cs.CV","authors_text":"Anthony Chen, Haofan Wang, Huaxia Li, Qixun Wang, Xu Bai, Xu Tang, Yao Hu, Zekui Qin","submitted_at":"2024-01-15T07:50:18Z","abstract_excerpt":"There has been significant progress in personalized image synthesis with methods such as Textual Inversion, DreamBooth, and LoRA. Yet, their real-world applicability is hindered by high storage demands, lengthy fine-tuning processes, and the need for multiple reference images. Conversely, existing ID embedding-based methods, while requiring only a single forward inference, face challenges: they either necessitate extensive fine-tuning across numerous model parameters, lack compatibility with community pre-trained models, or fail to maintain high face fidelity. Addressing these limitations, we "},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":true},"canonical_record":{"source":{"id":"2401.07519","kind":"arxiv","version":2},"metadata":{"license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","primary_cat":"cs.CV","submitted_at":"2024-01-15T07:50:18Z","cross_cats_sorted":["cs.AI"],"title_canon_sha256":"8d6ea9a5cfb31dd659312284b07e1456c05cee79909eb5f36658bebbd67e5318","abstract_canon_sha256":"1453585afe60e10c953c7cf24965f2a5e62be976a2aca5a311124f3eb83bf0e6"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:38:13.088365Z","signature_b64":"ZhGoLdyq1Q6NcguQ7CUVt+wtz2rHf2SEReQ+CznklMrF5aqV2kMGXGbM6X3NM38rnZC9/VvyHNQS9Xq918GGCg==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"6171dcd918ad527a7856515268af4238ee61806a698f4fccb46dc582f104ca30","last_reissued_at":"2026-05-17T23:38:13.087758Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:38:13.087758Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"InstantID: Zero-shot Identity-Preserving Generation in Seconds","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"InstantID generates high-fidelity personalized images from one face photo in seconds without fine-tuning.","cross_cats":["cs.AI"],"primary_cat":"cs.CV","authors_text":"Anthony Chen, Haofan Wang, Huaxia Li, Qixun Wang, Xu Bai, Xu Tang, Yao Hu, Zekui Qin","submitted_at":"2024-01-15T07:50:18Z","abstract_excerpt":"There has been significant progress in personalized image synthesis with methods such as Textual Inversion, DreamBooth, and LoRA. Yet, their real-world applicability is hindered by high storage demands, lengthy fine-tuning processes, and the need for multiple reference images. Conversely, existing ID embedding-based methods, while requiring only a single forward inference, face challenges: they either necessitate extensive fine-tuning across numerous model parameters, lack compatibility with community pre-trained models, or fail to maintain high face fidelity. Addressing these limitations, we "},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Our plug-and-play module adeptly handles image personalization in various styles using just a single facial image, while ensuring high fidelity.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the IdentityNet design, by imposing strong semantic and weak spatial conditions on facial and landmark images integrated with textual prompts, will deliver high face fidelity across styles without fine-tuning or multiple references.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"InstantID enables zero-shot identity-preserving image generation from one facial image via a novel IdentityNet that combines strong semantic and weak spatial conditioning with text prompts in diffusion models.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"InstantID generates high-fidelity personalized images from one face photo in seconds without fine-tuning.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"c7c55b2b29e48134074dbb9842a109140b90530e246b06d2351cf5ae3084f219"},"source":{"id":"2401.07519","kind":"arxiv","version":2},"verdict":{"id":"ece50b4f-89f0-4340-beb7-5021bdf36772","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-17T20:57:42.525913Z","strongest_claim":"Our plug-and-play module adeptly handles image personalization in various styles using just a single facial image, while ensuring high fidelity.","one_line_summary":"InstantID enables zero-shot identity-preserving image generation from one facial image via a novel IdentityNet that combines strong semantic and weak spatial conditioning with text prompts in diffusion models.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the IdentityNet design, by imposing strong semantic and weak spatial conditions on facial and landmark images integrated with textual prompts, will deliver high face fidelity across styles without fine-tuning or multiple references.","pith_extraction_headline":"InstantID generates high-fidelity personalized images from one face photo in seconds without fine-tuning."},"references":{"count":28,"sample":[{"doi":"","year":2022,"title":"eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers","work_id":"2cd7b629-ab37-4ce5-b51e-aa4d99547468","ref_index":1,"cited_arxiv_id":"2211.01324","is_internal_anchor":true},{"doi":"","year":2023,"title":"arXiv preprint arXiv:2307.09481 (2023)","work_id":"b5906af7-b3b8-4517-a1d2-9687f00960ea","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2021,"title":"In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021","work_id":"8e700a39-a905-4502-b336-4e1a69b5bf6a","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"10.48550/arxiv.2208.01618","year":2022,"title":"An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion","work_id":"ca618c21-3ba6-448e-bd86-bcecff3cdeb5","ref_index":4,"cited_arxiv_id":"2208.01618","is_internal_anchor":true},{"doi":"","year":2023,"title":"Designing an encoder for fast personalization of text-to-image models","work_id":"38db57b1-fb65-4b24-9152-ee4147c37fb4","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":28,"snapshot_sha256":"d825eb2dc763f997431cb7886d5e136835bba665b743088a5786fdf3f3995597","internal_anchors":7},"formal_canon":{"evidence_count":1,"snapshot_sha256":"b3b5502522b8ff5bf8e0b933470cd7db20d5632d257acc3bd0214da84b616474"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2401.07519","created_at":"2026-05-17T23:38:13.087879+00:00"},{"alias_kind":"arxiv_version","alias_value":"2401.07519v2","created_at":"2026-05-17T23:38:13.087879+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2401.07519","created_at":"2026-05-17T23:38:13.087879+00:00"},{"alias_kind":"pith_short_12","alias_value":"MFY5ZWIYVVJH","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_16","alias_value":"MFY5ZWIYVVJHU6CW","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_8","alias_value":"MFY5ZWIY","created_at":"2026-05-18T12:33:37.589309+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":22,"internal_anchor_count":22,"sample":[{"citing_arxiv_id":"2605.16990","citing_title":"DreamEdit3D: Personalization of Multi-View Diffusion Models for 3D Editing","ref_index":44,"is_internal_anchor":true},{"citing_arxiv_id":"2605.07074","citing_title":"Decoupling Semantics and Fingerprints: A Universal Representation for AI-Generated Image Detection","ref_index":33,"is_internal_anchor":true},{"citing_arxiv_id":"2509.04434","citing_title":"Durian: Dual Reference Image-Guided Portrait Animation with Attribute Transfer","ref_index":16,"is_internal_anchor":true},{"citing_arxiv_id":"2510.20512","citing_title":"Adversarial Concept Distillation for One-Step Diffusion Personalization","ref_index":89,"is_internal_anchor":true},{"citing_arxiv_id":"2511.16136","citing_title":"How Noise Benefits AI-generated Image Detection","ref_index":64,"is_internal_anchor":true},{"citing_arxiv_id":"2512.01236","citing_title":"PSR: Scaling Multi-Subject Personalized Image Generation with Pairwise Subject-Consistency Rewards","ref_index":34,"is_internal_anchor":true},{"citing_arxiv_id":"2512.12675","citing_title":"Scone: Bridging Composition and Distinction in Subject-Driven Image Generation via Unified Understanding-Generation Modeling","ref_index":34,"is_internal_anchor":true},{"citing_arxiv_id":"2503.07598","citing_title":"VACE: All-in-One Video Creation and Editing","ref_index":67,"is_internal_anchor":true},{"citing_arxiv_id":"2605.14333","citing_title":"InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation","ref_index":46,"is_internal_anchor":true},{"citing_arxiv_id":"2605.12013","citing_title":"L2P: Unlocking Latent Potential for Pixel Generation","ref_index":20,"is_internal_anchor":true},{"citing_arxiv_id":"2605.11927","citing_title":"RealDiffusion: Physics-informed Attention for Multi-character Storybook Generation","ref_index":36,"is_internal_anchor":true},{"citing_arxiv_id":"2605.10302","citing_title":"Follow the Mean: Reference-Guided Flow Matching","ref_index":39,"is_internal_anchor":true},{"citing_arxiv_id":"2605.12305","citing_title":"Images in Sentences: Scaling Interleaved Instructions for Unified Visual Generation","ref_index":36,"is_internal_anchor":true},{"citing_arxiv_id":"2605.10302","citing_title":"Follow the Mean: Reference-Guided Flow Matching","ref_index":39,"is_internal_anchor":true},{"citing_arxiv_id":"2605.09460","citing_title":"When Few Steps Are Enough: Training-Free Acceleration of Identity-Preserved Generation","ref_index":11,"is_internal_anchor":true},{"citing_arxiv_id":"2604.21689","citing_title":"StyleID: A Perception-Aware Dataset and Metric for Stylization-Agnostic Facial Identity Recognition","ref_index":21,"is_internal_anchor":true},{"citing_arxiv_id":"2605.07074","citing_title":"Decoupling Semantics and Fingerprints: A Universal Representation for AI-Generated Image Detection","ref_index":32,"is_internal_anchor":true},{"citing_arxiv_id":"2605.07257","citing_title":"Adaptive Subspace Projection for Generative Personalization","ref_index":39,"is_internal_anchor":true},{"citing_arxiv_id":"2604.05961","citing_title":"HumANDiff: Articulated Noise Diffusion for Motion-Consistent Human Video Generation","ref_index":64,"is_internal_anchor":true},{"citing_arxiv_id":"2604.13863","citing_title":"PostureObjectstitch: Anomaly Image Generation Considering Assembly Relationships in Industrial Scenarios","ref_index":41,"is_internal_anchor":true},{"citing_arxiv_id":"2604.15857","citing_title":"AHS: Adaptive Head Synthesis via Synthetic Data Augmentations","ref_index":54,"is_internal_anchor":true},{"citing_arxiv_id":"2604.17195","citing_title":"DreamShot: Personalized Storyboard Synthesis with Video Diffusion Prior","ref_index":44,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":1,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/MFY5ZWIYVVJHU6CWKFJGRL2CHD","json":"https://pith.science/pith/MFY5ZWIYVVJHU6CWKFJGRL2CHD.json","graph_json":"https://pith.science/api/pith-number/MFY5ZWIYVVJHU6CWKFJGRL2CHD/graph.json","events_json":"https://pith.science/api/pith-number/MFY5ZWIYVVJHU6CWKFJGRL2CHD/events.json","paper":"https://pith.science/paper/MFY5ZWIY"},"agent_actions":{"view_html":"https://pith.science/pith/MFY5ZWIYVVJHU6CWKFJGRL2CHD","download_json":"https://pith.science/pith/MFY5ZWIYVVJHU6CWKFJGRL2CHD.json","view_paper":"https://pith.science/paper/MFY5ZWIY","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2401.07519&json=true","fetch_graph":"https://pith.science/api/pith-number/MFY5ZWIYVVJHU6CWKFJGRL2CHD/graph.json","fetch_events":"https://pith.science/api/pith-number/MFY5ZWIYVVJHU6CWKFJGRL2CHD/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/MFY5ZWIYVVJHU6CWKFJGRL2CHD/action/timestamp_anchor","attest_storage":"https://pith.science/pith/MFY5ZWIYVVJHU6CWKFJGRL2CHD/action/storage_attestation","attest_author":"https://pith.science/pith/MFY5ZWIYVVJHU6CWKFJGRL2CHD/action/author_attestation","sign_citation":"https://pith.science/pith/MFY5ZWIYVVJHU6CWKFJGRL2CHD/action/citation_signature","submit_replication":"https://pith.science/pith/MFY5ZWIYVVJHU6CWKFJGRL2CHD/action/replication_record"}},"created_at":"2026-05-17T23:38:13.087879+00:00","updated_at":"2026-05-17T23:38:13.087879+00:00"}