Pith Number

pith:MFY5ZWIY

pith:2024:MFY5ZWIYVVJHU6CWKFJGRL2CHD
not attested · not anchored · not stored · refs resolved

InstantID: Zero-shot Identity-Preserving Generation in Seconds

Anthony Chen, Haofan Wang, Huaxia Li, Qixun Wang, Xu Bai, Xu Tang, Yao Hu, Zekui Qin

InstantID generates high-fidelity personalized images from one face photo in seconds without fine-tuning.

arxiv:2401.07519 v2 · 2024-01-15 · cs.CV · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{MFY5ZWIYVVJHU6CWKFJGRL2CHD}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
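One way a mirror could recompute the same state regardless of the order in which it received events is to deduplicate them and replay them in a fixed total order. The sketch below illustrates that idea only; the page does not describe the actual merge algorithm, so the Event fields, the (timestamp, event_id) ordering, and the last-write-wins rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # All field names here are hypothetical, not taken from the Pith bundle format.
    event_id: str   # assumed unique per event
    timestamp: str  # assumed fixed-width ISO 8601 UTC, so string order is time order
    field: str      # which part of the state the event updates
    value: str


def merge(record: dict, events: list) -> dict:
    """Replay events over the record in a fixed total order (assumed rule)."""
    state = dict(record)
    # Deduplicate by id; this assumes two events with the same id are identical.
    unique = {e.event_id: e for e in events}
    # Sorting by (timestamp, event_id) makes the result independent of arrival order.
    for e in sorted(unique.values(), key=lambda e: (e.timestamp, e.event_id)):
        state[e.field] = e.value  # last write wins
    return state


record = {"author_claim": "open", "citations": "open"}
events = [
    Event("e2", "2026-05-18T10:00:00Z", "citations", "linked"),
    Event("e1", "2026-05-18T09:00:00Z", "author_claim", "claimed"),
    Event("e3", "2026-05-18T11:00:00Z", "citations", "verified"),
]

state_a = merge(record, events)
state_b = merge(record, list(reversed(events)) + events)  # reordered, with duplicates
print(state_a == state_b)
```

Because the replay order depends only on the events themselves, two mirrors holding the same set of events reach the same state even if they received them in different orders or more than once.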

Claims

C1 strongest claim

Our plug-and-play module adeptly handles image personalization in various styles using just a single facial image, while ensuring high fidelity.

C2 weakest assumption

That the IdentityNet design, by imposing strong semantic and weak spatial conditions on facial and landmark images integrated with textual prompts, will deliver high face fidelity across styles without fine-tuning or multiple references.

C3 one line summary

InstantID enables zero-shot identity-preserving image generation from one facial image via a novel IdentityNet that combines strong semantic and weak spatial conditioning with text prompts in diffusion models.

References

28 extracted · 28 resolved · 7 Pith anchors

[1] eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers 2022 · arXiv:2211.01324
[2] arXiv preprint arXiv:2307.09481 (2023) 2023
[3] In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021 2021
[4] An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion 2022 · doi:10.48550/arxiv.2208.01618
[5] Designing an encoder for fast personalization of text-to-image models 2023

Formal links

1 machine-checked theorem link

Cited by

22 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:13.087758Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

6171dcd918ad527a7856515268af4238ee61806a698f4fccb46dc582f104ca30

Aliases

arxiv: 2401.07519 · arxiv_version: 2401.07519v2 · doi: 10.48550/arxiv.2401.07519 · pith_short_12: MFY5ZWIYVVJH · pith_short_16: MFY5ZWIYVVJHU6CW · pith_short_8: MFY5ZWIY
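The identifiers on this page appear to be related in a simple way. For this record, the 26-character Pith Number coincides with the first 26 characters of the RFC 4648 Base32 encoding of the SHA-256 canonical hash, and each pith_short_N alias is the first N characters of the Pith Number. The Python sketch below checks that observation. Treating it as the official derivation rule is an assumption, and the function names are illustrative only.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "6171dcd918ad527a7856515268af4238ee61806a698f4fccb46dc582f104ca30"
PITH_NUMBER = "MFY5ZWIYVVJHU6CWKFJGRL2CHD"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the digest bytes and keep the first `length` characters.

    Assumed rule, inferred from this one record.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(pith_number: str, sizes=(8, 12, 16)) -> dict:
    """Build the pith_short_N aliases as prefixes of the full number."""
    return {f"pith_short_{n}": pith_number[:n] for n in sizes}


derived = pith_number_from_hash(CANONICAL_HASH)
print(derived == PITH_NUMBER)
print(short_aliases(derived))
```

If the relationship holds in general, a client could recover the short aliases from the hash alone, but a single matching record is not proof of that.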
Agent API

Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/MFY5ZWIYVVJHU6CWKFJGRL2CHD \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 6171dcd918ad527a7856515268af4238ee61806a698f4fccb46dc582f104ca30
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "1453585afe60e10c953c7cf24965f2a5e62be976a2aca5a311124f3eb83bf0e6",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2024-01-15T07:50:18Z",
    "title_canon_sha256": "8d6ea9a5cfb31dd659312284b07e1456c05cee79909eb5f36658bebbd67e5318"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2401.07519",
    "kind": "arxiv",
    "version": 2
  }
}
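
The verification step can also be run offline against the record printed above. The sketch below mirrors the canonicalization used in the shell pipeline (sorted keys, compact separators, non-ASCII characters left unescaped, UTF-8 bytes) and hashes the result with SHA-256. The function names are illustrative, and whether the digest equals the expected value depends on the record having been copied exactly, so the script reports the comparison rather than assuming it.

```python
import hashlib
import json

EXPECTED = "6171dcd918ad527a7856515268af4238ee61806a698f4fccb46dc582f104ca30"

# The canonical record as shown on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "1453585afe60e10c953c7cf24965f2a5e62be976a2aca5a311124f3eb83bf0e6",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2024-01-15T07:50:18Z",
        "title_canon_sha256": "8d6ea9a5cfb31dd659312284b07e1456c05cee79909eb5f36658bebbd67e5318",
    },
    "schema_version": "1.0",
    "source": {"id": "2401.07519", "kind": "arxiv", "version": 2},
}


def canonical_bytes(obj: dict) -> bytes:
    """Serialize with sorted keys and no insignificant whitespace, as UTF-8."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def canonical_sha256(obj: dict) -> str:
    """Hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(obj)).hexdigest()


digest = canonical_sha256(record)
print(digest)
print("matches expected:", digest == EXPECTED)
```

Sorting keys and fixing the separators is what makes the hash reproducible: two clients that parse the same JSON with different key orders or whitespace still serialize it to identical bytes.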