Pith Number

pith:2D4W3XNJ

pith:2026:2D4W3XNJE7NV6KSGR5NXZB43WM
not attested · not anchored · not stored · refs resolved

HighSync: High-Quality Lip Synchronization via Latent Diffusion Models

Majid Iranpour Mobarekeh, Mehdi Bagheri, Mostafa Alavi, Saeed Firouzi Daghigh

HighSync generates photorealistic lip-synced videos at 512x512 by removing data leakage that blocked genuine audio dependence.

arxiv:2605.16918 v1 · 2026-05-16 · cs.CV

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{2D4W3XNJE7NV6KSGR5NXZB43WM}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1: strongest claim

HighSync addresses both challenges simultaneously and, to our knowledge, is the first lip sync model to operate natively at 512x512 resolution, positioning it as a viable solution for professional production environments such as the film and broadcast industries.

C2: weakest assumption

The central premise is that the identified data leakage was silently undermining temporal modeling in all prior work, and that systematically eliminating it directly produces genuine audio dependence without introducing new artifacts or requiring other unstated modeling changes.

C3: one-line summary

HighSync is a diffusion-based lip synchronization system that operates natively at 512x512 resolution by eliminating data leakage to enforce genuine audio dependence and reports state-of-the-art results on quality and sync metrics.

References

28 extracted · 28 resolved · 5 Pith anchors

[1] A lip sync expert is all you need for speech to lip generation in the wild, 2020
[2] Diff2Lip: Audio conditioned diffusion models for lip-synchronization, 2024
[3] Yu et al., Make your actor talk: Generalizable and high-fidelity ..., 2024
[4] LatentSync: Taming audio-conditioned latent diffusion models for lip sync with SyncNet supervision, 2024
[5] MuseTalk: Real-time high-fidelity video dubbing via spatio-temporal sampling, 2024

Formal links

2 machine-checked theorem links

Receipt and verification
First computed 2026-05-20T00:03:30.399374Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

d0f96ddda927db5f2a468f5b7c879bb308a9ddb8dcc468f4ac190f01d0edc690

Aliases

arxiv: 2605.16918 · arxiv_version: 2605.16918v1 · doi: 10.48550/arxiv.2605.16918 · pith_short_12: 2D4W3XNJE7NV · pith_short_16: 2D4W3XNJE7NV6KSG · pith_short_8: 2D4W3XNJ
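The identifiers on this page look related: the 26-character Pith Number matches the start of the RFC 4648 base32 encoding of the canonical hash, and each `pith_short_N` alias is the first N characters of the Pith Number. The sketch below reproduces that relationship in Python. It is an observation drawn from the values on this record, not a documented derivation rule, and the helper name `pith_from_hash` is hypothetical.

```python
import base64

# Canonical hash as printed on this record.
CANONICAL_HASH = "d0f96ddda927db5f2a468f5b7c879bb308a9ddb8dcc468f4ac190f01d0edc690"


def pith_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the digest bytes and keep the first `length` characters.

    Assumption: this mirrors how the identifier on this page relates to the
    hash; the official builder may define it differently.
    """
    raw = bytes.fromhex(hex_digest)
    encoded = base64.b32encode(raw).decode("ascii")
    return encoded[:length]


pith_number = pith_from_hash(CANONICAL_HASH)

# Short aliases are plain prefixes of the full identifier.
aliases = {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}

print(pith_number)
print(aliases)
```

Comparing the printed values with the Aliases line above is a quick local consistency check. It does not replace verifying the signed record.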
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/2D4W3XNJE7NV6KSGR5NXZB43WM \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: d0f96ddda927db5f2a468f5b7c879bb308a9ddb8dcc468f4ac190f01d0edc690
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "a87b49862d413237c930b251a7ff46f7969792feda0ecc4759f413fede0ad3eb",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2026-05-16T10:20:52Z",
    "title_canon_sha256": "5936ecc3e2a5f811e6b6da38c11863212823a7e903bc3f0b4e1e4ebad05a0e73"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.16918",
    "kind": "arxiv",
    "version": 1
  }
}
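The same check as the curl pipeline above can be run offline against a saved copy of the record. The following is a minimal sketch that applies the serialization used in that pipeline (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 bytes) and compares the digest with the canonical hash listed on this page. The function name `canonical_sha256` is hypothetical, and the sketch assumes the JSON shown above is exactly the `canonical_record` returned by the API.

```python
import hashlib
import json

# Copy of the canonical record as displayed on this page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "a87b49862d413237c930b251a7ff46f7969792feda0ecc4759f413fede0ad3eb",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2026-05-16T10:20:52Z",
    "title_canon_sha256": "5936ecc3e2a5f811e6b6da38c11863212823a7e903bc3f0b4e1e4ebad05a0e73"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.16918",
    "kind": "arxiv",
    "version": 1
  }
}
"""

# Hash this page says to expect.
EXPECTED = "d0f96ddda927db5f2a468f5b7c879bb308a9ddb8dcc468f4ac190f01d0edc690"


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and no whitespace, then hash the UTF-8 bytes."""
    blob = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


record = json.loads(RECORD_JSON)
digest = canonical_sha256(record)
print("computed:", digest)
print("matches expected:", digest == EXPECTED)
```

Because keys are sorted before hashing, the digest does not depend on the key order or whitespace of the saved file. A mismatch would suggest the saved copy differs from the canonical record, or that the builder canonicalizes in a way this sketch does not capture.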