Pith Number

pith:YAGX4P3D

pith:2026:YAGX4P3DUXM6PI4AUQPZVK3RJN

not attested · not anchored · not stored · refs resolved

Beyond Point-Wise Matching: Structural Representation Alignment for Accelerating Diffusion Transformers

Houqiang Li, Litong Gong, Shaodong Xu, Tiezheng Ge, Wengang Zhou, Zexian Li, Zhendong Wang

Structural alignment of relational geometry in features accelerates Diffusion Transformer training and improves sample quality.

arxiv:2605.16949 v1 · 2026-05-16 · cs.CV


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{YAGX4P3DUXM6PI4AUQPZVK3RJN}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1: strongest claim

By encouraging the model to internalize holistic spatial layouts and structural correlations from pre-trained features, sREPA achieves faster and more stable convergence, along with improved sample quality, compared to state-of-the-art alignment strategies.

C2: weakest assumption

That point-wise matching objectives are insufficient to capture the rich spatial topology of visual representations and that an explicit structural constraint on relational geometry will transfer this topology more effectively.

C3: one-line summary

sREPA enforces structural consistency in relational geometry of pre-trained vision features to accelerate DiT training and improve generation quality.

References

46 extracted · 46 resolved · 14 Pith anchors

[1] Self-supervised learning from images with a joint-embedding predictive architecture, 2023

[2] Video generation models as world simulators. OpenAI Blog, 1(8):1, 2024

[3] An empirical study of training self-supervised vision transformers, 2021

[4] ImageNet: A large-scale hierarchical image database, 2009

[5] Diffusion models beat GANs on image synthesis. Advances in Neural Information Processing Systems, 34:8780–8794, 2021

Formal links

3 machine-checked theorem links

Receipt and verification

First computed	2026-05-20T00:03:32.459882Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

c00d7e3f63a5d9e7a380a41f9aab714b5a24fe0e07e8d323ffe54ad3284b5067

Aliases

arxiv: 2605.16949 · arxiv_version: 2605.16949v1 · doi: 10.48550/arxiv.2605.16949 · pith_short_12: YAGX4P3DUXM6 · pith_short_16: YAGX4P3DUXM6PI4A · pith_short_8: YAGX4P3D
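
The aliases above look related to the canonical hash in a simple way: on this page, the 26-character Pith Number equals the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash bytes, and each `pith_short_N` alias is the first N characters of that number. The sketch below checks this for the values shown here. It is an observation inferred from this record, not a documented specification, and the function names `pith_from_hash` and `short_alias` are hypothetical.

```python
import base64

# Values copied from this page.
CANONICAL_HASH = "c00d7e3f63a5d9e7a380a41f9aab714b5a24fe0e07e8d323ffe54ad3284b5067"
PITH_NUMBER = "YAGX4P3DUXM6PI4AUQPZVK3RJN"


def pith_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the digest bytes and keep the first `length` characters.

    Assumption: this mirrors how the identifier on this page was derived;
    the 26-character length is taken from the identifier shown above.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_alias(pith_number: str, n: int) -> str:
    """Return the first n characters, matching the pith_short_N aliases listed."""
    return pith_number[:n]


if __name__ == "__main__":
    print(pith_from_hash(CANONICAL_HASH) == PITH_NUMBER)
    for n in (8, 12, 16):
        print(n, short_alias(PITH_NUMBER, n))
```

If the prefix relationship holds in general, a short alias can be checked against a full Pith Number with a plain string-prefix comparison, without contacting the resolver.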

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/YAGX4P3DUXM6PI4AUQPZVK3RJN \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: c00d7e3f63a5d9e7a380a41f9aab714b5a24fe0e07e8d323ffe54ad3284b5067

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "d7ad830213bbfaa4f34c7ae1d28792b25fe8353fbc53f2e56c106155751bbe6b",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2026-05-16T12:01:04Z",
    "title_canon_sha256": "beb3178ecbdb85edbc2ca2e09f733f4131143ee911495293b4d2ced987e8325a"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.16949",
    "kind": "arxiv",
    "version": 1
  }
}
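
The verification one-liner above can also be run offline against the record printed here. The sketch below applies the same canonicalization as that one-liner (sorted keys, compact separators, `ensure_ascii=False`, UTF-8) and compares the SHA-256 digest with the published canonical hash. The helper names `canonical_bytes` and `canonical_hash` are hypothetical, and the record is assumed to be exactly what the resolver returns under `canonical_record`.

```python
import hashlib
import json

# Canonical record copied from this page.
CANONICAL_RECORD = {
    "metadata": {
        "abstract_canon_sha256": "d7ad830213bbfaa4f34c7ae1d28792b25fe8353fbc53f2e56c106155751bbe6b",
        "cross_cats_sorted": [],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2026-05-16T12:01:04Z",
        "title_canon_sha256": "beb3178ecbdb85edbc2ca2e09f733f4131143ee911495293b4d2ced987e8325a",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.16949", "kind": "arxiv", "version": 1},
}

# Hash published in the "Canonical hash" section.
PUBLISHED_HASH = "c00d7e3f63a5d9e7a380a41f9aab714b5a24fe0e07e8d323ffe54ad3284b5067"


def canonical_bytes(record: dict) -> bytes:
    """Serialize with sorted keys and no extra whitespace, as in the one-liner."""
    text = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return text.encode("utf-8")


def canonical_hash(record: dict) -> str:
    """SHA-256 hex digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


if __name__ == "__main__":
    digest = canonical_hash(CANONICAL_RECORD)
    print(digest)
    print("matches published hash:", digest == PUBLISHED_HASH)
```

Because keys are sorted before hashing, the order in which a mirror stores the fields should not affect the digest. Any change to a value, including whitespace inside a string, would.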