Root mean square layer normalization

4 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary

baseline: 1 · method: 1

citation-polarity summary

fields

cs.CV: 4

years

2026: 1 · 2025: 3

verdicts

unverdicted: 4

representative citing papers

Emerging Properties in Unified Multimodal Pretraining

cs.CV · 2025-05-20 · unverdicted · novelty 5.0

BAGEL is a unified decoder-only model that develops emerging complex multimodal reasoning abilities after pretraining on large-scale interleaved data and outperforms prior open-source unified models.

Show-o2: Improved Native Unified Multimodal Models

cs.CV · 2025-06-18 · unverdicted · novelty 4.0

Show-o2 unifies text, image, and video understanding and generation in a single autoregressive-plus-flow-matching model built on 3D causal VAE representations.

citing papers explorer

Showing 4 of 4 citing papers.

  • Shape: A Self-Supervised 3D Geometry Foundation Model for Industrial CAD Analysis · cs.CV · 2026-04-19 · unverdicted · none · ref 6

    A 10.9M-parameter self-supervised model pretrained on 61k CAD meshes achieves R²=0.729 reconstruction and 98.1% top-1 retrieval on held-out data via masked normalized geometry reconstruction and multi-resolution contrastive learning.

  • Improved Mean Flows: On the Challenges of Fastforward Generative Models · cs.CV · 2025-12-01 · unverdicted · none · ref 58

    Improved MeanFlow (iMF) reaches 1.72 FID on ImageNet 256x256 with one function evaluation by reformulating the training objective as a regression on instantaneous velocity and treating guidance as flexible conditioning variables.

  • Emerging Properties in Unified Multimodal Pretraining · cs.CV · 2025-05-20 · unverdicted · none · ref 99

    BAGEL is a unified decoder-only model that develops emerging complex multimodal reasoning abilities after pretraining on large-scale interleaved data and outperforms prior open-source unified models.

  • Show-o2: Improved Native Unified Multimodal Models · cs.CV · 2025-06-18 · unverdicted · none · ref 139

    Show-o2 unifies text, image, and video understanding and generation in a single autoregressive-plus-flow-matching model built on 3D causal VAE representations.