Omniconsistency: Learning style-agnostic consistency from paired stylization data

Yiren Song, Cheng Liu, Mike Zheng Shou · 2025 · arXiv 2505.18445

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

read on arXiv browse 6 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

StreamingEffect: Real-Time Human-Centric Video Effect Generation

cs.CV · 2026-05-16 · unverdicted · novelty 7.0

StreamingEffect enables real-time 720p human-centric video effect generation on one GPU via teacher-student distillation, keyframe control, and a new 130K video dataset.

ST-BiBench: Benchmarking Multi-Stream Multimodal Coordination in Bimanual Embodied Tasks for MLLMs

cs.RO · 2026-02-09 · unverdicted · novelty 7.0

ST-BiBench reveals a coordination paradox in which MLLMs show strong high-level strategic reasoning yet fail at fine-grained 16-dimensional bimanual action synthesis and multi-stream fusion.

VISTA: Triplet-Supervised Video Style Transfer with Diffusion Transformers

cs.CV · 2026-05-17 · unverdicted · novelty 6.0

VISTA introduces a new synthetic triplet dataset and diffusion-transformer framework with style adapter that jointly models style, content, and motion to achieve state-of-the-art video style transfer.

OmniHumanoid: Streaming Cross-Embodiment Video Generation with Paired-Free Adaptation

cs.CV · 2026-05-12 · unverdicted · novelty 6.0

OmniHumanoid factorizes transferable motion learning from embodiment-specific adaptation to enable scalable cross-embodiment video generation without paired data for new humanoids.

SWEET: Sparse World Modeling with Image Editing for Embodied Task Execution

cs.CV · 2026-05-19 · unverdicted · novelty 5.0

SWEET is a one-shot sparse visual planning framework that progressively generates manipulation keyframes via image editing conditioned on language and spatial guidance, then converts them to actions with a diffusion predictor, showing better fidelity and lower cost than video models on DROID and Rob

AutoAWG: Adverse Weather Generation with Adaptive Multi-Controls for Automotive Videos

cs.CV · 2026-04-21 · unverdicted · novelty 5.0

AutoAWG generates controllable adverse weather automotive videos via semantics-guided adaptive multi-control fusion and vanishing-point-anchored temporal synthesis from static images, reducing FID by 50% and FVD by 16.1% on nuScenes without first-frame conditioning.

citing papers explorer

Showing 6 of 6 citing papers.

StreamingEffect: Real-Time Human-Centric Video Effect Generation cs.CV · 2026-05-16 · unverdicted · none · ref 62
StreamingEffect enables real-time 720p human-centric video effect generation on one GPU via teacher-student distillation, keyframe control, and a new 130K video dataset.
ST-BiBench: Benchmarking Multi-Stream Multimodal Coordination in Bimanual Embodied Tasks for MLLMs cs.RO · 2026-02-09 · unverdicted · none · ref 105
ST-BiBench reveals a coordination paradox in which MLLMs show strong high-level strategic reasoning yet fail at fine-grained 16-dimensional bimanual action synthesis and multi-stream fusion.
VISTA: Triplet-Supervised Video Style Transfer with Diffusion Transformers cs.CV · 2026-05-17 · unverdicted · none · ref 45
VISTA introduces a new synthetic triplet dataset and diffusion-transformer framework with style adapter that jointly models style, content, and motion to achieve state-of-the-art video style transfer.
OmniHumanoid: Streaming Cross-Embodiment Video Generation with Paired-Free Adaptation cs.CV · 2026-05-12 · unverdicted · none · ref 32
OmniHumanoid factorizes transferable motion learning from embodiment-specific adaptation to enable scalable cross-embodiment video generation without paired data for new humanoids.
SWEET: Sparse World Modeling with Image Editing for Embodied Task Execution cs.CV · 2026-05-19 · unverdicted · none · ref 39
SWEET is a one-shot sparse visual planning framework that progressively generates manipulation keyframes via image editing conditioned on language and spatial guidance, then converts them to actions with a diffusion predictor, showing better fidelity and lower cost than video models on DROID and Rob
AutoAWG: Adverse Weather Generation with Adaptive Multi-Controls for Automotive Videos cs.CV · 2026-04-21 · unverdicted · none · ref 33
AutoAWG generates controllable adverse weather automotive videos via semantics-guided adaptive multi-control fusion and vanishing-point-anchored temporal synthesis from static images, reducing FID by 50% and FVD by 16.1% on nuScenes without first-frame conditioning.

Omniconsistency: Learning style-agnostic consistency from paired stylization data

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer