EM-Vid introduces an entity-centric latent patch memory bank with sparse token conditioning and budgeted updates for training-free consistent multi-shot video generation.
Wang, Siming Fu, Qihan Huang, Wanggui He, and Hao Jiang
10 Pith papers cite this work.
representative citing papers
ASTRA disentangles subject identity from pose structure in diffusion transformers via retrieval-augmented pose guidance, asymmetric EURoPE embeddings, and a DSM adapter to improve multi-subject generation.
DSH-Bench is a benchmark for subject-driven T2I generation that uses hierarchical taxonomy sampling, difficulty/scenario classification, and a new SICS metric showing 9.4% higher human correlation than prior measures.
LooseRoPE modulates RoPE in diffusion attention maps to continuously trade off between preserving a pasted object's identity and harmonizing it with its new surroundings.
T2I-FactualBench is a new three-tier benchmark for factuality of knowledge-intensive concepts in T2I models, using multi-round VQA evaluation to show SOTA models need improvement.
MRT is a 20B-parameter masked region diffusion model that unifies text-to-layers, image-to-layers, and layers-to-layers tasks, with an overflow-aware canvas layer for complete editable outputs.
A zero-shot subject-driven video generation framework that decomposes the task into identity injection from 200K subject-image pairs and motion preservation from 4K arbitrary videos, trained in 288 A100 GPU hours on CogVideoX-5B to match prior performance at 1% compute.
FreeGraftor performs subject-driven text-to-image generation without training by cross-image feature grafting via semantic matching, position-constrained attention fusion, and a noise initialization strategy that preserves reference geometry.
A data-generation pipeline plus pairwise subject-consistency rewards in RL improve consistency and prompt adherence for multi-subject personalized image generation.
AnimalBooth introduces an Animal Net, an adaptive attention module, and frequency-controlled DCT feature integration to improve identity preservation and perceptual quality in personalized animal image generation, supported by a new high-resolution dataset, AnimalBench.
citing papers explorer
T2I-FactualBench: Benchmarking the Factuality of Text-to-Image Models with Knowledge-Intensive Concepts