SynArtifact: Classifying and alleviating artifacts in synthetic images via vision-language model
2 Pith papers cite this work. Polarity classification is still indexing.
Pith papers citing it: 2
Fields: cs.CV (2)
Years: 2025 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers:
Ivy-Fake delivers a multimodal explainable benchmark for AIGC detection and an RL model that raises GenImage accuracy from 86.88% to 96.32%.
citing papers explorer
- TaleDiffusion: Multi-Character Story Generation with Dialogue Rendering
  TaleDiffusion introduces an iterative framework using LLM-generated per-frame descriptions, bounded attention-based per-box masks, identity-consistent self-attention, region-aware cross-attention, and CLIPSeg-based dialogue rendering to produce consistent multi-character story visualizations.
- Ivy-Fake: A Unified Explainable Framework and Benchmark for Image and Video AIGC Detection
  Ivy-Fake delivers a multimodal explainable benchmark for AIGC detection and an RL model that raises GenImage accuracy from 86.88% to 96.32%.