Qwen-Image Technical Report
6 Pith papers cite this work.
Citing papers by year: 2026 (6)
citing papers explorer
- SAM 3D Animal: Promptable Animal 3D Reconstruction from Images in the Wild
  SAM 3D Animal is the first promptable framework for multi-animal 3D reconstruction from single images, built on SMAL+ and trained on the new Herd3D dataset, achieving SOTA results on Animal3D, APTv2, and Animal Kingdom benchmarks.
- Knowledge Visualization: A Benchmark and Method for Knowledge-Intensive Text-to-Image Generation
  KVBench reveals major gaps in current T2I models for knowledge-intensive tasks, and KE-Check narrows the gap between open- and closed-source models by adding structured knowledge and enforcing constraints.
- Efficient Image Synthesis with Sphere Latent Encoder
  Decouples Sphere Encoder into fixed pretrained encoder and spherical latent denoiser, yielding higher quality and faster inference than the joint original on Animal-Faces, Oxford-Flowers and ImageNet-1K.
- Can MLLMs Reason About Visual Persuasion? Evaluating the Efficacy and Faithfulness of Reasoning
  Diverse teacher-generated rationales improve MLLM visual persuasiveness prediction via supervised fine-tuning, while a new three-dimensional faithfulness framework shows that prediction accuracy alone does not ensure faithful reasoning and that decision sensitivity best matches human preferences.
- Nucleus-Image: Sparse MoE for Image Generation
  A 17B-parameter sparse MoE diffusion transformer activates 2B parameters per pass and reaches competitive quality on image generation benchmarks without post-training.
- BLK-Assist: A Methodological Framework for Artist-Led Co-Creation with Generative AI Models
  BLK-Assist is a three-part framework (Conceptor for sketches, Stencil for transparent assets, Upscale for high-res outputs) that fine-tunes public diffusion models on one artist's proprietary corpus for style-faithful generative co-creation.