Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation

Alexander Varlamov; Alexey Letunovskiy; Anastasia Maltseva; Anastasiia Kargapoltseva; Andrey Shutkin; Anna Averchenkova; Anna Dmitrienko; Denis Dimitrov; Denis Koposov; Denis Parkhomenko

arxiv: 2511.14993 · v3 · pith:Z5FPOFRInew · submitted 2025-11-19 · 💻 cs.CV · cs.AI · cs.LG

Vladimir Arkhipkin, Vladimir Korviakov, Nikolai Gerasimenko, Denis Parkhomenko, Viacheslav Vasilev, Alexey Letunovskiy, Nikolai Vaulin, Maria Kovaleva, Ivan Kirillov, Lev Novitskiy, Denis Koposov, Nikita Kiselev, Alexander Varlamov, Dmitrii Mikhailov, Vladimir Polovnikov, Andrey Shutkin, Julia Agafonova, Ilya Vasiliev, Anastasiia Kargapoltseva, Anna Dmitrienko, Anastasia Maltseva, Anna Averchenkova, Olga Kim, Tatiana Nikulina, Denis Dimitrov

classification 💻 cs.CV · cs.AI · cs.LG

keywords kandinsky, models, video, generation, image, generative, parameter, training

This report introduces Kandinsky 5.0, a family of state-of-the-art foundation models for high-resolution image and 10-second video synthesis. The framework comprises three core model line-ups: Kandinsky 5.0 Image Lite, a line-up of 6B-parameter image generation models; Kandinsky 5.0 Video Lite, fast and lightweight 2B-parameter text-to-video and image-to-video models; and Kandinsky 5.0 Video Pro, 19B-parameter models that achieve superior video generation quality. We provide a comprehensive review of the data curation lifecycle (collection, processing, filtering, and clustering) for the multi-stage training pipeline, which involves extensive pre-training and incorporates quality-enhancement techniques such as self-supervised fine-tuning (SFT) and reinforcement learning (RL)-based post-training. We also present novel architectural, training, and inference optimizations that enable Kandinsky 5.0 to achieve high generation speeds and state-of-the-art performance across various tasks, as demonstrated by human evaluation. As a large-scale, publicly available generative framework, Kandinsky 5.0 leverages the full potential of its pre-training and subsequent stages and can be adapted for a wide range of generative applications. We hope that this report, together with the release of our open-source code and training checkpoints, will substantially advance the development and accessibility of high-quality generative models for the research community.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

OP-GRPO: Efficient Off-Policy GRPO for Flow-Matching Models
cs.CV 2026-04 unverdicted novelty 8.0

OP-GRPO is the first off-policy GRPO method for flow-matching models that reuses trajectories via a replay buffer and importance-sampling corrections, matching on-policy performance with 34.2% of the training steps.

HumanScore: Benchmarking Human Motions in Generated Videos
cs.CV 2026-04 unverdicted novelty 7.0

HumanScore defines six metrics for kinematic plausibility, temporal stability, and biomechanical consistency to benchmark human motions in videos from thirteen state-of-the-art generation models, revealing gaps betwee...

AtlasVid: Efficient Ultra-High-Resolution Long Video Generation via Decoupled Global-Local Modeling
cs.CV 2026-05 unverdicted novelty 6.0

AtlasVid proposes a decoupled global-local diffusion framework that trains at low resolution with LoRA and generalizes to ultra-high-resolution long video synthesis via semantic proxy guidance and locality-preserving ...