EgoIn uses a fine-tuned vision-language model to infer transition steps and a conditioning module plus auxiliary supervision to generate coherent egocentric video sequences of object state changes.
Generative inbetweening: Adapting image-to-video models for keyframe interpolation
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 3roles
baseline 1polarities
baseline 1representative citing papers
KFC-W is a self-supervised 3D-aware video model trained on videos and multiview internet photos that produces geometrically consistent interpolations between unposed input images without any 3D annotations.
Coherence-first rendering with 15 FPS anchors plus FSR4 upsampling to 30 FPS preserves scene geometry and identity longer than native 30 FPS generation across tested forest, sword, desert, and snow scenes, with LPIPS favoring the coherence branch.
citing papers explorer
-
KFC-W: Generating 3D-Consistent Videos from Unposed Internet Photos
KFC-W is a self-supervised 3D-aware video model trained on videos and multiview internet photos that produces geometrically consistent interpolations between unposed input images without any 3D annotations.