Stable Video Diffusion Fine-Tuning Dynamics: JEDi vs FVD

We evaluated the FVD and JEDi video distribution distances at 5 training checkpoints while fine-tuning Stable Video Diffusion on the BDD dataset. JEDi registers incremental gains at every checkpoint, whereas FVD registers large gains only early on. Visually, video quality continues to improve as fine-tuning progresses, but FVD does not reflect this continued improvement.
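To make the comparison concrete, the sketch below shows the two kinds of distribution distance involved, computed on precomputed feature arrays. FVD is a Fréchet distance between Gaussians fitted to video features (I3D features in the standard formulation), while JEDi is a Maximum Mean Discrepancy with a polynomial kernel over V-JEPA features. This is not the evaluation code used for these results. The function names, the kernel settings (`degree=3`, `coef0=1.0`, scaling by the feature dimension), and the random placeholder features are assumptions chosen for illustration.

```python
import numpy as np


def frechet_distance(x, y):
    """Frechet distance between Gaussians fitted to two feature sets.

    x, y: arrays of shape (num_videos, feature_dim).
    """
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    cov_x = np.cov(x, rowvar=False)
    cov_y = np.cov(y, rowvar=False)
    # Tr((cov_x cov_y)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_x @ cov_y, which are real and non-negative in theory.
    eigvals = np.linalg.eigvals(cov_x @ cov_y)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = ((mu_x - mu_y) ** 2).sum()
    return float(mean_term + np.trace(cov_x) + np.trace(cov_y) - 2.0 * trace_sqrt)


def polynomial_mmd2(x, y, degree=3, coef0=1.0):
    """Unbiased estimate of squared MMD with a polynomial kernel.

    Kernel (an assumed, KID-style choice): k(a, b) = (a . b / dim + coef0) ** degree
    """
    dim = x.shape[1]
    k_xx = (x @ x.T / dim + coef0) ** degree
    k_yy = (y @ y.T / dim + coef0) ** degree
    k_xy = (x @ y.T / dim + coef0) ** degree
    m, n = len(x), len(y)
    # Exclude the diagonal so the within-set terms are unbiased.
    term_xx = (k_xx.sum() - np.trace(k_xx)) / (m * (m - 1))
    term_yy = (k_yy.sum() - np.trace(k_yy)) / (n * (n - 1))
    return float(term_xx + term_yy - 2.0 * k_xy.mean())


# Placeholder features standing in for real and generated video embeddings.
rng = np.random.default_rng(0)
real = rng.normal(size=(256, 16))
same = rng.normal(size=(256, 16))              # same distribution as `real`
shifted = rng.normal(loc=0.5, size=(256, 16))  # shifted distribution

fd_same, fd_shifted = frechet_distance(real, same), frechet_distance(real, shifted)
mmd_same, mmd_shifted = polynomial_mmd2(real, same), polynomial_mmd2(real, shifted)
```

In a real evaluation, `real` and the generated-feature arrays would come from the respective feature extractor applied to reference videos and to videos sampled at each checkpoint, giving one distance value per checkpoint.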

Fine-tuning Progression

For reference, we show sample videos generated at different iterations of fine-tuning on the BDD dataset. The quality of the generated videos improves as fine-tuning progresses.

Video panels: Iteration 0, Iteration 1, Iteration 6, Iteration 1200