We evaluated the FVD and JEDi video distribution distances at 5 training checkpoints while fine-tuning Stable Video Diffusion on the BDD dataset. JEDi registers incremental gains at every checkpoint, whereas FVD registers its large gains early. Visually, video quality keeps improving as fine-tuning progresses, but FVD does not reflect this continued improvement.
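To make the comparison concrete, the sketch below computes the two kinds of distance on feature arrays. FVD is the Fréchet distance between Gaussians fit to video features (I3D features in the original FVD). For JEDi we *assume* an MMD-style statistic with a KID-like cubic polynomial kernel; the exact kernel, feature extractor, and estimator used by JEDi may differ. The function names `frechet_distance` and `polynomial_mmd2` and the synthetic Gaussian "features" are hypothetical stand-ins, not real model embeddings.

```python
import numpy as np


def frechet_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Fréchet distance between Gaussians fit to two feature sets (the FVD formula).

    x, y: arrays of shape (n_samples, feature_dim).
    """
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    cov_x = np.cov(x, rowvar=False)
    cov_y = np.cov(y, rowvar=False)
    # Tr(sqrt(cov_x @ cov_y)) equals Tr(sqrt(A @ cov_y @ A)) with A = cov_x^{1/2};
    # the latter matrix is symmetric PSD, so eigvalsh gives a stable answer.
    w, v = np.linalg.eigh(cov_x)
    a = (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T
    m = a @ cov_y @ a
    tr_sqrt = np.sqrt(np.clip(np.linalg.eigvalsh(m), 0.0, None)).sum()
    mean_term = np.sum((mu_x - mu_y) ** 2)
    return float(mean_term + np.trace(cov_x) + np.trace(cov_y) - 2.0 * tr_sqrt)


def polynomial_mmd2(x: np.ndarray, y: np.ndarray, degree: int = 3, coef0: float = 1.0) -> float:
    """Unbiased squared MMD with a polynomial kernel (assumed JEDi-like statistic)."""
    d = x.shape[1]

    def kernel(a: np.ndarray, b: np.ndarray) -> np.ndarray:
        return (a @ b.T / d + coef0) ** degree

    kxx, kyy, kxy = kernel(x, x), kernel(y, y), kernel(x, y)
    n, m = len(x), len(y)
    # Drop the diagonal (self-similarity) terms for the unbiased estimator.
    term_xx = (kxx.sum() - np.trace(kxx)) / (n * (n - 1))
    term_yy = (kyy.sum() - np.trace(kyy)) / (m * (m - 1))
    term_xy = kxy.mean()
    return float(term_xx + term_yy - 2.0 * term_xy)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(500, 16))  # stand-in for features of real videos
    for shift in (0.0, 0.5, 1.0):
        # Larger shift mimics a generator whose samples are further from the data.
        fake = rng.normal(loc=shift, size=(500, 16))
        print(shift, frechet_distance(real, fake), polynomial_mmd2(real, fake))
```

Both statistics grow as the synthetic "generated" distribution drifts from the "real" one. The Fréchet distance only compares first and second moments, while a kernel MMD can in principle pick up higher-order differences, which is one plausible reason the two metrics can rank checkpoints differently.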
Stable Video Diffusion Fine-Tuning Dynamics: JEDi vs FVD
For reference, we show sample videos generated at different iterations of fine-tuning on the BDD dataset. They illustrate how fine-tuning progresses, with the quality of the generated videos improving over time.
| Iteration 0 | Iteration 1 | Iteration 6 | Iteration 1200 |
|---|---|---|---|