Projects - VBVR

VBVR-Wan2.2

Our video reasoning model trained on VBVR-Dataset using Wan-2.2 as the base model. Our scaling study shows concurrent performance improvements on both in-domain and out-of-domain tasks.

View Model

Demystifying Video Reasoning

Reasoning in video generation models emerges through diffusion denoising steps — a mechanism we call Chain-of-Steps — not through sequential frames. We document four emergent behaviors: working memory, self-correction, perception-before-action, and DiT layer specialization.

Read Paper