A Very Big Video Reasoning Suite

We bet on a future in which video reasoning is the next fundamental intelligence paradigm after language reasoning, one in which spatiotemporal, embodied world experience can be captured more naturally.

mirror_reflection
Knowledge, in-domain test set
Given the mirror reflectivity = 0.31, predict the light reflection from the mirror surface. The reflected ray must extend all the way to the edge of the image.
[Figure: first frame and last frame]
animal_color_sorting
Abstraction, training set
Colored animal faces are scattered at the top of the canvas, and containers with colored borders are at the bottom. Sort each animal into the container with the matching border color.
[Figure: first frame and last frame]
Spatiality, out-of-domain test set
The scene shows a 15×15 grid maze with dark walls and white pathways. A green circular marker indicates the starting position, and a red flag marks the end position. Starting from the green start position, navigate through the maze by moving along the white pathways. You can move to adjacent cells (up, down, left, right) but cannot pass through the dark walls. The goal is to find and demonstrate the complete path from the green start to the red flag end position, showing each step of the journey through the maze.
[Figure: first frame and last frame]
object_packing
Transformation, training set
The scene shows objects on the left side and a container on the right side. Place the objects into the container one by one in the color order: orange - brown. Each object must be placed individually in the exact order specified, and all objects must end up inside the container.
[Figure: first frame and last frame]
mark_right_angled_triangles
Perception, training set
The scene shows multiple triangles on a white background. Identify every triangle that has a right angle (90°) and mark each one with a red circle; each circle must enclose only that right-angled triangle, not any other shape. Show the solution step by step.
[Figure: first frame and last frame]

Inference Results

Task Domains
Domino Chain Gap Analysis (Knowledge, in-domain test set)
Shape Color Then Move (Abstraction, out-of-domain test set)
Grid Number Sequence (Spatiality, in-domain test set)
Symbol Substitute (Transformation, out-of-domain test set)
Majority Color (Perception, in-domain test set)
Model Outputs
VBVR-Wan2.2
CogVideoX 1.5
Kling 2.6
LTX-2
Runway Gen-4
Sora 2
Veo 3
Wan 2.2 I2V
Hunyuan I2V

Leaderboard

Legend: Reference, Strong Baseline, Proprietary, Open-source

Rank  Family        Model                 Score
#1    Human         Human                 97.4%
#2    VBVR          VBVR-Wan2.2           68.5%
#3    Sora 2        Sora 2                54.6%
#4    Veo 3.1       Veo 3.1               48.0%
#5    Runway        Runway Gen-4 Turbo    40.3%
#6    Wan2.2        Wan2.2-I2V-A14B       37.1%
#7    Kling         Kling 2.6             36.9%
#8    LTX-2         LTX-2                 31.3%
#9    CogVideoX     CogVideoX1.5-5B-I2V   27.3%
#9    HunyuanVideo  HunyuanVideo-I2V      27.3%