A Very Big Video Reasoning Suite

We bet on a future in which video reasoning is the next fundamental intelligence paradigm after language reasoning, one where spatiotemporal, embodied experience of the world can be captured more naturally.

hit_target_after_bounce
Knowledge training set
The scene shows a ball with an arrow indicating its initial direction, and several empty target positions (hollow circles) on the right side. Simulate the ball moving along this direction and bouncing off walls following the law of reflection (the angle of reflection equals the angle of incidence). The ball will follow a complete trajectory and eventually align exactly with and completely overlap one of the target positions.
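The reflection rule in this prompt can be illustrated with a small simulation. The sketch below is only an illustration, not the suite's data generator: it assumes a point-sized ball, an axis-aligned rectangular arena, and a per-step displacement smaller than the arena, and the name `simulate_bounce` and its parameters are hypothetical. For an axis-aligned wall, "angle of reflection equals angle of incidence" reduces to flipping the velocity component perpendicular to that wall.

```python
def simulate_bounce(pos, vel, width, height, steps, dt=1.0):
    """Trace a point ball inside the box [0, width] x [0, height].

    Hypothetical helper for illustration. Assumes each step moves the
    ball by less than the box size, so at most one bounce per axis
    occurs per step.
    """
    x, y = pos
    vx, vy = vel
    path = [(x, y)]
    for _ in range(steps):
        x += vx * dt
        y += vy * dt
        # Left/right walls: mirror the overshoot and flip the x velocity.
        if x < 0:
            x, vx = -x, -vx
        elif x > width:
            x, vx = 2 * width - x, -vx
        # Bottom/top walls: mirror the overshoot and flip the y velocity.
        if y < 0:
            y, vy = -y, -vy
        elif y > height:
            y, vy = 2 * height - y, -vy
        path.append((x, y))
    return path


if __name__ == "__main__":
    # Ball starts at (1, 1) moving diagonally inside a 4 x 4 box.
    for point in simulate_bounce((1.0, 1.0), (1.0, 2.0), 4.0, 4.0, steps=4):
        print(point)
```

A checker for this task could compare the final traced position against each candidate target position and accept the one it overlaps, within some tolerance.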
object_subtraction
Abstraction out-of-domain testset
Remove all green objects from the scene. Keep all other objects unchanged.
locate_topmost_unobscured_figure
Spatiality out-of-domain testset
Multiple shapes partially overlap. Outline the topmost (unobscured) shape.
object_packing
Transformation training set
The scene shows objects on the left side and a container on the right side. Place the objects into the container one by one in the color order: pink - green - orange - purple. Each object must be placed individually in the exact order specified, and all objects must end up inside the container.
locate_line_intersections
Perception training set
Circle all intersection points of the line segments with red circles.
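To make the target of this prompt concrete, the intersection points can be computed geometrically. The following is a minimal sketch under stated assumptions: segments are given as pairs of 2D endpoints, parallel or collinear pairs are treated as having no single intersection point, and the function names are hypothetical rather than taken from the suite's code.

```python
from itertools import combinations


def segment_intersection(p1, p2, p3, p4, eps=1e-9):
    """Return the intersection point of segments p1-p2 and p3-p4, or None."""
    x1, y1 = p1
    x2, y2 = p2
    x3, y3 = p3
    x4, y4 = p4
    # Cross product of the two direction vectors; zero means parallel.
    denom = (x2 - x1) * (y4 - y3) - (y2 - y1) * (x4 - x3)
    if abs(denom) < eps:
        return None  # parallel or collinear: no single crossing point
    # Parameters along each segment; both must lie in [0, 1].
    t = ((x3 - x1) * (y4 - y3) - (y3 - y1) * (x4 - x3)) / denom
    u = ((x3 - x1) * (y2 - y1) - (y3 - y1) * (x2 - x1)) / denom
    if 0.0 <= t <= 1.0 and 0.0 <= u <= 1.0:
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
    return None


def all_intersections(segments):
    """Check every pair of segments and collect the crossing points."""
    points = []
    for (a, b), (c, d) in combinations(segments, 2):
        hit = segment_intersection(a, b, c, d)
        if hit is not None:
            points.append(hit)
    return points


if __name__ == "__main__":
    segs = [((0, 0), (2, 2)), ((0, 2), (2, 0)), ((3, 0), (3, 1))]
    print(all_intersections(segs))
```

The pairwise check is quadratic in the number of segments, which is fine for scenes with a handful of lines.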

Inference Results

Task Domains
High Density Liquid
Knowledge out-of-domain testset
Draw Next Sized Shape
Abstraction out-of-domain testset
Directed Graph Navigation
Spatiality in-domain testset
Symbol Deletion
Transformation out-of-domain testset
Select Longest Side
Perception out-of-domain testset
Model Outputs
VBVR-Wan2.2
CogVideoX 1.5
Kling 2.6
LTX-2
Runway Gen-4
Sora 2
Veo 3
Wan 2.2 I2V
Hunyuan I2V

Leaderboard

Model categories: Reference, Strong Baseline, Proprietary, Open-source

Rank | Model | Score
#1 | Human | 97.4%
#2 | VBVR-Wan2.2 | 68.5%
#3 | Sora 2 | 54.6%
#4 | Veo 3.1 | 48.0%
#5 | Runway Gen-4 Turbo | 40.3%
#6 | Wan2.2-I2V-A14B | 37.1%
#7 | Kling 2.6 | 36.9%
#8 | LTX-2 | 31.3%
#9 | CogVideoX1.5-5B-I2V | 27.3%
#9 | HunyuanVideo-I2V | 27.3%