A Very Big Video Reasoning Suite

We are betting that video reasoning will be the next fundamental intelligence paradigm after language reasoning, one in which spatiotemporal, embodied experience of the world can be captured more naturally.

domino_chain_gap_analysis (Knowledge, in-domain test set)
Analyze the domino chain to find which domino is the last to fall. Push the first domino and watch as each domino falls and turns red. The chain will stop when it reaches a gap that is too wide. This gap will be marked "TOO FAR!" in red. The last fallen domino will be circled in green as the answer.
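The rule in this prompt can be sketched in a few lines of Python. This is only an illustration of the reasoning the task asks for, not the benchmark's generator or scorer: the 1-D `positions` list, the `max_gap` threshold, and the function name `last_fallen_domino` are all assumptions, since the prompt only says the chain stops at a gap that is "too wide".

```python
def last_fallen_domino(positions, max_gap):
    """Return the 0-based index of the last domino to fall.

    positions: domino x-coordinates in left-to-right order (hypothetical layout).
    max_gap:   largest spacing a falling domino can still bridge (assumed value).
    """
    if not positions:
        raise ValueError("the chain needs at least one domino")

    last = 0  # the first domino is pushed, so it always falls
    for i in range(1, len(positions)):
        gap = positions[i] - positions[i - 1]
        if gap > max_gap:
            break  # this is the "TOO FAR!" gap; the chain stops here
        last = i
    return last


# Gaps are 1.0, 1.0, 1.5, 1.0; the 1.5 gap exceeds the threshold of 1.2.
print(last_fallen_domino([0.0, 1.0, 2.0, 3.5, 4.5], max_gap=1.2))  # 2
```

Dominoes after the wide gap never fall, even if their own spacing is small, which is why the loop stops at the first gap that is too wide instead of scanning the whole chain.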
light_sequence (Abstraction, in-domain test set)
The scene shows 6 circular lights in a horizontal row on a white background. Lights on are green with glow; lights off are gray. Initially, some lights are on and some are off. Your task: Modify the light states so that all lights in the right half of the row (the 3 lights from the right side, counting from left to right) are on (green with glow), and all lights in the left half are off (gray). Turn lights on/off as needed. Lights change from gray to green (with glow) when turned on, and from green to gray (glow disappears) when turned off. Lights stay in fixed positions; only their states change.
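As a rough sketch of the target state this prompt describes, the snippet below computes which lights need to change. Representing the row as a list of booleans (`True` for on, index 0 for the leftmost light) and the name `lights_to_toggle` are assumptions made for illustration; the benchmark itself works on video frames, not lists.

```python
def lights_to_toggle(state):
    """Return the indices of lights whose state must change.

    state: list of booleans, leftmost light first (hypothetical encoding).
    Target: every light in the left half off, every light in the right half on.
    """
    n = len(state)
    if n % 2 != 0:
        raise ValueError("expected an even number of lights")

    half = n // 2
    target = [False] * half + [True] * half
    return [i for i, (current, wanted) in enumerate(zip(state, target))
            if current != wanted]


# Lights 0, 2 and 4 start on; the target is off, off, off, on, on, on.
print(lights_to_toggle([True, False, True, False, True, False]))  # [0, 2, 3, 5]
```

Lights that already match the target are left alone, which mirrors the prompt's "turn lights on/off as needed".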
visual_jenga (Spatiality, training set)
The scene shows objects stacked vertically. Extract the objects one by one from top to bottom in order, moving each object out of the frame before extracting the next one. Continue until all objects have been removed from the frame.
rolling_ball (Transformation, in-domain test set)
The scene shows a ball and a series of platforms arranged along a curved path. Animate the ball rolling along the trajectory path, smoothly transitioning from one platform to the next, landing on each platform in sequence, and coming to rest on the final platform.
stable_sort (Perception, in-domain test set)
The scene contains two types of shapes, each type has three shapes of different sizes arranged randomly. Keep all shapes unchanged in appearance (type, size, and color). Only rearrange their positions: first group the shapes by type, then within each group, sort the shapes from smallest to largest (left to right), and arrange all shapes in a single horizontal line from left to right.
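The ordering this prompt asks for is a two-level sort: by type first, then by size within each type. A minimal Python sketch follows, assuming each shape is a `(type, size)` tuple and that types are ordered by first appearance in the input. That last choice is an assumption, since the prompt does not say which type should come first, and the function name `arrange` is hypothetical.

```python
def arrange(shapes):
    """Group shapes by type, then sort each group from smallest to largest.

    shapes: list of (type, size) tuples in their initial, random order.
    Types are ordered by first appearance (an assumption, not stated in the prompt).
    """
    first_seen = {}
    for kind, _size in shapes:
        # Record the order in which each type is first encountered.
        first_seen.setdefault(kind, len(first_seen))

    # sorted() is stable, so shapes with equal keys keep their input order.
    return sorted(shapes, key=lambda shape: (first_seen[shape[0]], shape[1]))


scene = [("circle", 3), ("square", 1), ("circle", 1),
         ("square", 3), ("square", 2), ("circle", 2)]
print(arrange(scene))
# [('circle', 1), ('circle', 2), ('circle', 3), ('square', 1), ('square', 2), ('square', 3)]
```

Only positions change in the result; each tuple is carried over untouched, matching the requirement that type, size, and color stay the same.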

Inference Results

Task Domains (one example task from each of the 5 domains)
Domino Chain Prediction (Knowledge, in-domain test set)
Shape Color Then Move (Abstraction, out-of-domain test set)
Grid Avoid Obstacles (Spatiality, in-domain test set)
Separate Objects (No Spin) (Transformation, out-of-domain test set)
Understand Scene Structure (Perception, in-domain test set)

For each task, the results viewer shows the prompt, the ground-truth first and final frames, and the outputs of 9 models:
VBVR-Wan2.2
CogVideoX 1.5
Kling 2.6
LTX-2
Runway Gen-4
Sora 2
Veo 3
Wan 2.2 I2V
Hunyuan I2V

Leaderboard

Legend: Reference, Strong Baseline, Proprietary, Open-source

Rank  Model                 Score
#1    Human                 97.4%
#2    VBVR-Wan2.2           68.5%
#3    Sora 2                54.6%
#4    Veo 3.1               48.0%
#5    Runway Gen-4 Turbo    40.3%
#6    Wan2.2-I2V-A14B       37.1%
#7    Kling 2.6             36.9%
#8    LTX-2                 31.3%
#9    CogVideoX1.5-5B-I2V   27.3%
#9    HunyuanVideo-I2V      27.3%
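
The shared #9 at the bottom of the leaderboard is what standard competition ranking produces when two entries have the same score. The sketch below recomputes the ranks from the published percentages under that assumption. The page does not state its ranking rule, the function name `competition_rank` is hypothetical, and the percentages are rounded, so the underlying scores may not be exactly tied.

```python
def competition_rank(scores):
    """Rank entries by descending score; equal scores share the same rank."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)

    ranks = {}
    previous_score = None
    current_rank = 0
    for position, (name, score) in enumerate(ordered, start=1):
        if score != previous_score:
            # A new score starts a new rank equal to its 1-based position.
            current_rank = position
            previous_score = score
        ranks[name] = current_rank
    return ranks


# Scores copied from the leaderboard, in percent.
leaderboard = {
    "Human": 97.4,
    "VBVR-Wan2.2": 68.5,
    "Sora 2": 54.6,
    "Veo 3.1": 48.0,
    "Runway Gen-4 Turbo": 40.3,
    "Wan2.2-I2V-A14B": 37.1,
    "Kling 2.6": 36.9,
    "LTX-2": 31.3,
    "CogVideoX1.5-5B-I2V": 27.3,
    "HunyuanVideo-I2V": 27.3,
}

for name, rank in competition_rank(leaderboard).items():
    print(f"#{rank}  {name}  {leaderboard[name]}%")
```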