A Very Big Video Reasoning Suite

We bet on a future in which video reasoning is the next fundamental intelligence paradigm after language reasoning, one where spatiotemporal, embodied world experience can be captured more naturally.

circle_central_dot
Knowledge, out-of-domain test set
Prompt: A row of dots is shown. Circle the dot that is in the middle by count (the one with an equal number of dots on each side).
[Images: first frame, last frame]
select_next_figure_increasing_size_sequence
Abstraction, in-domain test set
Prompt: The scene has two separated areas: a top SEQUENCE area and a bottom CHOICES area. In the SEQUENCE area, the shapes are the same shape and the same color, and their sizes strictly increase from left to right. First identify the constant size step between consecutive sequence shapes, then select the one correct option (out of 4) in the CHOICES area that continues the same shape, color, and size-increase pattern. Circle the correct option and show the full process step by step.
[Images: first frame, last frame]
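The rule behind this prompt (find the constant size step, then pick the one option that continues it) can be sketched in a few lines. This is only an illustration of the reasoning the task asks for, not the benchmark's generator or scorer: the function names and the numeric sizes are hypothetical, and shape and color are assumed to match already, so only size is modeled.

```python
def constant_step(sizes):
    """Return the shared difference between consecutive sizes.

    Raises ValueError if the sizes are not strictly increasing
    by one constant step.
    """
    steps = [b - a for a, b in zip(sizes, sizes[1:])]
    if not steps or steps[0] <= 0 or any(s != steps[0] for s in steps):
        raise ValueError("sizes must increase by one constant positive step")
    return steps[0]


def pick_next(sizes, choices):
    """Return the index of the single choice that continues the sequence."""
    target = sizes[-1] + constant_step(sizes)
    matches = [i for i, c in enumerate(choices) if c == target]
    if len(matches) != 1:
        raise ValueError("expected exactly one matching choice")
    return matches[0]


# Hypothetical sizes (e.g. side lengths in pixels), not taken from the dataset.
sequence_sizes = [20, 30, 40]
choice_sizes = [45, 50, 60, 40]
answer = pick_next(sequence_sizes, choice_sizes)
```

With these made-up values the step is 10, the target size is 50, and `answer` is the index of the second option.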
LEGO_construction_assembly
Spatiality, in-domain test set
Prompt: This is LEGO assembly step 4. The scene shows a partial model on the right and a 1x2 blue brick on the left in a callout box. Take the blue brick and attach it to the model at the position indicated by the red arrow. Move the brick smoothly from the callout to its destination, align it correctly, and snap it into place.
[Images: first frame, last frame]
add_borders_to_unbordered_shapes
Transformation, out-of-domain test set
Prompt: Several shapes are shown; some have black borders and some do not. Add a thin black border to every shape that does not already have one. Do not change anything else.
[Images: first frame, last frame]
mark_tangent_point_of_circles
Perception, out-of-domain test set
Prompt: Look at the circles on the screen. Please circle the tangent point of the two circles that are touching.
[Images: first frame, last frame]

Inference Results

Task domains shown (5):
Communicating Vessels: Knowledge, in-domain test set
Predict Next Color: Abstraction, in-domain test set
Grid Number Sequence: Spatiality, in-domain test set
Move Objects to Targets: Transformation, out-of-domain test set
Attention Shift (Different): Perception, in-domain test set

Each sample shows the prompt, the ground-truth first and final frames, and the outputs of 9 models:
VBVR-Wan2.2
CogVideoX 1.5
Kling 2.6
LTX-2
Runway Gen-4
Sora 2
Veo 3
Wan 2.2 I2V
Hunyuan I2V

Leaderboard

Legend: Reference, Strong Baseline, Proprietary, Open-source

Rank  Model                 Score
#1    Human                 97.4%
#2    VBVR-Wan2.2           68.5%
#3    Sora 2                54.6%
#4    Veo 3.1               48.0%
#5    Runway Gen-4 Turbo    40.3%
#6    Wan2.2-I2V-A14B       37.1%
#7    Kling 2.6             36.9%
#8    LTX-2                 31.3%
#9    CogVideoX1.5-5B-I2V   27.3%
#9    HunyuanVideo-I2V      27.3%
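
The shared #9 for the two entries at 27.3% is consistent with standard competition ranking, where tied scores receive the same rank. The page does not say how ranks are computed, so the sketch below is an assumption: it reproduces the listed ranks from the listed scores, and the function name `competition_rank` is hypothetical.

```python
# Scores (percent) copied from the leaderboard above.
scores = {
    "Human": 97.4,
    "VBVR-Wan2.2": 68.5,
    "Sora 2": 54.6,
    "Veo 3.1": 48.0,
    "Runway Gen-4 Turbo": 40.3,
    "Wan2.2-I2V-A14B": 37.1,
    "Kling 2.6": 36.9,
    "LTX-2": 31.3,
    "CogVideoX1.5-5B-I2V": 27.3,
    "HunyuanVideo-I2V": 27.3,
}


def competition_rank(scores):
    """Rank entries by descending score; tied scores share a rank.

    Assumed tie rule (not stated on the page): an entry that ties with
    the previous one reuses its rank, otherwise it takes its 1-based
    position in the sorted order.
    """
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    ranks = {}
    prev_score, prev_rank = None, 0
    for position, (name, score) in enumerate(ordered, start=1):
        rank = prev_rank if score == prev_score else position
        ranks[name] = rank
        prev_score, prev_rank = score, rank
    return ranks


ranks = competition_rank(scores)
for name, rank in ranks.items():
    print(f"#{rank:<3} {name:<22} {scores[name]:.1f}%")
```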