A Very Big Video Reasoning Suite

We are betting that video reasoning will be the next fundamental intelligence paradigm after language reasoning, one in which spatiotemporal, embodied experience of the world can be captured more naturally.

Data Engines

circle_maximum_value
Knowledge training set
There are multiple numbers on the screen. Circle the one with the largest value.
maintain_object_identity_different_objects
Abstraction training set
The left object is blue and the right object is dark blue. The scene shows two objects, one on the left and one on the right. Swap the positions of the left and right objects. After the swap, draw an arrow below the object that was originally on the left, pointing up at it.
key_door_matching
Spatiality in-domain testset
The scene shows a maze with a green circular agent, colored diamond-shaped keys, and colored hollow rectangular doors. Find the Blue key and then navigate to the matching Blue door, showing the complete movement process step by step.
symbol_substitute
Transformation out-of-domain testset
Substitute ◯ at position 1 with an orange ◇. The animation shows the old symbol fading out completely, then the new symbol gradually fading in at the same position.
multi_object_placement
Perception in-domain testset
The scene contains multiple colored objects and star markers. Keep all star markers unchanged in position. Move each colored object to the star marker with the same color using straight paths, aligning the center of each object with the center of its matching star marker.

Inference Results

Task Domains
Domino Chain Prediction
Knowledge in-domain testset
Shape Outline Fill
Abstraction in-domain testset
Select Leftmost Shape
Spatiality out-of-domain testset
Symbol Edit
Transformation out-of-domain testset
Identify Pentagons
Perception out-of-domain testset
Model Outputs
VBVR-Wan2.2
CogVideoX 1.5
Kling 2.6
LTX-2
Runway Gen-4
Sora 2
Veo 3
Wan 2.2 I2V
Hunyuan I2V
Seedance 2.0

Leaderboard

2026-04-28