A Very Big Video Reasoning Suite

We bet on a future in which video reasoning is the next fundamental intelligence paradigm after language reasoning, one where spatiotemporal, embodied experience of the world can be captured more naturally.

clock
Knowledge, in-domain test set
The clock shows 6:53. Show what the clock will look like after 2 hours.
First Frame
Last Frame
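
The reasoning behind the clock prompt is modular time arithmetic followed by a conversion to hand angles. The following is a minimal sketch of that arithmetic, not the suite's own generator; the function names `advance_clock` and `hand_angles` are hypothetical and chosen for illustration only.

```python
def advance_clock(hour, minute, delta_hours):
    """Return (hour, minute) on a 12-hour dial after adding whole hours."""
    # Shift to 0..11, add the offset, wrap, then shift back to 1..12.
    new_hour = (hour - 1 + delta_hours) % 12 + 1
    return new_hour, minute


def hand_angles(hour, minute):
    """Angles in degrees, measured clockwise from 12, for (hour hand, minute hand)."""
    minute_angle = minute * 6.0                      # 360 degrees / 60 minutes
    hour_angle = (hour % 12) * 30.0 + minute * 0.5   # 30 degrees per hour plus drift
    return hour_angle, minute_angle


print(advance_clock(6, 53, 2))  # (8, 53)
print(hand_angles(8, 53))       # (266.5, 318.0)
```

For the prompt above, the last frame should therefore show 8:53, with the minute hand unchanged and the hour hand advanced by 60 degrees.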
symmetry_shape
Abstraction, training set
The scene shows a grid with a continuous shape on the left half. Expand the shape symmetrically to the right half by mirroring it across the vertical axis, creating a complete symmetric shape.
First Frame
Last Frame
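
Mirroring across the vertical axis can be sketched on a small binary grid. This is an illustration under stated assumptions, not the suite's implementation: the grid is assumed to be a list of rows with 1 for a filled cell, the shape is assumed to occupy only the left half, and `mirror_left_to_right` is a hypothetical name.

```python
def mirror_left_to_right(grid):
    """Copy the left half of each row onto the right half, mirrored."""
    result = []
    for row in grid:
        width = len(row)
        new_row = list(row)
        for col in range(width // 2):
            # Column col reflects to column width - 1 - col.
            new_row[width - 1 - col] = row[col]
        result.append(new_row)
    return result


grid = [
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
]
for row in mirror_left_to_right(grid):
    print(row)
# [0, 1, 1, 0]
# [1, 1, 1, 1]
# [0, 1, 1, 0]
```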
grid_go_through_block
Spatiality, in-domain test set
The scene shows a 10x10 grid with a green start square (containing an orange circular agent), a red end square, and multiple blue, purple and pink rectangular blocks. Starting from the green start square, the agent can move to adjacent cells (up, down, left, right). The goal is to move the agent to the red end square along the shortest path that passes through all blue, purple and pink blocks (the agent must visit every blue, purple and pink block before reaching the red end square).
First Frame
Last Frame
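
One way to reason about the shortest path through all blocks is to try every visiting order and sum the distances between consecutive stops. The sketch below makes simplifying assumptions that the prompt does not state: each block is treated as a single cell, the grid has no obstacles (so the distance between two cells is their Manhattan distance), and the number of blocks is small enough for brute force. The names `manhattan` and `best_visit_order` are hypothetical.

```python
from itertools import permutations


def manhattan(a, b):
    """Steps between two cells when only up/down/left/right moves are allowed."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def best_visit_order(start, end, blocks):
    """Return (path_length, order) for the shortest route visiting every block."""
    best_len, best_order = None, None
    for order in permutations(blocks):
        stops = [start, *order, end]
        length = sum(manhattan(p, q) for p, q in zip(stops, stops[1:]))
        if best_len is None or length < best_len:
            best_len, best_order = length, order
    return best_len, list(best_order)


print(best_visit_order((0, 0), (9, 9), [(5, 5), (2, 2)]))
# (18, [(2, 2), (5, 5)])
```

Brute force grows factorially with the number of blocks, so this is only reasonable for the handful of blocks that fit on a 10x10 grid.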
combined_objects_spinning
Transformation, training set
The scene shows 2 objects on the left side and dashed target outlines on the right side. The dashed target outlines remain completely stationary. For each object, first rotate it in place to match the orientation of its corresponding dashed target outline, then move it horizontally to the right so that it aligns exactly with and fits within its corresponding dashed target outline.
First Frame
Last Frame
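
The two-step motion in this prompt, rotating in place and then sliding right, can be sketched on a polygon's vertices. This is an assumption-laden illustration rather than the suite's code: "in place" is taken to mean rotation about the mean of the vertices, angles are counterclockwise in a y-up coordinate system (which appears clockwise in image coordinates), and `rotate_in_place` and `shift_right` are hypothetical names.

```python
import math


def rotate_in_place(points, degrees):
    """Rotate points about the mean of their vertices (counterclockwise, y-up)."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    c = math.cos(math.radians(degrees))
    s = math.sin(math.radians(degrees))
    return [
        (cx + (x - cx) * c - (y - cy) * s, cy + (x - cx) * s + (y - cy) * c)
        for x, y in points
    ]


def shift_right(points, dx):
    """Translate points horizontally by dx."""
    return [(x + dx, y) for x, y in points]


rect = [(0, 0), (2, 0), (2, 1), (0, 1)]
moved = shift_right(rotate_in_place(rect, 90), 5)
print([(round(x, 6), round(y, 6)) for x, y in moved])
```

Applying the rotation first keeps the object's center fixed, so the subsequent translation is purely horizontal, as the prompt requires.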
locate_intersection_of_segments
Perception, out-of-domain test set
The scene shows two black line segments that intersect at exactly one point. First locate the single intersection point where the two segments cross, then draw one red circle around that intersection point. Do not circle any endpoints and do not add any extra lines. Show the complete marking process step by step.
First Frame
Last Frame
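
Locating the crossing point of two segments is a standard computation. The sketch below uses the usual parametric form with 2D cross products; it is not taken from the suite, and `segment_intersection` is a hypothetical name. It returns `None` for parallel or collinear segments, which the prompt rules out by stating there is exactly one intersection.

```python
def segment_intersection(p1, p2, p3, p4):
    """Return the crossing point of segments p1-p2 and p3-p4, or None."""
    d1x, d1y = p2[0] - p1[0], p2[1] - p1[1]
    d2x, d2y = p4[0] - p3[0], p4[1] - p3[1]
    denom = d1x * d2y - d1y * d2x          # cross product of the two directions
    if denom == 0:
        return None                        # parallel or collinear
    # Solve p1 + t * d1 == p3 + u * d2 for t and u.
    t = ((p3[0] - p1[0]) * d2y - (p3[1] - p1[1]) * d2x) / denom
    u = ((p3[0] - p1[0]) * d1y - (p3[1] - p1[1]) * d1x) / denom
    if 0 <= t <= 1 and 0 <= u <= 1:
        return (p1[0] + t * d1x, p1[1] + t * d1y)
    return None                            # the infinite lines cross outside the segments


print(segment_intersection((0, 0), (4, 4), (0, 4), (4, 0)))  # (2.0, 2.0)
```

The red circle in the last frame would be centered on the returned point.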

Inference Results

Task Domains
Dot to Dot
Knowledge, in-domain test set
Shape Outline Then Move
Abstraction, in-domain test set
Outline Innermost Square
Spatiality, out-of-domain test set
Rolling Ball
Transformation, in-domain test set
Color Triple Intersection
Perception, out-of-domain test set
Model Outputs
VBVR-Wan2.2
CogVideoX 1.5
Kling 2.6
LTX-2
Runway Gen-4
Sora 2
Veo 3
Wan 2.2 I2V
Hunyuan I2V

Leaderboard

Split
Category
2026-04-14 13 models