A Very Big Video Reasoning Suite

We bet on a future in which video reasoning is the next fundamental intelligence paradigm after language reasoning, one where spatiotemporal, embodied experience of the world can be captured more naturally.

Data Engines

hit_target_after_bounce

Knowledge training set

Prompt

The scene shows a ball with an arrow indicating its initial direction, and several empty target positions (hollow circles) on the right side. Simulate the ball moving along this direction and bouncing off walls following the law of reflection (the angle of reflection equals the angle of incidence). The ball will follow a complete trajectory and eventually align exactly with and completely overlap one of the target positions.
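
The reflection rule in this prompt can be sketched in a few lines. The following is a minimal illustration, not the data engine's actual generator: it assumes a point-sized ball inside an axis-aligned rectangular arena, and the names trace_bounces and first_target_on_path are hypothetical. Reflecting off an axis-aligned wall amounts to flipping the velocity component normal to that wall, which keeps the angle of reflection equal to the angle of incidence.

```python
import math


def trace_bounces(pos, vel, width, height, n_bounces):
    """Return the start point followed by each wall-contact point.

    Assumes a point ball inside the rectangle [0, width] x [0, height].
    """
    x, y = pos
    vx, vy = vel
    if vx == 0 and vy == 0:
        raise ValueError("velocity must be non-zero")
    points = [(x, y)]
    for _ in range(n_bounces):
        # Time until the ball reaches a vertical wall (tx) or a horizontal wall (ty).
        tx = (width - x) / vx if vx > 0 else (-x / vx if vx < 0 else math.inf)
        ty = (height - y) / vy if vy > 0 else (-y / vy if vy < 0 else math.inf)
        t = min(tx, ty)
        x, y = x + vx * t, y + vy * t
        # Law of reflection: flip the component normal to the wall that was hit.
        if tx <= ty:
            vx = -vx
        if ty <= tx:
            vy = -vy
        points.append((x, y))
    return points


def first_target_on_path(points, targets, tol=1e-6):
    """Return the index of the first target centre the path passes through, or None."""
    for (ax, ay), (bx, by) in zip(points, points[1:]):
        seg_len2 = (bx - ax) ** 2 + (by - ay) ** 2
        if seg_len2 == 0:
            continue  # degenerate segment, e.g. the ball started on a wall
        best = None
        for i, (tx, ty) in enumerate(targets):
            # Project the target onto the segment and measure its distance to the path.
            s = ((tx - ax) * (bx - ax) + (ty - ay) * (by - ay)) / seg_len2
            if 0 <= s <= 1:
                px, py = ax + s * (bx - ax), ay + s * (by - ay)
                if math.hypot(tx - px, ty - py) <= tol and (best is None or s < best[0]):
                    best = (s, i)
        if best is not None:
            return best[1]
    return None


path = trace_bounces((1, 1), (1, 1), width=4, height=3, n_bounces=3)
hit = first_target_on_path(path, [(3.5, 2.5), (3, 1)])
```

Here the ball starts at (1, 1) heading diagonally up and to the right, touches the top wall, then the right wall, and passes through the first target on the way. A real scene would also need to account for the ball's radius and for where the trajectory ends, both of which this sketch ignores.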

[Media: first frame, last frame, and video]

object_subtraction

Abstraction out-of-domain test set

Prompt

Remove all green objects from the scene. Keep all other objects unchanged.
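
As a rough illustration of what the expected last frame contains, the task reduces to a filter over the scene's objects. The sketch below assumes a scene can be represented as a list of named, colored objects; the SceneObject type and the remove_color function are hypothetical and not taken from the data engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneObject:
    name: str
    color: str


def remove_color(objects, color):
    """Return the objects that remain after removing every object of the given color."""
    return [obj for obj in objects if obj.color != color]


first_frame = [
    SceneObject("circle", "green"),
    SceneObject("square", "red"),
    SceneObject("triangle", "green"),
    SceneObject("star", "blue"),
]
last_frame = remove_color(first_frame, "green")
```

The remaining objects keep their original order and attributes, which mirrors the requirement that all other objects stay unchanged.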

[Media: first frame, last frame, and video]

locate_topmost_unobscured_figure

Spatiality out-of-domain test set

Prompt

Multiple shapes partially overlap. Outline the topmost (unobscured) shape.
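
One way to reason about this task is through draw order: a shape is unobscured when no shape drawn above it overlaps it. The sketch below assumes axis-aligned rectangles with an explicit z index, which is a simplification of the arbitrary shapes in the scenes; the Rect type and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    name: str
    x0: float
    y0: float
    x1: float
    y1: float
    z: int  # higher z is drawn later, i.e. on top


def overlaps(a, b):
    """True when two axis-aligned rectangles share a region of positive area."""
    return a.x0 < b.x1 and b.x0 < a.x1 and a.y0 < b.y1 and b.y0 < a.y1


def unobscured(shapes):
    """Return the names of shapes that no higher shape overlaps."""
    return [
        s.name
        for s in shapes
        if not any(o.z > s.z and overlaps(s, o) for o in shapes)
    ]


shapes = [
    Rect("A", 0, 0, 4, 4, z=0),
    Rect("B", 2, 2, 6, 6, z=1),
    Rect("C", 5, 5, 8, 8, z=2),
]
visible = unobscured(shapes)
```

In this example B covers part of A and C covers part of B, so only C is fully visible and would be the shape to outline.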

[Media: first frame, last frame, and video]

object_packing

Transformation training set

Prompt

The scene shows objects on the left side and a container on the right side. Place the objects into the container one by one in the color order: pink - green - orange - purple. Each object must be placed individually in the exact order specified, and all objects must end up inside the container.
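
The ordering constraint in this prompt can be sketched as a small planner. This is an illustrative assumption about how a valid solution might be checked, not the data engine's own logic: objects are identified only by color, and placement_plan is a hypothetical name.

```python
def placement_plan(objects_on_left, color_order):
    """Return the colors in the order they should enter the container.

    Raises ValueError if the order names an object that is not present,
    or if some object would be left outside the container.
    """
    remaining = list(objects_on_left)
    plan = []
    for color in color_order:
        if color not in remaining:
            raise ValueError(f"no {color} object left to place")
        remaining.remove(color)  # each object is placed individually
        plan.append(color)
    if remaining:
        raise ValueError(f"objects not covered by the order: {remaining}")
    return plan


plan = placement_plan(
    ["purple", "pink", "orange", "green"],
    ["pink", "green", "orange", "purple"],
)
```

The two error cases correspond to the prompt's two requirements: the exact order must be followed, and every object must end up inside the container.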

First Frame