Roadmap

Future releases.

AgenticVBench v1.0 (Post-Production) is the first of four benchmarks Philo Labs is building for the creative industries. The other three are in development, with tasks authored and rubrics in review, and each will ship as the field is ready for it.

Future release

End-to-end film generation

Brief → finished short. Cinematic intent, story coherence, technical delivery, graded by directors.

Future release

End-to-end game creation

Pitch → playable build. Mechanics, level design, agency, graded by indie game designers.

Future release

Audio-first experience

Brief → finished sonic piece. Composition, mix, dramatic arc, graded by sound designers and composers.

Why stage them? Releasing all four at once would dilute the signal each benchmark is meant to send. Each future release will go through the same expert authoring, rubric review, and inter-rater calibration as v1.0, and will ship once its verification is defensible.
