Roadmap
Future releases.
AgenticVBench v1.0 (Post-Production) is the first of four benchmarks Philo Labs is building for the creative industries. The other three are in development, with tasks authored and rubrics in review, and each will ship as the field is ready for it.
Future release
End-to-end film generation
Brief → finished short. Cinematic intent, story coherence, technical delivery, graded by directors.
Future release
End-to-end game creation
Pitch → playable build. Mechanics, level design, agency, graded by indie game designers.
Future release
Audio-first experience
Brief → finished sonic piece. Composition, mix, dramatic arc, graded by sound designers and composers.
Why stage them? Releasing all four at once would dilute the signal each benchmark is meant to send. Each future release will go through the same expert authoring, rubric review, and inter-rater calibration as v1.0, and will ship once its verification is defensible.