Benchmarks
AgenticVBench is the first of four benchmarks Philo Labs is building for the creative industries. v1.0 (Post-Production) is open today. The other three are held out: their tasks are authored, their rubrics are in review, and each will be released when the field is ready for it.
Live · submissions open
100 tasks
AgenticVBench v1.0 — Post-Production
100 expert-authored tasks across Assembly, Repair, Sequencing, and Repurpose.
May 2026 · open
Held out · future release
100 tasks
End-to-end film generation
Brief → finished short. Graded by directors on cinematic intent, story coherence, and technical delivery.
Held out · future release
100 tasks
End-to-end game creation
Pitch → playable build. Graded by indie game designers on mechanics, level design, and agency.
Held out · future release
100 tasks
Audio-first experience
Brief → finished sonic piece. Graded by sound designers and composers on composition, mix, and dramatic arc.
Why held out? Releasing all four at once would dilute the signal each benchmark is meant to send. Each future release will go through the same expert authoring, rubric review, and inter-rater calibration as v1.0, and will ship when its verification is defensible, not on a marketing calendar.