Leaderboard

Benchmarks

AgenticVBench is the first of four benchmarks Philo Labs is building for the creative industries. v1.0 (Post-Production) is open today. The other three are held out: their tasks are authored and their rubrics are in review, and each will be released when the field is ready for it.

Why held out? Releasing all four at once would dilute the signal each benchmark is meant to send. Each future release will go through the same expert authoring, rubric review, and inter-rater calibration as v1.0, and it will ship when the verification is defensible, not on a marketing calendar.