Submit your agent

Put your agent on the leaderboard.

Submissions are open for AgenticVBench v1.0. Both vendor-native and open-source harnesses are eligible.

Run AgenticVBench

Follow the run instructions in the GitHub README.

Vendor-native harnesses (Codex, Claude Code, Gemini CLI, etc.) are the easier path: they run against the task set out of the box.
Other harnesses need a small Harbor adapter to plug in. See Harbor's agents docs for the adapter contract.

Validate the run

Before submitting, confirm that:

Every task in the v1.0 set completed (no missing rollouts).
Each rollout's trajectory file is intact and contains the full event stream (tool calls, tool results, model messages, final output).
Per-family and overall scores fall in [0, 1].
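
The checklist above can be scripted before you bundle anything. The following is a minimal sketch, not the official validator. It assumes a hypothetical layout of one directory per task holding a `trajectory.json` file that is a JSON list of events with a `type` field, and the event type names are placeholders as well. Adjust all of these to whatever your harness actually writes.

```python
import json
from pathlib import Path

# Hypothetical event type names; the real trajectory schema may differ.
REQUIRED_EVENT_TYPES = {"tool_call", "tool_result", "model_message", "final_output"}


def missing_rollouts(expected_task_ids, run_dir):
    """Return task ids with no trajectory file under run_dir (assumed layout)."""
    run_dir = Path(run_dir)
    return sorted(
        task_id
        for task_id in expected_task_ids
        if not (run_dir / task_id / "trajectory.json").is_file()
    )


def trajectory_problems(path):
    """Return human-readable problems found in one trajectory file."""
    try:
        events = json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError) as exc:
        return [f"unreadable or truncated: {exc}"]
    if not isinstance(events, list):
        return ["expected a JSON list of events"]
    seen = {event.get("type") for event in events if isinstance(event, dict)}
    # Heuristic: flag any event kind that never appears in the stream.
    return [f"no '{kind}' event" for kind in sorted(REQUIRED_EVENT_TYPES - seen)]


def scores_out_of_range(scores):
    """Return names of scores that are not numbers in the closed interval [0, 1]."""
    return sorted(
        name
        for name, value in scores.items()
        if isinstance(value, bool)
        or not isinstance(value, (int, float))
        or not 0.0 <= value <= 1.0  # NaN fails this comparison and is flagged
    )
```

Run `missing_rollouts` with the v1.0 task ids, `trajectory_problems` on every trajectory file, and `scores_out_of_range` on your overall and per-family numbers. Each should come back empty before you submit.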

Submit

Bundle your runs and trajectories into a single submission.zip and upload it to your own Google Drive (or any host that gives you a shareable link), with access set so that anyone with the link can view it. Then email research@philolabs.ai with the share link and your scores using the template below.
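
One way to build the archive is with Python's standard `zipfile` module. This is a sketch under assumptions: the function name and the idea that all runs and trajectories sit under one directory are illustrative, and the page does not prescribe an internal layout for submission.zip.

```python
import zipfile
from pathlib import Path


def build_submission_zip(run_dir, zip_path="submission.zip"):
    """Zip every file under run_dir and return the archived names."""
    run_dir = Path(run_dir)
    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for file in sorted(run_dir.rglob("*")):
            # Skip directories, and skip the archive itself if it lives in run_dir.
            if file.is_file() and file.resolve() != zip_path.resolve():
                archive.write(file, arcname=file.relative_to(run_dir).as_posix())
    # Reopen the archive and verify checksums before uploading.
    with zipfile.ZipFile(zip_path) as archive:
        corrupt = archive.testzip()
        if corrupt is not None:
            raise RuntimeError(f"corrupt member in archive: {corrupt}")
        return archive.namelist()
```

Comparing the returned names against your task list is a cheap last check that no rollout was left out of the bundle.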

We review submissions within 3 business days. Accepted runs appear on the leaderboard.

Ready to submit?

Click below to open a prefilled email with the submission template, fill in your numbers and Drive link, and send it to research@philolabs.ai.

Email submission →
Read the GitHub README ↗

To: research@philolabs.ai
Subject: AgenticVBench v1.0 submission - [agent name]

Agent name:
Model:
Harness:

Avg score (overall, 0–1):
  Repurpose:
  Sequencing:
  Repair:
  Assembly:

Drive link to submission.zip (anyone-with-link can view):

Contact (name + handle):
Notes (optional):
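
If you would rather generate the prefilled email yourself, for example from a script that already knows your scores, a `mailto:` link can carry the subject and body. The sketch below uses only the address and subject line from the template above. The function name is hypothetical, and how well your mail client handles long bodies is not guaranteed.

```python
from urllib.parse import quote, urlencode


def submission_mailto(agent_name, body):
    """Build a mailto: URL with the template's subject and the given body."""
    subject = f"AgenticVBench v1.0 submission - {agent_name}"
    # mailto: bodies use CRLF line breaks; normalize whatever the caller passed.
    body = body.replace("\r\n", "\n").replace("\n", "\r\n")
    # quote_via=quote encodes spaces as %20 instead of '+'.
    query = urlencode({"subject": subject, "body": body}, quote_via=quote)
    return f"mailto:research@philolabs.ai?{query}"
```

Pass the filled-in template text as `body` and open the resulting URL in a browser or with `webbrowser.open`.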

Questions? Ask in the Discord or email research@philolabs.ai.