AIBugBench
Deterministic, local AI code benchmarking — sandboxed and reproducible.
Compare models on the same four tasks. Score across seven dimensions. No network. No vibes. Just receipts.
Pick your path
- New users - Install, run your first benchmark, and read the scorecard. See Getting Started.
- Model authors - Add a model folder and wire outputs. See Developer Guide.
- Power users - CLI flags, artifacts, diffs, and comparisons. See User Guide.
- Security first - Sandbox, audit, and guardrails at a glance. See Security.
Documentation
Core Guides
- Getting Started - Setup and first benchmark run
- User Guide - Running benchmarks and interpreting results
Understanding the System
- Scoring Methodology - How the 7-category scoring works
- Troubleshooting - Common issues and solutions
- Sabotage Notes - Debug patterns and test hazards
Project Information
- Contributing - Development workflow and contribution guidelines
- Code of Conduct - Community guidelines
- Security - Security policy and reporting
- License - Apache-2.0 license information
- Roadmap - Project roadmap and future plans
- Release Notes - Version history
Developer & Internals
- Developer Guide
- Architecture
- API Reference
- Internals Overview
- Validation Scripts
- Submissions Template
Quick start
See Getting Started for installation and your first run.