
AIBugBench

Deterministic, local AI code benchmarking — sandboxed and reproducible.

Compare models on the same four tasks. Score across seven categories. No network. No vibes. Just receipts.

Pick your path

New users - Install, run your first benchmark, and read the scorecard. See Getting Started.
Model authors - Add a model folder and wire up its outputs. See Developer Guide.
Power users - CLI flags, artifacts, diffs, and comparisons. See User Guide.
Security first - Sandbox, audit, and guardrails at a glance. See Security.

Documentation

Core Guides

Getting Started - Setup and first benchmark run
User Guide - Running benchmarks and interpreting results

Understanding the System

Scoring Methodology - How the 7-category scoring works
Troubleshooting - Common issues and solutions
Sabotage Notes - Debug patterns and test hazards

Project Information

Contributing - Development workflow and contribution guidelines
Code of Conduct - Community guidelines
Security - Security policy and reporting
License - Apache-2.0 license information
Roadmap - Project roadmap and future plans
Release Notes - Version history

Developer & Internals

Developer Guide
Architecture
API Reference
Internals Overview
Validation Scripts
Submissions Template

Quick start

See Getting Started for installation and your first run.