Run the Main Benchmark

Command

python scripts/run_manuscript_benchmark.py

By default the script uses:

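data/manuscript/manuscript_primary.csv (truth table, --truth-path)
data/zenodo/standard_tool_predictions/ (prediction directory, --prediction-dir)
results/benchmark_runs/manuscript_primary/ (output directory, --output-dir)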

Explicit invocation

python scripts/run_manuscript_benchmark.py \
  --truth-path data/manuscript/manuscript_primary.csv \
  --prediction-dir data/zenodo/standard_tool_predictions \
  --output-dir results/benchmark_runs/manuscript_primary
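
For orientation, the flags and defaults listed above could be wired up roughly as follows. This is a minimal sketch, not the script's actual source: only the three flag names and the default paths come from this page, and `build_parser` is a hypothetical helper.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(
        description="Run the manuscript benchmark (illustrative sketch)."
    )
    parser.add_argument(
        "--truth-path",
        type=Path,
        default=Path("data/manuscript/manuscript_primary.csv"),
    )
    parser.add_argument(
        "--prediction-dir",
        type=Path,
        default=Path("data/zenodo/standard_tool_predictions"),
    )
    parser.add_argument(
        "--output-dir",
        type=Path,
        default=Path("results/benchmark_runs/manuscript_primary"),
    )
    return parser


# With no arguments, the documented defaults apply.
args = build_parser().parse_args([])
```

Under this sketch, passing any one flag overrides only that path and leaves the other two at their defaults.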

Execution path

run_manuscript_benchmark.py is a thin wrapper around the public package. The call sequence is:

scripts/run_manuscript_benchmark.py
offtarget_benchmark.cli
offtarget_benchmark.benchmark_runner

The runner performs the following steps.

1. Load and normalize the truth table.
2. Load the stored prediction contracts.
3. Normalize guide, site, chromosome, score, and rank fields.
4. Match predicted sites against benchmark sites.
5. Compute guide-level and overall metrics.
6. Write the benchmark result tables, quality-control table, and compact report.
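
The normalization and matching steps can be illustrated with a small sketch. It is not the package's implementation: the field names (`guide`, `chrom`, `pos`, `rank`), the helper names, and the exact-coordinate matching rule are all assumptions made for illustration, and the real matcher may use different keys or a position tolerance.

```python
from typing import Dict, List, Optional, Tuple

SiteKey = Tuple[str, str, int]  # (guide, chromosome, position); hypothetical key


def normalize_chrom(chrom: str) -> str:
    """Map '1', 'chr1', and 'CHR1' to one 'chr1' style (simplified)."""
    c = chrom.strip()
    if c.lower().startswith("chr"):
        c = c[3:]
    return "chr" + c.upper()


def site_key(row: Dict[str, str]) -> SiteKey:
    """Build a normalized lookup key from one truth or prediction row."""
    return (row["guide"].strip().upper(), normalize_chrom(row["chrom"]), int(row["pos"]))


def match_sites(
    truth: List[Dict[str, str]], predictions: List[Dict[str, str]]
) -> List[Dict[str, object]]:
    """Attach the best (lowest) predicted rank to each truth site, or None."""
    best_rank: Dict[SiteKey, int] = {}
    for pred in predictions:
        key = site_key(pred)
        rank = int(pred["rank"])
        if key not in best_rank or rank < best_rank[key]:
            best_rank[key] = rank

    matched: List[Dict[str, object]] = []
    for site in truth:
        key = site_key(site)
        rank: Optional[int] = best_rank.get(key)
        matched.append({"key": key, "rank": rank})
    return matched
```

Keeping unmatched truth sites with a rank of `None` means later recall calculations can count them in the denominator.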

Output tables

`benchmark_by_guide.csv`

One row per tool and guide. This table stores the guide-level counts and recall metrics used to summarize tool performance across guides.

`benchmark_matched_truth_long.csv`

Long-format table of matched sites. Each row corresponds to a benchmark site that was matched to a tool prediction, and retains the truth label, score, rank, and observed editing signal.

`benchmark_overall.csv`

One row per tool. This table reports the overall benchmark summaries derived from the guide-level metrics, including top-k recovery, k90, k95, and related aggregate quantities.
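
This page does not define k90 and k95. The sketch below assumes they denote the smallest rank cutoff k at which a tool recovers at least 90% or 95% of the validated sites; that reading, and the function name, are assumptions rather than the package's documented definition.

```python
import math
from typing import List, Optional


def smallest_k_for_recall(ranks: List[Optional[int]], target: float) -> Optional[int]:
    """Smallest k such that the share of truth sites with rank <= k reaches `target`.

    `ranks` holds one entry per truth site: the tool's rank for that site,
    or None if the tool never predicted it. Returns None when the target
    recall cannot be reached at any cutoff.
    """
    total = len(ranks)
    if total == 0:
        return None
    # Number of truth sites that must be recovered; the epsilon guards
    # against float noise pushing an exact product just above an integer.
    needed = max(1, math.ceil(target * total - 1e-9))
    found = sorted(r for r in ranks if r is not None)
    if len(found) < needed:
        return None
    return found[needed - 1]


ranks = [1, 2, 3, 4, 5, 6, 7, 8, 9, None]
k90 = smallest_k_for_recall(ranks, 0.90)
k95 = smallest_k_for_recall(ranks, 0.95)
```

In this toy input nine of ten sites are predicted, so the 90% target is reached at rank 9 while the 95% target is never reached and yields `None`.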

`benchmark_overall_recall_curves.csv`

Recall at increasing rank cutoffs for each tool. Figure 6 uses this table to plot how validated true-site recovery changes as k increases.
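
As an illustration of what each row of this table represents, recall at a rank cutoff k can be computed as the fraction of validated sites whose predicted rank is at most k. The function below is a sketch under that definition; its name and input layout are hypothetical and do not reflect the package's API.

```python
from typing import Dict, List, Optional


def recall_curve(ranks: List[Optional[int]], cutoffs: List[int]) -> Dict[int, float]:
    """Recall at each cutoff k, given one rank (or None) per truth site."""
    total = len(ranks)
    curve: Dict[int, float] = {}
    for k in cutoffs:
        # Unpredicted sites (None) never count as recovered.
        hits = sum(1 for r in ranks if r is not None and r <= k)
        curve[k] = hits / total if total else 0.0
    return curve


curve = recall_curve([1, 3, None, 10], cutoffs=[1, 5, 10])
```

Because unpredicted sites stay in the denominator, a tool that misses sites entirely plateaus below a recall of 1.0 no matter how large k becomes.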

`benchmark_pr_curves.csv`

Precision-recall curve coordinates for each tool. These are the points used to reconstruct the full curves in Figure 3.
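
For readers unfamiliar with how such coordinates arise, the sketch below derives (recall, precision) points by walking down a score-sorted prediction list. It is a generic textbook construction with hypothetical names, and it ignores tied scores; the package may handle ties and thresholds differently.

```python
from typing import List, Tuple


def pr_points(scored: List[Tuple[float, bool]], n_true: int) -> List[Tuple[float, float]]:
    """Return (recall, precision) after each prediction, highest score first.

    `scored` holds (score, is_true_site) pairs; `n_true` is the number of
    validated sites in the benchmark and must be positive.
    """
    ordered = sorted(scored, key=lambda pair: pair[0], reverse=True)
    points: List[Tuple[float, float]] = []
    true_positives = 0
    for n_seen, (_, is_true) in enumerate(ordered, start=1):
        true_positives += int(is_true)
        points.append((true_positives / n_true, true_positives / n_seen))
    return points


points = pr_points([(0.9, True), (0.8, False), (0.7, True)], n_true=2)
```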

`benchmark_pr_summary.csv`

Per-tool summary of the precision-recall analysis. This table contains average precision and related scalar summaries.
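
Average precision is commonly computed as the mean of the precision values at the ranks where true sites occur, divided over all validated sites. The sketch below uses that standard definition; whether the package uses exactly this variant (for example, how it treats unpredicted true sites) is an assumption, and the function name is hypothetical.

```python
from typing import List


def average_precision(labels_by_rank: List[bool], n_true: int) -> float:
    """Average precision for predictions listed best-first.

    `labels_by_rank[i]` is True when the prediction at rank i + 1 is a
    validated site. True sites that were never predicted contribute zero
    because the sum is divided by `n_true`, not by the number of hits.
    """
    true_positives = 0
    precision_sum = 0.0
    for rank, is_true in enumerate(labels_by_rank, start=1):
        if is_true:
            true_positives += 1
            precision_sum += true_positives / rank
    return precision_sum / n_true if n_true else 0.0


ap = average_precision([True, False, True], n_true=2)
```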

`benchmark_pairwise_tests.csv`

Pairwise statistical comparison table derived from the guide-level summaries. This is the compact benchmark output used for pairwise tool comparisons.
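
This page does not name the statistical test behind this table. Purely as an illustration of a paired, guide-level comparison, the sketch below runs an exact sign-flip permutation test on per-guide recall differences between two tools. The choice of test and the function name are assumptions, not a description of the package's method.

```python
from itertools import product
from typing import List


def paired_sign_flip_p(a: List[float], b: List[float]) -> float:
    """Two-sided exact p-value for a zero mean difference between paired values.

    `a` and `b` hold one metric per guide for two tools, in the same guide
    order. Every sign assignment is enumerated, so this is only practical
    for a small number of guides (2 ** n combinations).
    """
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    as_extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:  # small tolerance for float noise
            as_extreme += 1
    return as_extreme / total


p_value = paired_sign_flip_p([1.0, 1.0, 1.0, 1.0], [0.5, 0.5, 0.5, 0.5])
```

With four guides the smallest attainable two-sided p-value is 2/16, which shows why guide count limits the resolution of any paired test at this level.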

`benchmark_budget_constrained_pairwise_recall.csv`

Recall of pairwise tool combinations, merged by reciprocal rank fusion and evaluated at fixed shortlist sizes. This table is retained as a benchmark output for supplementary combination analyses.
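
Reciprocal rank fusion scores each candidate by summing 1 / (k + rank) over the tools that list it, then keeps the top-scoring candidates up to the shortlist budget. The sketch below shows the general technique with the conventional smoothing constant k = 60; the constant used by this benchmark, the tie-breaking rule, and the function name are assumptions.

```python
from collections import defaultdict
from typing import Dict, List


def rrf_shortlist(rankings: List[List[str]], budget: int, k: int = 60) -> List[str]:
    """Fuse several best-first rankings and return the top `budget` sites."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, site in enumerate(ranking, start=1):
            scores[site] += 1.0 / (k + rank)
    # Sort by descending fused score; break ties by site id for determinism.
    ordered = sorted(scores, key=lambda site: (-scores[site], site))
    return ordered[:budget]


tool_a = ["s1", "s2", "s3"]
tool_b = ["s2", "s4", "s1"]
shortlist = rrf_shortlist([tool_a, tool_b], budget=3)
```

Sites ranked by both tools accumulate two terms, so `s2` and `s1` outrank `s4` here even though `s4` sits second in one list.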

These tables supply the figure notebooks under docs/figures/. The notebooks render the manuscript figures directly from the benchmark-run outputs where possible. Figure 5 additionally reads the staged prediction contracts because it measures coverage before rank thresholds are applied. The ML panel in Figure 6 uses the same recall-curve schema staged under results/benchmark_runs/no_bulge_ml_comparison/. Figure 7 uses benchmark_budget_constrained_pairwise_recall.csv under the no-bulge ML comparison run.