Run the Main Benchmark

Command

python scripts/run_manuscript_benchmark.py

By default the script uses:

  • data/manuscript/manuscript_primary.csv
  • data/zenodo/standard_tool_predictions/
  • results/benchmark_runs/manuscript_primary/

Explicit invocation

python scripts/run_manuscript_benchmark.py \
  --truth-path data/manuscript/manuscript_primary.csv \
  --prediction-dir data/zenodo/standard_tool_predictions \
  --output-dir results/benchmark_runs/manuscript_primary

Execution path

run_manuscript_benchmark.py is a thin wrapper around the public package. The call sequence is:

  • scripts/run_manuscript_benchmark.py
  • offtarget_benchmark.cli
  • offtarget_benchmark.benchmark_runner

The runner performs the following steps.

  1. Load and normalize the truth table.
  2. Load the stored prediction contracts.
  3. Normalize guide, site, chromosome, score, and rank fields.
  4. Match predicted sites against benchmark sites.
  5. Compute guide-level and overall metrics.
  6. Write the benchmark result tables, quality-control table, and compact report.
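Steps 3 to 5 can be illustrated with a small standard-library sketch. It is not the code in offtarget_benchmark.benchmark_runner. The field names (guide, chrom, pos), the chromosome naming rule, and the exact-position matching criterion are all assumptions made for illustration; the real runner may match within a window or on additional fields.

```python
from collections import defaultdict


def normalize_chrom(chrom):
    """Map 'chr1', 'Chr1', and '1' to a single 'chr1' form (assumed convention)."""
    text = str(chrom).strip()
    if text.lower().startswith("chr"):
        text = text[3:]
    return "chr" + text.upper()


def site_key(row):
    """Build a comparable key from hypothetical guide/chrom/pos fields."""
    return (row["guide"].strip().upper(), normalize_chrom(row["chrom"]), int(row["pos"]))


def recall_by_guide(truth_rows, prediction_rows):
    """Fraction of truth sites per guide that have an exactly matching prediction."""
    predicted = {site_key(row) for row in prediction_rows}
    total = defaultdict(int)
    matched = defaultdict(int)
    for row in truth_rows:
        key = site_key(row)
        total[key[0]] += 1
        if key in predicted:
            matched[key[0]] += 1
    return {guide: matched[guide] / total[guide] for guide in total}


# Toy rows, not taken from the benchmark data.
truth = [
    {"guide": "g1", "chrom": "chr1", "pos": "100"},
    {"guide": "g1", "chrom": "2", "pos": "200"},
    {"guide": "g2", "chrom": "chrX", "pos": "300"},
]
predictions = [
    {"guide": "G1", "chrom": "1", "pos": 100},
    {"guide": "g2", "chrom": "x", "pos": 300},
]
print(recall_by_guide(truth, predictions))
```

Normalizing both tables through the same key function before matching keeps differences such as "1" versus "chr1" from being counted as missed sites.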

Output tables

benchmark_by_guide.csv

One row per tool and guide. This table stores the guide-level counts and recall metrics used to summarize tool performance across guides.

benchmark_matched_truth_long.csv

Long-format matched-site table. Each row corresponds to a benchmark site that was matched to a tool prediction. It retains the truth label, score, rank, and observed editing signal.

benchmark_overall.csv

One row per tool. This table reports the overall benchmark summaries derived from the guide-level metrics, including top-k recovery, k90, k95, and related aggregate quantities.
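This document does not define k90 and k95. A common reading, assumed here, is the smallest rank cutoff k at which a tool recovers at least 90% or 95% of the validated sites. The sketch below computes that quantity from the ranks of matched truth sites; the function name and the integer-percent argument are hypothetical and may differ from the package.

```python
def smallest_k_for_recall(matched_ranks, n_truth, target_percent):
    """Smallest rank cutoff whose recall reaches target_percent, or None.

    matched_ranks: ranks (1 = best) of truth sites the tool predicted.
    n_truth: total number of truth sites, matched or not.
    """
    # Ceiling of target_percent * n_truth / 100, kept in integer arithmetic.
    needed = (target_percent * n_truth + 99) // 100
    ranks = sorted(matched_ranks)
    if needed == 0 or needed > len(ranks):
        return None  # the target recall is never reached
    return ranks[needed - 1]


# Toy ranks, not benchmark results.
ranks = [1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
print(smallest_k_for_recall(ranks, 10, 90))
print(smallest_k_for_recall(ranks, 10, 95))
```

Under this reading, a tool that never predicts enough truth sites has no defined k90 or k95, which the sketch reports as None.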

benchmark_overall_recall_curves.csv

Recall at increasing rank cutoffs for each tool. Figure 6 uses this table to plot how validated true-site recovery changes as k increases.
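As a minimal sketch of what one tool's recall curve contains, recall at cutoff k is the number of truth sites ranked at or above k divided by the total number of truth sites. The function and variable names are illustrative and are not the column names of the CSV.

```python
def recall_curve(matched_ranks, n_truth, cutoffs):
    """Return (k, recall) pairs for each cutoff, in increasing order of k."""
    ranks = sorted(matched_ranks)
    curve = []
    recovered = 0
    for k in sorted(cutoffs):
        # Advance past every matched truth site with rank <= k.
        while recovered < len(ranks) and ranks[recovered] <= k:
            recovered += 1
        curve.append((k, recovered / n_truth))
    return curve


# Toy example: 5 truth sites, 4 of them predicted at these ranks.
curve = recall_curve([1, 4, 12, 250], n_truth=5, cutoffs=[1, 10, 100, 1000])
print(curve)
```

Because one truth site is never predicted in this toy example, the curve plateaus at 0.8 rather than reaching 1.0.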

benchmark_pr_curves.csv

Precision-recall curve coordinates for each tool. These are the points used to reconstruct the full curves in Figure 3.

benchmark_pr_summary.csv

Per-tool summary of the precision-recall analysis. This table contains average precision and related scalar summaries.
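For reference, average precision can be computed from a ranked list of labels as the mean of the precision values at each position where a validated site appears. The sketch below uses that textbook definition. Whether the benchmark divides by the number of retrieved positives or by all truth sites is not stated here, so the optional n_truth argument is an assumption that covers both conventions.

```python
def average_precision(labels_by_rank, n_truth=None):
    """Average precision for labels sorted best-first (1 = validated site).

    If n_truth is given, unpredicted truth sites count against the score.
    """
    hits = 0
    precision_sum = 0.0
    for position, label in enumerate(labels_by_rank, start=1):
        if label:
            hits += 1
            precision_sum += hits / position  # precision at this hit
    denominator = n_truth if n_truth is not None else hits
    return precision_sum / denominator if denominator else 0.0


# Toy ranking: validated sites at positions 1 and 3.
labels = [1, 0, 1, 0, 0]
print(round(average_precision(labels), 4))
print(round(average_precision(labels, n_truth=4), 4))
```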

benchmark_pairwise_tests.csv

Pairwise statistical comparison table derived from the guide-level summaries. This is the compact benchmark output used for pairwise tool comparisons.

benchmark_budget_constrained_pairwise_recall.csv

Pairwise combination table under reciprocal rank fusion and fixed shortlist sizes. This table is retained as a benchmark output for supplementary combination analyses.
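Reciprocal rank fusion scores each site by summing 1 / (k + rank) over the tools that predict it, then re-ranks by the fused score. The sketch below combines two rankings and measures recall within a fixed shortlist size. The constant k = 60 is the value commonly used in the literature, not necessarily the one used by this benchmark, and the function names and alphabetical tie-break are illustrative assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse best-first rankings; ties are broken alphabetically by site id."""
    scores = {}
    for ranking in rankings:
        for rank, site in enumerate(ranking, start=1):
            scores[site] = scores.get(site, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda site: (-scores[site], site))


def fused_shortlist_recall(ranking_a, ranking_b, truth_sites, budget, k=60):
    """Recall of truth sites within the top `budget` fused predictions."""
    shortlist = reciprocal_rank_fusion([ranking_a, ranking_b], k)[:budget]
    return len(set(shortlist) & set(truth_sites)) / len(truth_sites)


# Toy rankings from two hypothetical tools.
tool_a = ["s1", "s2", "s3", "s4"]
tool_b = ["s3", "s5", "s1", "s6"]
truth_sites = {"s1", "s3", "s6"}

fused = reciprocal_rank_fusion([tool_a, tool_b])
print(fused)
print(fused_shortlist_recall(tool_a, tool_b, truth_sites, budget=2))
```

Sites that both tools rank highly rise to the top of the fused list, which is why a pair can recover more truth sites than either tool alone at the same shortlist size.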

These tables supply the figure notebooks under docs/figures/. The notebooks render the manuscript figures directly from the benchmark-run outputs where possible. Figure 5 additionally reads the staged prediction contracts because it measures coverage before rank thresholds are applied. The ML panel in Figure 6 uses the same recall-curve schema staged under results/benchmark_runs/no_bulge_ml_comparison/. Figure 7 uses benchmark_budget_constrained_pairwise_recall.csv under the no-bulge ML comparison run.
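Before opening the figure notebooks, it can help to confirm that a run directory contains the eight tables described above. The helper below is a convenience sketch and is not part of the package. It checks only the named CSV files, because the file names of the quality-control table and compact report are not given in this document.

```python
from pathlib import Path

# Table names as listed in the "Output tables" section.
EXPECTED_TABLES = [
    "benchmark_by_guide.csv",
    "benchmark_matched_truth_long.csv",
    "benchmark_overall.csv",
    "benchmark_overall_recall_curves.csv",
    "benchmark_pr_curves.csv",
    "benchmark_pr_summary.csv",
    "benchmark_pairwise_tests.csv",
    "benchmark_budget_constrained_pairwise_recall.csv",
]


def missing_tables(output_dir):
    """Return the expected table names that are absent from output_dir."""
    root = Path(output_dir)
    return [name for name in EXPECTED_TABLES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_tables("results/benchmark_runs/manuscript_primary")
    print("missing tables:", missing if missing else "none")
```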