Run the Main Benchmark
Command
```bash
python scripts/run_manuscript_benchmark.py
```

By default the script uses:
- truth table: data/manuscript/manuscript_primary.csv
- prediction directory: data/zenodo/standard_tool_predictions/
- output directory: results/benchmark_runs/manuscript_primary/
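The defaults above correspond to the --truth-path, --prediction-dir and --output-dir options shown in the explicit invocation. A minimal sketch of how such options could be declared with the standard-library argparse module follows. build_parser and the DEFAULT_* constants are hypothetical names; the real offtarget_benchmark.cli may be organized differently.

```python
import argparse
from pathlib import Path

# Hypothetical constants mirroring the documented default paths.
DEFAULT_TRUTH_PATH = Path("data/manuscript/manuscript_primary.csv")
DEFAULT_PREDICTION_DIR = Path("data/zenodo/standard_tool_predictions")
DEFAULT_OUTPUT_DIR = Path("results/benchmark_runs/manuscript_primary")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser sketch; not the actual offtarget_benchmark.cli code."""
    parser = argparse.ArgumentParser(
        description="Run the manuscript off-target benchmark (sketch)."
    )
    # argparse exposes "--truth-path" as args.truth_path, and so on.
    parser.add_argument("--truth-path", type=Path, default=DEFAULT_TRUTH_PATH,
                        help="Benchmark truth table (CSV).")
    parser.add_argument("--prediction-dir", type=Path, default=DEFAULT_PREDICTION_DIR,
                        help="Directory holding the stored prediction contracts.")
    parser.add_argument("--output-dir", type=Path, default=DEFAULT_OUTPUT_DIR,
                        help="Directory that receives the benchmark tables.")
    return parser
```

For example, build_parser().parse_args(["--output-dir", "tmp/run"]) keeps the default truth table and prediction directory and only redirects the outputs.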
Explicit invocation
```bash
python scripts/run_manuscript_benchmark.py \
  --truth-path data/manuscript/manuscript_primary.csv \
  --prediction-dir data/zenodo/standard_tool_predictions \
  --output-dir results/benchmark_runs/manuscript_primary
```

Execution path
run_manuscript_benchmark.py is a thin wrapper around the public offtarget_benchmark package. The call sequence is:
scripts/run_manuscript_benchmark.py → offtarget_benchmark.cli → offtarget_benchmark.benchmark_runner
The runner performs the following steps.
- Load and normalize the truth table.
- Load the stored prediction contracts.
- Normalize guide, site, chromosome, score, and rank fields.
- Match predicted sites against benchmark sites.
- Compute guide-level and overall metrics.
- Write the benchmark result tables, the quality-control table, and a compact report.
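The normalization, ranking and matching steps above can be sketched as follows. This assumes a simplified record layout of (guide, chromosome, position, score) and exact coordinate matching after normalization. Site, normalize_chrom, normalize_site, rank_predictions and match_truth are hypothetical helpers; the actual runner may match on additional fields (strand, sequence) or tolerate small coordinate offsets.

```python
from dataclasses import dataclass


# Hypothetical, simplified site key; the real truth table and prediction
# contracts may carry more fields.
@dataclass(frozen=True)
class Site:
    guide: str
    chrom: str
    pos: int


def normalize_chrom(chrom: str) -> str:
    """Map '1', 'Chr1', 'CHR1' and 'chr1' to the single form 'chr1'."""
    name = chrom.strip()
    if name.lower().startswith("chr"):
        name = name[3:]
    return "chr" + name.upper()


def normalize_site(guide: str, chrom: str, pos) -> Site:
    """Upper-case the guide, canonicalize the chromosome, cast position to int."""
    return Site(guide.strip().upper(), normalize_chrom(chrom), int(pos))


def rank_predictions(preds):
    """preds: iterable of (guide, chrom, pos, score).

    Returns {Site: (score, rank)}, where rank 1 is the highest score within
    each guide. If a site appears twice, its best-scoring entry is kept.
    """
    by_guide = {}
    for guide, chrom, pos, score in preds:
        site = normalize_site(guide, chrom, pos)
        by_guide.setdefault(site.guide, []).append((site, float(score)))
    ranked = {}
    for entries in by_guide.values():
        entries.sort(key=lambda item: item[1], reverse=True)
        for rank, (site, score) in enumerate(entries, start=1):
            ranked.setdefault(site, (score, rank))
    return ranked


def match_truth(truth, preds):
    """truth: iterable of (guide, chrom, pos, label).

    Returns one (site, label, score, rank) row per truth site that a
    prediction hits exactly after normalization.
    """
    ranked = rank_predictions(preds)
    matched = []
    for guide, chrom, pos, label in truth:
        site = normalize_site(guide, chrom, pos)
        if site in ranked:
            score, rank = ranked[site]
            matched.append((site, label, score, rank))
    return matched
```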
Output tables
benchmark_by_guide.csv
One row per tool and guide. This table stores the guide-level counts and recall metrics used to summarize tool performance across guides.
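A sketch of a guide-level recall computation, assuming that recall at a cutoff k means the fraction of a guide's validated true sites that the tool ranks at position k or better. guide_recall_at_k and its input shapes are hypothetical. True sites the tool never predicted count as misses simply because they are absent from matched_ranks.

```python
from collections import defaultdict


def guide_recall_at_k(truth_positives, matched_ranks, k):
    """Per-guide counts and recall at rank cutoff k (hypothetical helper).

    truth_positives: {guide: number of validated true sites}
    matched_ranks:   list of (guide, rank) for validated true sites that the
                     tool predicted; unpredicted true sites are absent.
    Returns {guide: (n_true, n_recovered_at_k, recall_at_k)}.
    """
    hits = defaultdict(int)
    for guide, rank in matched_ranks:
        if rank <= k:
            hits[guide] += 1
    out = {}
    for guide, n_true in truth_positives.items():
        n_hit = hits[guide]
        # Guides without validated sites get NaN rather than a misleading 0 or 1.
        out[guide] = (n_true, n_hit, n_hit / n_true if n_true else float("nan"))
    return out
```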
benchmark_matched_truth_long.csv
Long-format table of matched sites. Each row is a benchmark site that was matched to a tool prediction, and it retains the truth label, prediction score, rank, and observed editing signal.
benchmark_overall.csv
One row per tool. This table reports the overall benchmark summaries derived from the guide-level metrics, including top-k recovery, k90, k95, and related aggregate quantities.
benchmark_overall_recall_curves.csv
Recall at increasing rank cutoffs for each tool. Figure 6 uses this table to plot how validated true-site recovery changes as k increases.
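Assuming k90 and k95 denote the smallest rank cutoff at which recall reaches 90% and 95% (this section does not define them explicitly), both can be read off a recall curve of the kind stored in this table. The functions below are illustrative only; whether the pipeline pools ranks across guides or computes these quantities per guide is not stated here.

```python
def recall_curve(recovered_ranks, n_true, max_k):
    """Recall at each cutoff k = 1..max_k, as a list of (k, recall).

    recovered_ranks: ranks of validated true sites that the tool predicted.
    n_true: total validated true sites, including any the tool never predicted.
    """
    if n_true <= 0:
        raise ValueError("n_true must be positive")
    counts = [0] * (max_k + 1)
    for r in recovered_ranks:
        if 1 <= r <= max_k:
            counts[r] += 1
    curve, running = [], 0
    for k in range(1, max_k + 1):
        running += counts[k]
        curve.append((k, running / n_true))
    return curve


def k_at_recall(curve, target):
    """Smallest k whose recall reaches target (e.g. 0.90 for k90), else None."""
    for k, recall in curve:
        if recall >= target:
            return k
    return None
```

If some validated sites are never predicted, recall can plateau below the target, which is why k_at_recall returns None instead of an arbitrary cutoff.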
benchmark_pr_curves.csv
Precision-recall curve coordinates for each tool. These are the points used to reconstruct the full curves in Figure 3.
benchmark_pr_summary.csv
Per-tool summary of the precision-recall analysis, including average precision and related scalar summaries.
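A sketch of how precision-recall points and non-interpolated average precision can be computed from scored predictions. pr_curve and its inputs are hypothetical. Tied scores are broken by input order here, which the actual pipeline may handle differently.

```python
def pr_curve(scores, labels, n_positive=None):
    """Precision-recall points and average precision (hypothetical helper).

    scores: tool scores, higher meaning more likely to be a true site.
    labels: 1 for a validated true site, 0 otherwise.
    n_positive: total validated sites; defaults to sum(labels). Pass a larger
        value so that true sites the tool never predicted count as misses.
    Returns (points, average_precision), with points as (recall, precision).
    """
    if n_positive is None:
        n_positive = sum(labels)
    if n_positive <= 0:
        raise ValueError("need at least one positive")
    # sorted() is stable, so tied scores keep their input order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp, ap, points = 0, 0.0, []
    for n_seen, i in enumerate(order, start=1):
        if labels[i]:
            tp += 1
            # AP averages precision at the position of each true site.
            ap += (tp / n_seen) / n_positive
        points.append((tp / n_positive, tp / n_seen))
    return points, ap
```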
benchmark_pairwise_tests.csv
Compact table of pairwise statistical comparisons between tools, derived from the guide-level summaries.
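This section does not name the statistical test behind the pairwise table. As one illustrative stand-in, an exact two-sided sign test on per-guide differences needs only the standard library; the real table may use a Wilcoxon signed-rank test or another paired method. paired_sign_test is a hypothetical name.

```python
from math import comb


def paired_sign_test(metric_a, metric_b):
    """Exact two-sided sign test on per-guide metric differences.

    metric_a, metric_b: {guide: value} for two tools. Guides missing from
    either dict, and guides where the values tie, are dropped.
    Returns (n_a_better, n_b_better, p_value).
    """
    shared = metric_a.keys() & metric_b.keys()
    a_better = sum(1 for g in shared if metric_a[g] > metric_b[g])
    b_better = sum(1 for g in shared if metric_b[g] > metric_a[g])
    n = a_better + b_better
    if n == 0:
        return a_better, b_better, 1.0
    # Under H0 each non-tied guide favours either tool with probability 1/2.
    tail = sum(comb(n, i) for i in range(min(a_better, b_better) + 1)) / 2 ** n
    return a_better, b_better, min(1.0, 2 * tail)
```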
benchmark_budget_constrained_pairwise_recall.csv
Recall of pairwise tool combinations, merged with reciprocal rank fusion and evaluated at fixed shortlist sizes. This table is kept as a benchmark output for the supplementary combination analyses.
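Reciprocal rank fusion scores each site as the sum of 1/(k + rank) over the rankings being combined; a fixed shortlist size then caps how many fused candidates are kept. The sketch below assumes k = 60, the constant commonly used in the RRF literature, because this section does not state the value used. reciprocal_rank_fusion and shortlist_recall are hypothetical names.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of site IDs with reciprocal rank fusion.

    rankings: list of lists, each ordered best-first (index 0 is rank 1).
    k: smoothing constant; 60 is an assumed, commonly used default.
    Returns site IDs sorted by fused score, best first (ties by site ID).
    """
    fused = {}
    for ranking in rankings:
        for rank, site in enumerate(ranking, start=1):
            fused[site] = fused.get(site, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda s: (-fused[s], s))


def shortlist_recall(ranking, true_sites, budget):
    """Fraction of true sites found in the top `budget` entries of ranking."""
    if not true_sites:
        raise ValueError("true_sites must not be empty")
    return len(set(ranking[:budget]) & true_sites) / len(true_sites)
```

For a pairwise combination, rankings holds the two tools' ranked lists for one guide, and shortlist_recall is evaluated at each shortlist size of interest.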
These tables feed the figure notebooks under docs/figures/, which render the manuscript figures directly from the benchmark-run outputs where possible. Figure 5 also reads the staged prediction contracts, because it measures coverage before any rank threshold is applied. The ML panel in Figure 6 reads a recall-curve table with the same schema, staged under results/benchmark_runs/no_bulge_ml_comparison/. Figure 7 uses benchmark_budget_constrained_pairwise_recall.csv from the no-bulge ML comparison run.