The public rerun starts from stored prediction contracts. These files are produced upstream from tool-specific outputs and normalized to a common benchmark schema, in which they are then evaluated. This page records the tool roster, the configured commands, and the location of the upstream setup notes.
Manifest and registry
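```python
from pathlib import Path

import pandas as pd
import yaml

# Paths are relative to the notebook directory.
manifest_path = Path('../config/tool_output_manifest.example.yml')
registry_path = Path('../config/prediction_tools.yaml')

manifest = yaml.safe_load(manifest_path.read_text(encoding='utf-8'))
registry = yaml.safe_load(registry_path.read_text(encoding='utf-8'))

# One row per public contract file.
manifest_df = pd.DataFrame(manifest['tool_outputs'])
# Registry entries are keyed by tool slug; move the keys into a column for the merge.
registry_df = (
    pd.DataFrame.from_dict(registry['tools'], orient='index')
    .reset_index(names='tool_slug')
)

NOT_USED = 'not used in the public rerun'


def summarize_command(value):
    """Collapse a configured command (list, string, or missing) into one line."""
    if isinstance(value, (list, tuple)):
        return ' '.join(str(part) for part in value) if value else NOT_USED
    # Keys absent from a registry entry arrive as NaN after from_dict/merge,
    # so check pd.isna as well as None and the empty string.
    if value is None or value == '' or pd.isna(value):
        return NOT_USED
    return str(value)


def classify_role(value):
    """Map a registry tool_role to the label used in the benchmark tables."""
    mapping = {
        'search_engine': 'native genome search',
        'pair_scorer': 'pair scorer',
        'web_service_search': 'web service search',
    }
    return mapping.get(str(value), str(value))


# Left merge keeps every manifest row, even if its slug has no registry entry.
tool_table = manifest_df.merge(registry_df, on='tool_slug', how='left')
tool_table['benchmark_role'] = tool_table['tool_role'].map(classify_role)
tool_table['local_command_summary'] = tool_table['local_command'].map(summarize_command)
tool_table['docker_command_summary'] = tool_table['docker_command'].map(summarize_command)

tool_table[
    [
        'tool',
        'tool_slug',
        'mode',
        'benchmark_role',
        'relative_path',
        'local_command_summary',
        'docker_command_summary',
    ]
]
```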
| | tool | tool_slug | mode | benchmark_role | relative_path | local_command_summary | docker_command_summary |
|---|---|---|---|---|---|---|---|
| 0 | Cas-OFFinder | cas_offinder | native_search | native genome search | data/zenodo/standard_tool_predictions/predicti... | cas-offinder | snugel/cas-offinder:latest cas-offinder |
| 1 | CRISPRitz_mismatch | crispritz_mismatch | native_search | native genome search | data/zenodo/standard_tool_predictions/predicti... | crispritz.py | pinellolab/crispritz:latest crispritz.py |
| 2 | CRISPRitz_cfd | crispritz_cfd | native_search | native genome search | data/zenodo/standard_tool_predictions/predicti... | crispritz.py | pinellolab/crispritz:latest crispritz.py |
| 3 | FlashFry | flashfry | native_search | native genome search | data/zenodo/standard_tool_predictions/predicti... | flashfry | eclipse-temurin:8-jre java -Xmx8g -jar FlashFr... |
| 4 | GuideScan2 | guidescan2 | native_search | native genome search | data/zenodo/standard_tool_predictions/predicti... | guidescan | not used in the public rerun |
| 5 | CRISPROFF | crisproff | pair_scorer | pair scorer | data/zenodo/standard_tool_predictions/predicti... | run_crisproff.py | not used in the public rerun |
| 6 | CCTop | cctop | native_search | web service search | data/zenodo/standard_tool_predictions/predicti... | cctop_submit.py | not used in the public rerun |
| 7 | CRISPOR | crispor | native_search | native genome search | data/zenodo/standard_tool_predictions/predicti... | not used in the public rerun | maximilianh/crispor:latest |
| 8 | MOFF | moff | pair_scorer | pair scorer | data/zenodo/standard_tool_predictions/predicti... | MOFF score | not used in the public rerun |
| 9 | CRISOT | crisot | pair_scorer | pair scorer | data/zenodo/standard_tool_predictions/predicti... | CRISOT.py scores | not used in the public rerun |
The manifest defines the public contract files required by the rerun. The tool registry defines the upstream commands and runtime assumptions used to create those contract files.
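Because the tool table is built with a left merge, a manifest row whose tool_slug has no registry entry would not raise an error; it would simply come through with empty command cells. A minimal sketch of a coverage check, assuming only that both frames carry a tool_slug column as in the merge above; the helper name check_registry_coverage and the toy frames are hypothetical:

```python
import pandas as pd


def check_registry_coverage(manifest_df: pd.DataFrame, registry_df: pd.DataFrame) -> list[str]:
    """Return manifest tool_slug values that have no matching registry entry.

    Hypothetical helper: assumes both frames have a 'tool_slug' column,
    as in the left merge used to build the tool table.
    """
    merged = manifest_df[['tool_slug']].merge(
        registry_df[['tool_slug']].drop_duplicates(),
        on='tool_slug',
        how='left',
        indicator=True,  # adds a '_merge' column: 'both' or 'left_only'
    )
    missing = merged.loc[merged['_merge'] == 'left_only', 'tool_slug']
    return missing.tolist()


# Toy frames for illustration only; the slugs are borrowed from the tool table.
manifest_demo = pd.DataFrame({'tool_slug': ['cas_offinder', 'flashfry', 'moff']})
registry_demo = pd.DataFrame({'tool_slug': ['cas_offinder', 'moff']})
print(check_registry_coverage(manifest_demo, registry_demo))
```

An empty list means every contract file in the manifest has a registry entry describing how it was produced.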
Contract files
Each row in the manifest corresponds to one public contract file.
- `tool` is the manuscript-facing tool name.
- `tool_slug` is the configuration key that links each manifest row to its registry entry.
- `mode` distinguishes native search tools (`native_search`) from pair scorers (`pair_scorer`).
- `relative_path` is the expected location of the standardized contract file.
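Since relative_path only records where a contract file is expected, it can help to check which of those files are actually present before starting the rerun. A minimal sketch, assuming the paths resolve against the repository root (the notebook reads its config through '../config', which suggests the root is one level up); the helper name find_missing_contracts is hypothetical:

```python
from pathlib import Path

import pandas as pd


def find_missing_contracts(manifest_df: pd.DataFrame, repo_root: Path) -> pd.DataFrame:
    """Return manifest rows whose relative_path does not exist under repo_root.

    Hypothetical helper: assumes relative_path is relative to the repository root.
    """
    resolved = manifest_df['relative_path'].map(lambda rel: Path(repo_root) / rel)
    exists = resolved.map(lambda path: path.exists()).astype(bool)
    return manifest_df.loc[~exists, ['tool', 'tool_slug', 'relative_path']]
```

For example, find_missing_contracts(manifest_df, Path('..')) would list contract files that have not yet been fetched, presuming the root location guessed above is correct.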
The contract layer is the public handoff between upstream tool execution and the benchmark rerun.
The two command columns, local_command_summary and docker_command_summary, summarize how each tool was invoked upstream, either directly or through a container image. They are not full shell transcripts, but they identify the program, wrapper, or container used to generate the normalized contract file.
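The summaries join argument lists with single spaces, which reads well but drops quoting when an argument contains spaces. If a copy-pasteable form were ever wanted, the standard library's shlex.join quotes each argument for a POSIX shell. A small comparison, using a made-up argument list rather than any configured command:

```python
import shlex

# Made-up command for illustration; not taken from prediction_tools.yaml.
command = ['example-tool', '--genome', 'my genome.fa', '--mismatches', '4']

# Plain space join, as used for the summary columns.
display_form = ' '.join(str(part) for part in command)
# Shell-quoted join: arguments with spaces are wrapped in single quotes.
shell_form = shlex.join([str(part) for part in command])

print(display_form)  # example-tool --genome my genome.fa --mismatches 4
print(shell_form)    # example-tool --genome 'my genome.fa' --mismatches 4
```

The plain join is the better fit for a summary table; the quoted form only matters if the string is meant to be pasted back into a shell.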
Upstream provenance
The upstream execution and normalization logic is documented in:
- `config/prediction_tools.yaml`
- `config/tool_output_manifest.example.yml`
- `data/zenodo/README.md`
prediction_tools.yaml defines the configured commands, mismatch limits, PAM settings, and runtime options summarized for the public data release. The manifest records the standardized contract files consumed by the public benchmark runner. The Zenodo data notes list the larger deposited files that are not tracked in GitHub.
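To see which settings each tool actually configures (commands, mismatch limits, PAM settings, runtime options) without guessing their key names, one can tabulate the keys under each registry entry. A minimal sketch, assuming only the structure the loading code relies on: a top-level tools mapping keyed by tool slug, each value itself a mapping. The helper registry_key_inventory and the toy registry are hypothetical:

```python
import pandas as pd


def registry_key_inventory(registry: dict) -> pd.DataFrame:
    """Tabulate which keys each tool entry in a parsed registry defines.

    Hypothetical helper. Assumes registry['tools'] maps tool slugs to
    mappings, the same shape passed to pd.DataFrame.from_dict(..., orient='index').
    """
    tools = registry['tools']
    slugs = list(tools)
    # Union of every key used by any tool entry, sorted for a stable column order.
    keys = sorted({key for entry in tools.values() for key in (entry or {})})
    # One boolean column per key: True if that tool's entry defines the key.
    data = {key: [key in (tools[slug] or {}) for slug in slugs] for key in keys}
    return pd.DataFrame(data, index=pd.Index(slugs, name='tool_slug'))


# Toy registry for illustration only; real entries will carry more keys.
toy_registry = {
    'tools': {
        'cas_offinder': {'tool_role': 'search_engine', 'local_command': ['cas-offinder']},
        'moff': {'tool_role': 'pair_scorer'},
    }
}
print(registry_key_inventory(toy_registry))
```

With the real file, registry_key_inventory(yaml.safe_load(Path('../config/prediction_tools.yaml').read_text(encoding='utf-8'))) would show, per tool, which configuration keys are present, whatever names they turn out to have.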