# 0. Brieflow <> Brieflow Analysis

## Overview

Brieflow and brieflow-analysis are closely related repositories that are developed together and are usually both used for a screen analysis.
We distinguish these repositories like so:

- brieflow: code to process OPS data on a large scale
- brieflow-analysis: notebooks, files, and scripts that are used during a brieflow run

Both of these work together to run modules for steps like preprocessing, SBS, phenotype, etc.
Let's take a closer look:

```{image} media/brieflow_brieflow_analysis.png
:align: center
:alt: Brieflow <> Brieflow Analysis
```

## Brieflow

### Components

Brieflow has the following components:

- lib: Brieflow library code used for performing Brieflow processing. Organized into module-specific, shared, and external code.
- rules: Snakemake rule files for each module. Used to organize processes within each module with inputs, outputs, parameters, and script file location.
- scripts: Python script files for processes called by rules. Organized into module-specific and shared code.
- targets: Snakemake files used to define inputs and their mappings for each module.
- Snakefile: Main Snakefile used to call modules.

One of the simplest examples of how these components fit together is read calling during the SBS step.
We can approach this from a top-down perspective to understand what is going on.

1) In the main [Snakefile](https://github.com/cheeseman-lab/brieflow/blob/main/workflow/Snakefile) we tell Snakemake to include the rules and targets for the entire SBS module:

```python
if "sbs" in config and len(sbs_wildcard_combos) > 0:
    # Include target and rule files
    include: "targets/sbs.smk"
    include: "rules/sbs.smk"
```

2) Snakemake first looks at the [targets](https://github.com/cheeseman-lab/brieflow/blob/main/workflow/targets/sbs.smk) to see what we want produced.
The read calling output file is specified here:

```python
"call_reads": [
    SBS_FP / "tsvs" / get_filename(
        {"plate": "{plate}", "well": "{well}", "tile": "{tile}"}, "reads", "tsv"
    ),
],
```

3) Snakemake then looks through the [rules](https://github.com/cheeseman-lab/brieflow/blob/main/workflow/rules/sbs.smk) to see what needs to be run to produce this file.
We need to run the following rule to get the `call_reads` output:

```python
rule call_reads:
    input:
        SBS_OUTPUTS["extract_bases"],
        SBS_OUTPUTS["find_peaks"],
    output:
        SBS_OUTPUTS_MAPPED["call_reads"],
    params:
        call_reads_method=config["sbs"]["call_reads_method"]
    script:
        "../scripts/sbs/call_reads.py"
```

This rule takes two inputs, produces one output, and passes one parameter to the script.

4) Snakemake runs the [script](https://github.com/cheeseman-lab/brieflow/blob/main/workflow/scripts/sbs/call_reads.py) for this rule:

```python
import pandas as pd
from tifffile import imread

from lib.sbs.call_reads import call_reads

# load bases data
bases_data = pd.read_csv(snakemake.input[0], sep="\t")
# load peaks data
peaks_data = imread(snakemake.input[1])

# call reads
reads_data = call_reads(
    bases_data=bases_data,
    peaks_data=peaks_data,
    method=snakemake.params.call_reads_method,
)

# save reads data
reads_data.to_csv(snakemake.output[0], index=False, sep="\t")
```

This is a very simple script that loads data, calls a function, and saves data.

5) Finally, the script calls the [library code](https://github.com/cheeseman-lab/brieflow/blob/eb4f58947eaeb1f2dd2c7df8fb1a9f593148a55f/workflow/lib/sbs/call_reads.py#L26) that does the actual read calling:

```python
def call_reads(
    bases_data,
    peaks_data=None,
    correction_only_in_cells=True,
    normalize_bases_first=True,
    method="median",
):
    """Call reads for in situ sequencing data.

    Call reads by compensating for channel cross-talk and calling the base
    with the highest corrected intensity for each cycle.

    Args:
        bases_data : pandas DataFrame
            Table of base intensity for all candidate reads, output of extract_bases.
        ...
```

### Project Structure

Brieflow is built on top of [Snakemake](https://snakemake.readthedocs.io/en/stable/index.html#snakemake).
We follow the [Snakemake structure guidelines](https://snakemake.readthedocs.io/en/stable/snakefiles/deployment.html) with some exceptions.
The Brieflow project structure is as follows:

```
workflow/
├── lib/ - Brieflow library code used for performing Brieflow processing. Organized into module-specific, shared, and external code.
├── rules/ - Snakemake rule files for each module. Used to organize processes within each module with inputs, outputs, parameters, and script file location.
├── scripts/ - Python script files for processes called by rules. Organized into module-specific and shared code.
├── targets/ - Snakemake files used to define inputs and their mappings for each module.
└── Snakefile - Main Snakefile used to call modules.
```

Brieflow runs as follows:

- A user configures parameters in Jupyter notebooks to use the Brieflow library code correctly for their data.
- A user runs the main Snakefile with bash scripts (locally or on an HPC).
- The main Snakefile calls module-specific Snakemake files with rules for each process.
- Each process rule calls a script.
- Scripts use the Brieflow library code to transform the input files defined in targets into the output files defined in targets.

## Brieflow Analysis

The analysis repo holds the files necessary for configuring and running brieflow.
In the case of the read calling function above we:

1) Run the [2.configure_sbs_params.ipynb](https://github.com/cheeseman-lab/brieflow-analysis/blob/main/analysis/2.configure_sbs_params.ipynb) notebook.

2) Set the `CALL_READS_METHOD` parameter in this notebook:

```python
# Define parameters for extracting bases
CALL_READS_METHOD = "median"
```

3) Save this parameter to the config file at the end of the notebook:

```python
config["sbs"] = {
    # ...
    "call_reads_method": CALL_READS_METHOD,
    # ...
}

# Write the updated configuration back with markdown-style comments
with open(CONFIG_FILE_PATH, "w") as config_file:
    # Write the introductory markdown-style comments
    config_file.write(CONFIG_FILE_HEADER)

    # Dump the updated YAML structure, keeping markdown comments for sections
    yaml.dump(config, config_file, default_flow_style=False, sort_keys=False)
```

4) This parameter gets passed to the Snakemake rule during a run (see above):

```python
rule call_reads:
    # ...
    params:
        call_reads_method=config["sbs"]["call_reads_method"]
    # ...
```

## Reproducibility and Modularity

Brieflow and brieflow-analysis are built for reproducibility and modularity.
This setup enables researchers to:

- work on a specific process within brieflow (e.g., make isolated changes to `call_reads`)
- develop versions of brieflow that can be used across multiple brieflow-analysis repositories (e.g., a `custom_screen` branch for brieflow can be used in multiple brieflow-analysis repos)
- track exact differences between the `main` branch of brieflow and a custom branch
- host an entire screen analysis on GitHub for reproducibility
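
To make the separation between the two repositories concrete, the parameter hand-off described under Brieflow Analysis can be sketched in plain Python, without Snakemake. This is only an illustration under stated assumptions: `write_config`, `run_rule`, `run_script`, and `toy_call_reads` are hypothetical stand-ins for the notebook, rule, script, and library layers (they are not Brieflow functions), the toy "read calling" simply summarizes a list of numbers, and the standard-library `json` module stands in for the YAML config file so the example has no dependencies.

```python
import json
import statistics
import tempfile
from pathlib import Path

# Notebook stage: the user picks a value (mirrors CALL_READS_METHOD).
CALL_READS_METHOD = "median"


def write_config(config_path, method):
    """Notebook layer (hypothetical): save the parameter to a config file.

    JSON is used here as a stand-in for the YAML config file.
    """
    config = {"sbs": {"call_reads_method": method}}
    config_path.write_text(json.dumps(config, indent=2))


def toy_call_reads(intensities, method="median"):
    """Library layer (hypothetical): summarize values with the chosen method."""
    if method == "median":
        return statistics.median(intensities)
    if method == "mean":
        return statistics.mean(intensities)
    raise ValueError(f"Unknown method: {method}")


def run_script(intensities, params):
    """Script layer (hypothetical): forward the rule's params to library code."""
    return toy_call_reads(intensities, method=params["call_reads_method"])


def run_rule(config_path, intensities):
    """Rule layer (hypothetical): read the config, pass one param to the script."""
    config = json.loads(config_path.read_text())
    params = {"call_reads_method": config["sbs"]["call_reads_method"]}
    return run_script(intensities, params)


# Write the config once, then run the "rule" against it.
with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = Path(tmp_dir) / "config.json"
    write_config(config_path, CALL_READS_METHOD)
    result = run_rule(config_path, [1.0, 2.0, 10.0])

print(result)
```

The point of the layering is that changing `CALL_READS_METHOD` in the configuration step changes the behavior of the library function without touching the rule or script code, which is what allows one version of brieflow to serve several analysis repositories.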