3. Running Modules

Overview

Each large brieflow process is referred to as a “module”. This includes preprocess, SBS, phenotype, merge, aggregate, and cluster. Each module has its own configuration notebook (in brieflow-analysis) and its own targets and rules files (in brieflow). The configuration notebook sets the module’s parameters, which the module's targets and rules then use. The main Snakefile (at brieflow/workflow/Snakefile) connects these modules, as shown below:

Getting Started

Navigate to the analysis directory and activate the conda environment:

```
cd analysis/
conda activate brieflow_SCREEN_NAME
```

Note: Use the brieflow_SCREEN_NAME conda environment for all configuration notebooks.

Module includes in brieflow/workflow/Snakefile:

```
include: "targets/preprocess.smk"
include: "rules/preprocess.smk"

if "sbs" in config and len(sbs_wildcard_combos) > 0:

    # Include target and rule files
    include: "targets/sbs.smk"
    include: "rules/sbs.smk"

if "phenotype" in config and len(phenotype_wildcard_combos) > 0:

    # Include target and rule files
    include: "targets/phenotype.smk"
    include: "rules/phenotype.smk"

if "merge" in config:
    MERGE_COMBO_FP = Path(config["merge"]["merge_combo_fp"])
    merge_wildcard_combos = pd.read_csv(MERGE_COMBO_FP, sep="\t")

    # Include target and rule files
    include: "targets/merge.smk"
    include: "rules/merge.smk"

if "aggregate" in config:
    AGGREGATE_COMBO_FP = Path(config["aggregate"]["aggregate_combo_fp"])
    aggregate_wildcard_combos = pd.read_csv(AGGREGATE_COMBO_FP, sep="\t")

    # Include target and rule files
    include: "targets/aggregate.smk"
    include: "rules/aggregate.smk"

if "cluster" in config:
    CLUSTER_COMBO_FP = Path(config["cluster"]["cluster_combo_fp"])
    cluster_wildcard_combos = pd.read_csv(CLUSTER_COMBO_FP, sep="\t")

    # Include target and rule files
    include: "targets/cluster.smk"
    include: "rules/cluster.smk"
```
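To make the include logic concrete, the following Python sketch reproduces the conditions in the Snakefile excerpt above and reports which modules would be loaded for a given config. `included_modules` is a hypothetical helper, not part of brieflow. In the real Snakefile the wildcard combos are pandas DataFrames, for which `len()` counts rows.

```python
from __future__ import annotations

from collections.abc import Mapping, Sized
from typing import Any


def included_modules(
    config: Mapping[str, Any],
    sbs_wildcard_combos: Sized,
    phenotype_wildcard_combos: Sized,
) -> list[str]:
    """Mirror the include conditions of the Snakefile excerpt (sketch only)."""
    # preprocess targets and rules are always included
    modules = ["preprocess"]
    # SBS and phenotype also need at least one wildcard combination
    if "sbs" in config and len(sbs_wildcard_combos) > 0:
        modules.append("sbs")
    if "phenotype" in config and len(phenotype_wildcard_combos) > 0:
        modules.append("phenotype")
    # merge, aggregate, and cluster only need their config section
    for name in ("merge", "aggregate", "cluster"):
        if name in config:
            modules.append(name)
    return modules


# SBS is configured but has no wildcard combinations, so it is skipped
print(included_modules({"sbs": {}, "phenotype": {}, "merge": {}}, [], [{"plate": 1}]))
# → ['preprocess', 'phenotype', 'merge']
```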

Workflow

A typical module execution follows these steps:

1. Configure parameters: Run the module's configuration notebook in brieflow-analysis/analysis to set its parameters, which are saved to brieflow-analysis/analysis/config/config.yml.
   - Example: 0.configure_preprocess_params.ipynb
2. Test with a dry run: Run the module's local .sh script, which already includes the -n flag, to perform a dry run.
   - Example: sh 1.run_preprocessing.sh
3. Execute the full run:
   - Local: Remove the -n flag from the .sh script to run on local compute.
   - Slurm: Run the module's _slurm.sh script in a tmux session to run on HPC compute.
     - Example: bash 1.run_preprocessing_slurm.sh
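The difference between a dry run and a full run comes down to the -n flag. The Python sketch below builds a local Snakemake command with or without it. It assumes the local .sh scripts wrap a `snakemake` call; `snakemake_command` is a hypothetical helper, the paths are placeholders, and the exact flags in the brieflow-analysis scripts may differ.

```python
from __future__ import annotations

import shlex


def snakemake_command(
    snakefile: str,
    configfile: str,
    dry_run: bool = True,
    cores: str = "all",
) -> list[str]:
    """Build a local Snakemake invocation (hypothetical, not a brieflow script)."""
    cmd = [
        "snakemake",
        "--snakefile", snakefile,
        "--configfile", configfile,
        "--cores", cores,
    ]
    if dry_run:
        # -n (--dry-run) lists the jobs that would run without executing them
        cmd.append("-n")
    return cmd


# Placeholder paths; adjust to where brieflow and config.yml live on your system
print(shlex.join(snakemake_command("path/to/brieflow/workflow/Snakefile", "config/config.yml")))
```

Keeping `dry_run=True` as the default mirrors the scripts' convention of shipping with -n, so a full run is always an explicit choice.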

Note: Preprocessing, SBS, and phenotype modules include special slurm scripts that split runs by plate for optimization.
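As an illustration of what splitting by plate means, the Python sketch below groups the rows of a tab-separated wildcard-combination table by plate, so each group could be handled as a separate run. It assumes the table has a column named `plate`; the other columns in the example are made up, and this does not reproduce how the brieflow slurm scripts actually split runs.

```python
from __future__ import annotations

import csv
import io
from collections import defaultdict


def split_combos_by_plate(
    tsv_text: str, plate_column: str = "plate"
) -> dict[str, list[dict[str, str]]]:
    """Group the rows of a tab-separated wildcard-combination table by plate."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    by_plate = defaultdict(list)
    for row in reader:
        # every row for the same plate goes into one group
        by_plate[row[plate_column]].append(row)
    return dict(by_plate)


example = "plate\twell\ttile\n1\tA1\t0\n1\tA2\t0\n2\tA1\t0\n"
groups = split_combos_by_plate(example)
print({plate: len(rows) for plate, rows in groups.items()})
# → {'1': 2, '2': 1}
```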

HPC Integrations

Brieflow currently supports running workflows locally or through Slurm. To use the Slurm integration, configure the Slurm resources in analysis/slurm/config.yaml. In default-resources, slurm_partition and slurm_account must be set; the other resource requirements have suggested values that can be adjusted as necessary.
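Because slurm_partition and slurm_account have no usable defaults, it can help to check them before launching. The Python sketch below takes the already-parsed profile (for example, the result of `yaml.safe_load` on analysis/slurm/config.yaml) and reports which required keys are missing or empty. It assumes default-resources is written either as a mapping or as a list of `key=value` strings; `missing_slurm_settings` is a hypothetical helper, not part of brieflow.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any

REQUIRED_KEYS = ("slurm_partition", "slurm_account")


def _resources_as_dict(default_resources: Any) -> dict[str, str]:
    """Accept either a mapping or a list of "key=value" strings."""
    if isinstance(default_resources, Mapping):
        return {str(k): "" if v is None else str(v) for k, v in default_resources.items()}
    parsed: dict[str, str] = {}
    for item in default_resources or []:
        key, sep, value = str(item).partition("=")
        if sep:
            parsed[key.strip()] = value.strip()
    return parsed


def missing_slurm_settings(profile: Mapping[str, Any]) -> list[str]:
    """Return required default-resources keys that are absent or empty."""
    resources = _resources_as_dict(profile.get("default-resources"))
    missing = []
    for key in REQUIRED_KEYS:
        # treat quoted empty strings such as '' or "" as unset
        if not resources.get(key, "").strip().strip("'\""):
            missing.append(key)
    return missing


print(missing_slurm_settings({"default-resources": {"slurm_partition": "short", "slurm_account": None}}))
# → ['slurm_account']
```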

Note: Other Snakemake HPC integrations are listed in the Snakemake plugin catalog, but only the Slurm plugin has been tested. These plugins assume that the Snakemake scheduler runs on the HPC head node and submits only the individual jobs to the compute nodes. Therefore, launch the workflow through bash on the head node (with the Slurm or other HPC configuration). We recommend doing this inside a tmux session, especially for larger jobs, so the scheduler keeps running if your connection drops.

Module-by-Module Instructions

Preprocess

Configure: Run 0.configure_preprocess_params.ipynb to set preprocessing parameters.

Note: This step determines where ND2 data is loaded from and where results are saved (default: analysis/brieflow_output). Users testing only SBS or phenotype can configure only that image type; see the notebook for details.

Run:

Local: sh 1.run_preprocessing.sh (remove -n for actual run)

Slurm: Set NUM_PLATES in 1.run_preprocessing_slurm.sh, then:

```
tmux new-session -s preprocessing
bash 1.run_preprocessing_slurm.sh
```

SBS

Configure: Run 2.configure_sbs_params.ipynb to set SBS module parameters.

Phenotype

Configure: Run 3.configure_phenotype_params.ipynb to set phenotype module parameters.

Run SBS/Phenotype

Run:

Local: sh 4.run_sbs_phenotype.sh (remove -n for actual run)
Slurm: Set NUM_PLATES in both 4a.run_sbs_slurm.sh and 4b.run_phenotype_slurm.sh. These scripts can run simultaneously (for example, in separate tmux sessions) or one after the other:
```
tmux new-session -s sbs_phenotype
bash 4a.run_sbs_slurm.sh
bash 4b.run_phenotype_slurm.sh
```

Note: To run only SBS or only phenotype:

- Leave the sample dataframe of the module you are not running empty in 0.configure_preprocess_params.ipynb.
- Use the --until all_sbs or --until all_phenotype flag in the scripts.
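The choice of --until target follows from which sample dataframes are non-empty. The Python sketch below encodes that choice, using the all_sbs and all_phenotype targets named above and Snakemake's `--until`, which accepts one or more targets. `until_args` is a hypothetical helper; the counts stand in for the number of rows in the SBS and phenotype sample dataframes.

```python
from __future__ import annotations


def until_args(n_sbs_samples: int, n_phenotype_samples: int) -> list[str]:
    """Return --until arguments that stop the run after SBS and/or phenotype."""
    targets = []
    if n_sbs_samples > 0:
        targets.append("all_sbs")
    if n_phenotype_samples > 0:
        targets.append("all_phenotype")
    if not targets:
        raise ValueError("Neither SBS nor phenotype samples are configured")
    # Snakemake's --until accepts one or more target rules
    return ["--until", *targets]


print(until_args(n_sbs_samples=4, n_phenotype_samples=0))
# → ['--until', 'all_sbs']
```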

Merge

Configure: Run 5.configure_merge_params.ipynb to set merge parameters.

Run:

Local: sh 6.run_merge.sh (remove -n for actual run)

Slurm:

```
tmux new-session -s merge
bash 6.run_merge_slurm.sh
```

Classify

Configure: Run 7.configure_classify_params.ipynb to train a classifier for different classes of cells (optional).

Aggregate

Configure: Run 8.configure_aggregate_params.ipynb to set aggregate parameters.

Run:

Local: sh 9.run_aggregate.sh (remove -n for actual run)

Slurm:

```
tmux new-session -s aggregate
bash 9.run_aggregate_slurm.sh
```

Cluster

Configure: Run 10.configure_cluster_params.ipynb to set cluster parameters.

Run:

Local: sh 11.run_cluster.sh (remove -n for actual run)

Slurm:

```
tmux new-session -s cluster
bash 11.run_cluster_slurm.sh
```

Analysis

Run 12.analyze.ipynb for interactive exploration of final outputs and feature plots.

MozzareLLM

Run LLM-based cluster annotation using MozzareLLM:

```
sh 13.run_mozzarellm.sh
```

Visualization

Launch the Streamlit visualizer to explore results interactively:

```
bash 14.run_visualization.sh
```

The visualizer opens to Cluster Analysis by default, using the optimal resolution from the mozzarellm section of config.yml. The sidebar provides access to:

- Pipeline Stats: Parsed pipeline statistics (if a *_stats.txt file exists in brieflow_output/)
- Quality Control: Eval plots and tables from each pipeline step
- Screen Overview: Screen metadata (screen.yaml), perturbation library, and feature descriptions
- Analysis Overview: Config, dependencies (pyproject.toml), and git info

Environment Variables

The launch script (14.run_visualization.sh) sets required and optional environment variables:

| Variable | Required | Description |
|---|---|---|
| `BRIEFLOW_OUTPUT_PATH` | Yes | Path to `brieflow_output/` directory |
| `CONFIG_PATH` | Yes | Path to `config.yml` |
| `SCREEN_PATH` | Yes | Path to `screen.yaml` |
| `PERTURBATION_LIBRARY_PATH` | No | Path to raw perturbation library design file (TSV/CSV) |
| `FEATURE_DOC_PATH` | No | Path to feature documentation markdown |
| `STATIC_ASSET_URL_ROOT` | No | URL root for nginx-served static assets (deployment) |
| `STATIC_ASSET_PATH` | No | Filesystem path corresponding to static asset URL root |
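If the visualizer fails to start, a quick pre-flight check of the three required variables can narrow down the cause. The Python sketch below reports required variables that are unset or point to a path that does not exist. The variable names come from the table above; `check_visualizer_env` is a hypothetical helper, and checking path existence is an assumption about what counts as a valid setting, not a description of 14.run_visualization.sh.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

REQUIRED_VARS = ("BRIEFLOW_OUTPUT_PATH", "CONFIG_PATH", "SCREEN_PATH")


def check_visualizer_env(env: Mapping[str, str] | None = None) -> list[str]:
    """List problems with the required visualizer environment variables."""
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_VARS:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not os.path.exists(value):
            problems.append(f"{name} points to a missing path: {value}")
    return problems


# Check the current shell environment
for problem in check_visualizer_env():
    print(problem)
```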

Default Cluster Resolution

The visualizer reads the mozzarellm section of config.yml to set the initial cluster analysis view:

```
mozzarellm:
  cell_class: Interphase
  channel_combo: DAPI_TUBULIN_GH2AX_PHALLOIDIN
  leiden_resolution: 12
```

Users can override these defaults via the sidebar filters.
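The precedence between config.yml defaults and sidebar selections can be sketched as follows. The Python function below builds the initial view from the mozzarellm section, then applies any overrides. `initial_cluster_view` and its `overrides` argument are hypothetical stand-ins for the sidebar filters; this does not reproduce the visualizer's own code.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any

VIEW_KEYS = ("cell_class", "channel_combo", "leiden_resolution")


def initial_cluster_view(
    config: Mapping[str, Any], overrides: Mapping[str, Any] | None = None
) -> dict[str, Any]:
    """Start from the mozzarellm section of config.yml, then apply sidebar choices."""
    section = config.get("mozzarellm") or {}
    view = {key: section.get(key) for key in VIEW_KEYS}
    for key, value in (overrides or {}).items():
        # only known view settings are overridden, and only with real values
        if key in view and value is not None:
            view[key] = value
    return view


config = {
    "mozzarellm": {
        "cell_class": "Interphase",
        "channel_combo": "DAPI_TUBULIN_GH2AX_PHALLOIDIN",
        "leiden_resolution": 12,
    }
}
print(initial_cluster_view(config, overrides={"leiden_resolution": 5}))
```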

Example Video

This video provides a step-by-step walkthrough of running a module in brieflow, including configuring parameters, testing with a dry run, and completing a full run using slurm scripts.

Additional Notes

- Slurm log files are written to brieflow-analysis/analysis/slurm/slurm_output/main.
- For large screens, restrict the loaded rules by commenting out unnecessary targets/rules in brieflow/workflow/Snakefile to speed up DAG generation. For example, when running aggregate, only the merge targets and the aggregate targets and rules are needed:


```
# include: "targets/preprocess.smk"
# include: "rules/preprocess.smk"

# if "sbs" in config and len(sbs_wildcard_combos) > 0:

#     # Include target and rule files
#     include: "targets/sbs.smk"
#     include: "rules/sbs.smk"

# if "phenotype" in config and len(phenotype_wildcard_combos) > 0:

#     # Include target and rule files
#     include: "targets/phenotype.smk"
#     include: "rules/phenotype.smk"

if "merge" in config:
    MERGE_COMBO_FP = Path(config["merge"]["merge_combo_fp"])
    merge_wildcard_combos = pd.read_csv(MERGE_COMBO_FP, sep="\t")

    # Include target and rule files
    include: "targets/merge.smk"
    # include: "rules/merge.smk"

if "aggregate" in config:
    AGGREGATE_COMBO_FP = Path(config["aggregate"]["aggregate_combo_fp"])
    aggregate_wildcard_combos = pd.read_csv(AGGREGATE_COMBO_FP, sep="\t")

    # Include target and rule files
    include: "targets/aggregate.smk"
    include: "rules/aggregate.smk"

# if "cluster" in config:
#     CLUSTER_COMBO_FP = Path(config["cluster"]["cluster_combo_fp"])
#     cluster_wildcard_combos = pd.read_csv(CLUSTER_COMBO_FP, sep="\t")

#     # Include target and rule files
#     include: "targets/cluster.smk"
#     include: "rules/cluster.smk"
```

Always perform a dry run after modifying the Snakefile to verify correct job execution.
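The aggregate example generalizes to a simple rule of thumb: keep the targets of the module's immediate upstream module(s) plus the targets and rules of the module being run. The Python sketch below encodes that rule. The `UPSTREAM` mapping is an assumption based on the pipeline order described in this section, and `required_includes` is a hypothetical helper; confirm any restricted Snakefile with a dry run.

```python
from __future__ import annotations

# Assumed immediate upstream module(s) of each module, following the pipeline order
UPSTREAM = {
    "preprocess": [],
    "sbs": ["preprocess"],
    "phenotype": ["preprocess"],
    "merge": ["sbs", "phenotype"],
    "aggregate": ["merge"],
    "cluster": ["aggregate"],
}


def required_includes(module: str) -> list[str]:
    """Include files to keep uncommented when running only `module`."""
    if module not in UPSTREAM:
        raise KeyError(f"Unknown module: {module}")
    # targets of upstream modules, so this module's rules can locate their inputs
    includes = [f"targets/{up}.smk" for up in UPSTREAM[module]]
    # both targets and rules for the module being run
    includes += [f"targets/{module}.smk", f"rules/{module}.smk"]
    return includes


print(required_includes("aggregate"))
# → ['targets/merge.smk', 'targets/aggregate.smk', 'rules/aggregate.smk']
```

As in the restricted example, the `if ... in config:` block that loads a module's combo file must stay uncommented whenever that module's targets are kept.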