3. Running Modules

Overview

Each major Brieflow process is referred to as a “module”. The modules are preprocess, SBS, phenotype, merge, aggregate, and cluster. Each module has its own configuration notebook (in brieflow-analysis) and targets/rules files (in brieflow). The configuration notebook sets the module’s parameters, which the targets/rules then use. The main Snakefile (at brieflow/workflow/Snakefile) connects these modules; its code is listed under “View Snakefile code” below.

Getting Started

Navigate to the analysis directory and activate the conda environment:

cd analysis/
conda activate brieflow_SCREEN_NAME

Note: Use the brieflow_SCREEN_NAME conda environment for all configuration notebooks.

View Snakefile code
include: "targets/preprocess.smk"
include: "rules/preprocess.smk"

if "sbs" in config and len(sbs_wildcard_combos) > 0:

    # Include target and rule files
    include: "targets/sbs.smk"
    include: "rules/sbs.smk"

if "phenotype" in config and len(phenotype_wildcard_combos) > 0:

    # Include target and rule files
    include: "targets/phenotype.smk"
    include: "rules/phenotype.smk"

if "merge" in config:
    MERGE_COMBO_FP = Path(config["merge"]["merge_combo_fp"])
    merge_wildcard_combos = pd.read_csv(MERGE_COMBO_FP, sep="\t")

    # Include target and rule files
    include: "targets/merge.smk"
    include: "rules/merge.smk"

if "aggregate" in config:
    AGGREGATE_COMBO_FP = Path(config["aggregate"]["aggregate_combo_fp"])
    aggregate_wildcard_combos = pd.read_csv(AGGREGATE_COMBO_FP, sep="\t")

    # Include target and rule files
    include: "targets/aggregate.smk"
    include: "rules/aggregate.smk"

if "cluster" in config:
    CLUSTER_COMBO_FP = Path(config["cluster"]["cluster_combo_fp"])
    cluster_wildcard_combos = pd.read_csv(CLUSTER_COMBO_FP, sep="\t")

    # Include target and rule files
    include: "targets/cluster.smk"
    include: "rules/cluster.smk"
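
The conditional includes above amount to a rule for which modules are active in a given run. The following is a minimal Python sketch of that logic, not code from brieflow: it assumes `config` is the parsed config.yml as a dict, the two combo arguments stand in for `sbs_wildcard_combos` and `phenotype_wildcard_combos`, and the function name `enabled_modules` is hypothetical.

```python
def enabled_modules(config, sbs_combos=(), phenotype_combos=()):
    """Return the modules that the Snakefile's conditions would include."""
    # Preprocess targets and rules are included unconditionally.
    modules = ["preprocess"]
    # SBS and phenotype need a config section AND at least one wildcard combo.
    if "sbs" in config and len(sbs_combos) > 0:
        modules.append("sbs")
    if "phenotype" in config and len(phenotype_combos) > 0:
        modules.append("phenotype")
    # Later modules only need their section to be present in the config.
    for name in ("merge", "aggregate", "cluster"):
        if name in config:
            modules.append(name)
    return modules


# Example: SBS is configured with one combo, merge is configured,
# and phenotype has no section at all.
active = enabled_modules({"sbs": {}, "merge": {}}, sbs_combos=[("1", "A1")])
print(active)
```

Under these assumptions, a module that has not been configured yet simply does not appear in the workflow, which is why modules can be configured and run one at a time.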

Workflow

A typical module execution follows these steps:

  1. Configure parameters: Run the respective configuration notebook in brieflow-analysis/analysis to set parameters, which are saved to brieflow-analysis/analysis/config/config.yml.

    • Example: 0.configure_preprocess_params.ipynb

  2. Test with dry run: Use the local .sh script with the -n flag to perform a dry run.

    • Example: sh 1.run_preprocessing.sh (already includes -n flag)

  3. Execute full run:

    • Local: Remove -n flag from .sh script for local compute

    • Slurm: Use _slurm.sh script in a tmux session for HPC compute

      • Example: bash 1.run_preprocessing_slurm.sh

Note: Preprocessing, SBS, and phenotype modules include special slurm scripts that split runs by plate for optimization.
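
The difference between the dry run and the full run in the steps above is only the -n flag. The sketch below illustrates this by assembling a Snakemake argument list in Python. It is an assumption-laden illustration, not the contents of the provided .sh scripts: the function name `build_snakemake_cmd` is hypothetical, the paths are placeholders, and the actual scripts may pass additional options.

```python
import shlex


def build_snakemake_cmd(snakefile, configfile, dry_run=True, cores="all"):
    """Assemble a Snakemake command line as a list of arguments."""
    cmd = [
        "snakemake",
        "--snakefile", snakefile,
        "--configfile", configfile,
        "--cores", str(cores),
    ]
    # -n asks Snakemake to plan the jobs without executing them.
    if dry_run:
        cmd.append("-n")
    return cmd


# Placeholder paths; adjust to the layout of your checkout.
SNAKEFILE = "../brieflow/workflow/Snakefile"
CONFIGFILE = "config/config.yml"

dry = build_snakemake_cmd(SNAKEFILE, CONFIGFILE)
full = build_snakemake_cmd(SNAKEFILE, CONFIGFILE, dry_run=False)
print(shlex.join(dry))
print(shlex.join(full))
```

Defaulting `dry_run` to True mirrors the provided local scripts, which ship with -n so that a full run is always a deliberate choice.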

HPC Integrations

Brieflow workflows can currently be run locally or through Slurm. To use the Slurm integration, configure the Slurm resources in analysis/slurm/config.yaml. The slurm_partition and slurm_account entries in default-resources must be set. The other resource requirements have suggested values, which can be adjusted as necessary.

Note: Other Snakemake HPC integrations can be found in the Snakemake plugin catalog, but only the slurm plugin has been tested. These plugins assume that the Snakemake scheduler runs on the HPC head node and that only the individual jobs are submitted to the compute nodes. The Snakefile should therefore be run through bash on the head node (with slurm or another HPC configuration). We recommend starting a tmux session for this, especially for larger jobs.
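
Before submitting jobs, it can help to confirm that the two required entries are filled in. The sketch below assumes analysis/slurm/config.yaml has already been parsed into a dict and that default-resources is stored as a key/value mapping; the helper name `missing_slurm_settings` and all example values are hypothetical.

```python
REQUIRED_SLURM_KEYS = ("slurm_partition", "slurm_account")


def missing_slurm_settings(profile):
    """List required default-resources keys that are absent or empty."""
    defaults = profile.get("default-resources") or {}
    return [key for key in REQUIRED_SLURM_KEYS if not defaults.get(key)]


# Stand-in for the parsed profile; every value here is a placeholder.
profile = {
    "default-resources": {
        "slurm_partition": "",      # not yet filled in
        "slurm_account": "my_lab",  # placeholder account name
        "mem_mb": 4000,             # illustrative value only
    }
}
print(missing_slurm_settings(profile))
```

An empty result means both required entries are present; anything else names the entries still to be configured.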

Module-by-Module Instructions

Preprocess

Configure: Run 0.configure_preprocess_params.ipynb to set preprocessing parameters.

Note: This step determines where ND2 data is loaded from and where results are saved (default: analysis/brieflow_output). Users testing only SBS or phenotype can configure only those image types—see notebook for details.

Run:

  • Local: sh 1.run_preprocessing.sh (remove -n for actual run)

  • Slurm: Set NUM_PLATES in 1.run_preprocessing_slurm.sh, then:

    tmux new-session -s preprocessing
    bash 1.run_preprocessing_slurm.sh
    

SBS

Configure: Run 2.configure_sbs_params.ipynb to set SBS module parameters.

Phenotype

Configure: Run 3.configure_phenotype_params.ipynb to set phenotype module parameters.

Run SBS/Phenotype

Run:

  • Local: sh 4.run_sbs_phenotype.sh (remove -n for actual run)

  • Slurm: Set NUM_PLATES in both 4a.run_sbs_slurm.sh and 4b.run_phenotype_slurm.sh. These can run simultaneously or separately:

    tmux new-session -s sbs_phenotype
    bash 4a.run_sbs_slurm.sh
    bash 4b.run_phenotype_slurm.sh
    

Note: To run only SBS or phenotype independently:

  1. Leave the respective sample dataframe empty in 0.configure_preprocess_params.ipynb

  2. Add the --until all_sbs or --until all_phenotype flag to the Snakemake command in the scripts
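
As an illustration of step 2, the sketch below appends Snakemake's --until option to an existing argument list. The helper name `restrict_to_target` and the base command are assumptions made for the example; only the all_sbs and all_phenotype target names come from the note above.

```python
def restrict_to_target(cmd, target):
    """Return a copy of a Snakemake argument list limited with --until."""
    # Copy the list so the original command can be reused for other targets.
    return list(cmd) + ["--until", target]


# Hypothetical base command; the provided scripts may use other options.
base = ["snakemake", "--cores", "all", "-n"]
sbs_only = restrict_to_target(base, "all_sbs")
phenotype_only = restrict_to_target(base, "all_phenotype")
print(" ".join(sbs_only))
print(" ".join(phenotype_only))
```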

Merge

Configure: Run 5.configure_merge_params.ipynb to set merge parameters.

Run:

  • Local: sh 6.run_merge.sh (remove -n for actual run)

  • Slurm:

    tmux new-session -s merge
    bash 6.run_merge_slurm.sh
    

Classify

Configure: Run 7.configure_classify_params.ipynb to train a classifier for different classes of cells (optional).

Aggregate

Configure: Run 8.configure_aggregate_params.ipynb to set aggregate parameters.

Run:

  • Local: sh 9.run_aggregate.sh (remove -n for actual run)

  • Slurm:

    tmux new-session -s aggregate
    bash 9.run_aggregate_slurm.sh
    

Cluster

Configure: Run 10.configure_cluster_params.ipynb to set cluster parameters.

Run:

  • Local: sh 11.run_cluster.sh (remove -n for actual run)

  • Slurm:

    tmux new-session -s cluster
    bash 11.run_cluster_slurm.sh
    

Analysis

Run 12.analyze.ipynb for interactive exploration of final outputs and feature plots.

MozzareLLM

Run LLM-based cluster annotation using MozzareLLM:

sh 13.run_mozzarellm.sh

Visualization

Launch the Streamlit visualizer to explore results interactively:

bash 14.run_visualization.sh

The visualizer opens to Cluster Analysis by default, using the optimal resolution from the mozzarellm section of config.yml. The sidebar provides access to:

  • Pipeline Stats — Parsed pipeline statistics (if a *_stats.txt file exists in brieflow_output/)

  • Quality Control — Eval plots and tables from each pipeline step

  • Screen Overview — Screen metadata (screen.yaml), perturbation library, and feature descriptions

  • Analysis Overview — Config, dependencies (pyproject.toml), and git info

Environment Variables

The launch script (14.run_visualization.sh) sets required and optional environment variables:

Variable                   Required  Description
BRIEFLOW_OUTPUT_PATH       Yes       Path to brieflow_output/ directory
CONFIG_PATH                Yes       Path to config.yml
SCREEN_PATH                Yes       Path to screen.yaml
PERTURBATION_LIBRARY_PATH  No        Path to raw perturbation library design file (TSV/CSV)
FEATURE_DOC_PATH           No        Path to feature documentation markdown
STATIC_ASSET_URL_ROOT      No        URL root for nginx-served static assets (deployment)
STATIC_ASSET_PATH          No        Filesystem path corresponding to static asset URL root
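
When launching the visualizer outside the provided script, a quick check that the required variables are set can save a failed start. This is a minimal sketch, not part of the visualizer: the helper name `missing_required_vars` is hypothetical and the example paths are placeholders.

```python
import os

# The three variables marked as required above.
REQUIRED_VARS = ("BRIEFLOW_OUTPUT_PATH", "CONFIG_PATH", "SCREEN_PATH")


def missing_required_vars(environ=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with a stand-in environment; the paths are placeholders.
example_env = {
    "BRIEFLOW_OUTPUT_PATH": "brieflow_output/",
    "CONFIG_PATH": "config/config.yml",
}
print(missing_required_vars(example_env))
```

Calling `missing_required_vars()` with no argument checks the current process environment instead of the stand-in dict.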

Default Cluster Resolution

The visualizer reads the mozzarellm section of config.yml to set the initial cluster analysis view:

mozzarellm:
  cell_class: Interphase
  channel_combo: DAPI_TUBULIN_GH2AX_PHALLOIDIN
  leiden_resolution: 12

Users can override these defaults via the sidebar filters.
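
The relationship between the configured defaults and the sidebar overrides can be pictured as a simple merge. The sketch below is only an illustration of that idea and is not the visualizer's implementation: `initial_cluster_view` is a hypothetical name, `config` stands for the parsed config.yml, and `overrides` stands for whatever the user selects.

```python
def initial_cluster_view(config, overrides=None):
    """Merge the mozzarellm defaults with any user-selected overrides."""
    # Copy so the loaded configuration itself is never modified.
    view = dict(config.get("mozzarellm", {}))
    # Only apply overrides that were actually chosen (not None).
    for key, value in (overrides or {}).items():
        if value is not None:
            view[key] = value
    return view


config = {
    "mozzarellm": {
        "cell_class": "Interphase",
        "channel_combo": "DAPI_TUBULIN_GH2AX_PHALLOIDIN",
        "leiden_resolution": 12,
    }
}
# Hypothetical override: the user picks a different resolution.
view = initial_cluster_view(config, {"leiden_resolution": 8, "cell_class": None})
print(view)
```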

Example Video

This video provides a step-by-step walkthrough of running a module in brieflow, including configuring parameters, testing with a dry run, and completing a full run using slurm scripts.

Additional Notes

  • Slurm log files are output to brieflow-analysis/analysis/slurm/slurm_output/main

  • For large screens, restrict loaded rules by commenting out unnecessary targets/rules in brieflow/workflow/Snakefile to optimize DAG generation. Example when running aggregate (only merge targets and aggregate rules/targets needed):

View restricted Snakefile example
# include: "targets/preprocess.smk"
# include: "rules/preprocess.smk"

# if "sbs" in config and len(sbs_wildcard_combos) > 0:

#     # Include target and rule files
#     include: "targets/sbs.smk"
#     include: "rules/sbs.smk"

# if "phenotype" in config and len(phenotype_wildcard_combos) > 0:

#     # Include target and rule files
#     include: "targets/phenotype.smk"
#     include: "rules/phenotype.smk"

if "merge" in config:
    MERGE_COMBO_FP = Path(config["merge"]["merge_combo_fp"])
    merge_wildcard_combos = pd.read_csv(MERGE_COMBO_FP, sep="\t")

    # Include target and rule files
    include: "targets/merge.smk"
    # include: "rules/merge.smk"

if "aggregate" in config:
    AGGREGATE_COMBO_FP = Path(config["aggregate"]["aggregate_combo_fp"])
    aggregate_wildcard_combos = pd.read_csv(AGGREGATE_COMBO_FP, sep="\t")

    # Include target and rule files
    include: "targets/aggregate.smk"
    include: "rules/aggregate.smk"

# if "cluster" in config:
#     CLUSTER_COMBO_FP = Path(config["cluster"]["cluster_combo_fp"])
#     cluster_wildcard_combos = pd.read_csv(CLUSTER_COMBO_FP, sep="\t")

#     # Include target and rule files
#     include: "targets/cluster.smk"
#     include: "rules/cluster.smk"

Always perform a dry run after modifying the Snakefile to verify correct job execution.