3. Running Modules
Overview
Each large brieflow process is referred to as a “module”.
This includes preprocess, SBS, phenotype, merge, aggregate, and cluster.
Each module has its own config notebook (in brieflow-analysis) and targets/rules file (in brieflow).
A configuration notebook is used to configure a module’s parameters, which are then used in the targets/rules.
The main Snakefile (at brieflow/workflow/Snakefile) connects each of these modules, as shown below:
Getting Started
Navigate to the analysis directory and activate the conda environment:
cd analysis/
conda activate brieflow_SCREEN_NAME
Note: Use the brieflow_SCREEN_NAME conda environment for all configuration notebooks.
View Snakefile code
include: "targets/preprocess.smk"
include: "rules/preprocess.smk"

if "sbs" in config and len(sbs_wildcard_combos) > 0:
    # Include target and rule files
    include: "targets/sbs.smk"
    include: "rules/sbs.smk"

if "phenotype" in config and len(phenotype_wildcard_combos) > 0:
    # Include target and rule files
    include: "targets/phenotype.smk"
    include: "rules/phenotype.smk"

if "merge" in config:
    MERGE_COMBO_FP = Path(config["merge"]["merge_combo_fp"])
    merge_wildcard_combos = pd.read_csv(MERGE_COMBO_FP, sep="\t")
    # Include target and rule files
    include: "targets/merge.smk"
    include: "rules/merge.smk"

if "aggregate" in config:
    AGGREGATE_COMBO_FP = Path(config["aggregate"]["aggregate_combo_fp"])
    aggregate_wildcard_combos = pd.read_csv(AGGREGATE_COMBO_FP, sep="\t")
    # Include target and rule files
    include: "targets/aggregate.smk"
    include: "rules/aggregate.smk"

if "cluster" in config:
    CLUSTER_COMBO_FP = Path(config["cluster"]["cluster_combo_fp"])
    cluster_wildcard_combos = pd.read_csv(CLUSTER_COMBO_FP, sep="\t")
    # Include target and rule files
    include: "targets/cluster.smk"
    include: "rules/cluster.smk"
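The conditional includes in the Snakefile can be summarized as a small Python function. This is only a sketch for reasoning about which modules get loaded, not part of brieflow: `included_modules` is a hypothetical helper, and its `sbs_combos` and `phenotype_combos` arguments stand in for `sbs_wildcard_combos` and `phenotype_wildcard_combos`, which the real Snakefile defines elsewhere.

```python
def included_modules(config, sbs_combos=(), phenotype_combos=()):
    """Return the module names the Snakefile conditions would include.

    Hypothetical helper: it mirrors the `if` conditions only and does
    not run Snakemake or read any files.
    """
    modules = ["preprocess"]  # preprocess is included unconditionally
    if "sbs" in config and len(sbs_combos) > 0:
        modules.append("sbs")
    if "phenotype" in config and len(phenotype_combos) > 0:
        modules.append("phenotype")
    # merge, aggregate, and cluster only require their config section
    for name in ("merge", "aggregate", "cluster"):
        if name in config:
            modules.append(name)
    return modules


if __name__ == "__main__":
    example_config = {"preprocess": {}, "sbs": {}, "merge": {}}
    print(included_modules(example_config, sbs_combos=[("plate1", "A1")]))
```

Under these assumptions, a config with an `sbs` section but no SBS wildcard combinations would skip the SBS targets and rules entirely.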
Workflow
A typical module execution follows these steps:
1. Configure parameters: Run the respective configuration notebook in brieflow-analysis/analysis to set parameters, which are saved to brieflow-analysis/analysis/config/config.yml.
   Example: 0.configure_preprocess_params.ipynb
2. Test with dry run: Use the local .sh script with the -n flag to perform a dry run.
   Example: sh 1.run_preprocessing.sh (already includes -n flag)
3. Execute full run:
   - Local: Remove -n flag from .sh script for local compute
   - Slurm: Use _slurm.sh script in a tmux session for HPC compute
   Example: bash 1.run_preprocessing_slurm.sh
Note: Preprocessing, SBS, and phenotype modules include special slurm scripts that split runs by plate for optimization.
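The difference between a dry run and a full run comes down to a single flag. The sketch below assembles a Snakemake command line with or without `-n`. It is an illustration only: the actual contents of the .sh scripts are not shown in this guide, `build_snakemake_cmd` is a hypothetical helper, and the example paths are assumptions. The options used (`--snakefile`, `--configfile`, `--cores`, `-n`) are standard Snakemake CLI options.

```python
import shlex


def build_snakemake_cmd(snakefile, configfile, dry_run=True, cores="all"):
    """Assemble a Snakemake command line as a list of arguments.

    Hypothetical helper; the real .sh scripts may pass other options.
    """
    cmd = [
        "snakemake",
        "--snakefile", snakefile,
        "--configfile", configfile,
        "--cores", str(cores),
    ]
    if dry_run:
        cmd.append("-n")  # dry run: show planned jobs without executing
    return cmd


if __name__ == "__main__":
    # Paths below are assumed for illustration only.
    dry = build_snakemake_cmd("../brieflow/workflow/Snakefile", "config/config.yml")
    full = build_snakemake_cmd(
        "../brieflow/workflow/Snakefile", "config/config.yml", dry_run=False
    )
    print(shlex.join(dry))
    print(shlex.join(full))
```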
HPC Integrations
Workflows can currently be run locally or through the Slurm integration.
To use the Slurm integration, configure the Slurm resources in analysis/slurm/config.yaml.
The slurm_partition and slurm_account entries under default-resources must be set for your cluster; the other resource requirements come with suggested values.
These can be adjusted as necessary.
Note: Other Snakemake HPC integrations can be found in the Snakemake plugin catalog.
Only the slurm plugin has been tested. It is important to understand that these plugins assume that the Snakemake scheduler will operate on the head HPC node, and only the individual jobs are submitted to the various nodes available to the HPC. Therefore, the Snakefile should be run through bash on the head node (with slurm or other HPC configurations). We recommend starting a tmux session for this, especially for larger jobs.
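Before launching a Slurm run, it can help to confirm that the two required entries are filled in. The sketch below checks an already-parsed profile for them. It assumes that `default-resources` in analysis/slurm/config.yaml is a key/value mapping and that the file has been loaded into a Python dict by a YAML parser of your choice; `missing_slurm_settings` is a hypothetical helper, not part of brieflow.

```python
REQUIRED_KEYS = ("slurm_partition", "slurm_account")


def missing_slurm_settings(profile):
    """Return the required default-resources keys that are absent or empty.

    Assumes `profile` is the parsed Slurm config as a dict, with
    `default-resources` stored as a mapping.
    """
    defaults = profile.get("default-resources") or {}
    return [key for key in REQUIRED_KEYS if not defaults.get(key)]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    example_profile = {
        "default-resources": {"slurm_partition": "my_partition", "slurm_account": ""}
    }
    print(missing_slurm_settings(example_profile))
```

An empty list means both entries are set; anything else names what still needs to be configured.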
Module-by-Module Instructions
Preprocess
Configure: Run 0.configure_preprocess_params.ipynb to set preprocessing parameters.
Note: This step determines where ND2 data is loaded from and where results are saved (default: analysis/brieflow_output). Users testing only SBS or phenotype can configure only those image types; see the notebook for details.
Run:
- Local: sh 1.run_preprocessing.sh (remove -n for actual run)
- Slurm: Set NUM_PLATES in 1.run_preprocessing_slurm.sh, then:
  tmux new-session -s preprocessing
  bash 1.run_preprocessing_slurm.sh
SBS
Configure: Run 2.configure_sbs_params.ipynb to set SBS module parameters.
Phenotype
Configure: Run 3.configure_phenotype_params.ipynb to set phenotype module parameters.
Run SBS/Phenotype
Run:
- Local: sh 4.run_sbs_phenotype.sh (remove -n for actual run)
- Slurm: Set NUM_PLATES in both 4a.run_sbs_slurm.sh and 4b.run_phenotype_slurm.sh. These can run simultaneously or separately:
  tmux new-session -s sbs_phenotype
  bash 4a.run_sbs_slurm.sh
  bash 4b.run_phenotype_slurm.sh
Note: To run only SBS or phenotype independently:
- Leave the respective sample dataframe empty in 0.configure_preprocess_params.ipynb
- Use --until all_sbs or --until all_phenotype tags in the scripts
Merge
Configure: Run 5.configure_merge_params.ipynb to set merge parameters.
Run:
- Local: sh 6.run_merge.sh (remove -n for actual run)
- Slurm:
  tmux new-session -s merge
  bash 6.run_merge_slurm.sh
Classify
Configure: Run 7.configure_classify_params.ipynb to train a classifier for different classes of cells (optional).
Aggregate
Configure: Run 8.configure_aggregate_params.ipynb to set aggregate parameters.
Run:
- Local: sh 9.run_aggregate.sh (remove -n for actual run)
- Slurm:
  tmux new-session -s aggregate
  bash 9.run_aggregate_slurm.sh
Cluster
Configure: Run 10.configure_cluster_params.ipynb to set cluster parameters.
Run:
- Local: sh 11.run_cluster.sh (remove -n for actual run)
- Slurm:
  tmux new-session -s cluster
  bash 11.run_cluster_slurm.sh
Analysis
Run 12.analyze.ipynb for interactive exploration of final outputs and feature plots.
MozzareLLM
Run LLM-based cluster annotation using MozzareLLM:
sh 13.run_mozzarellm.sh
Visualization
Launch the Streamlit visualizer to explore results interactively:
bash 14.run_visualization.sh
The visualizer opens to Cluster Analysis by default, using the optimal resolution from the mozzarellm section of config.yml. The sidebar provides access to:
- Pipeline Stats: Parsed pipeline statistics (if a *_stats.txt file exists in brieflow_output/)
- Quality Control: Eval plots and tables from each pipeline step
- Screen Overview: Screen metadata (screen.yaml), perturbation library, and feature descriptions
- Analysis Overview: Config, dependencies (pyproject.toml), and git info
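Whether the Pipeline Stats page has anything to show depends on a `*_stats.txt` file being present in the output directory. The sketch below looks for such files with `pathlib`. It is a hedged illustration: `find_stats_files` is a hypothetical helper, the search covers only the top level of the directory, and how the visualizer itself locates the file is not documented here.

```python
from pathlib import Path
import tempfile


def find_stats_files(output_dir):
    """Return sorted *_stats.txt paths directly inside output_dir."""
    return sorted(Path(output_dir).glob("*_stats.txt"))


if __name__ == "__main__":
    # Demonstrate against a throwaway directory rather than real outputs.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "run1_stats.txt").write_text("placeholder\n")
        (Path(tmp) / "notes.txt").write_text("ignored\n")
        print([p.name for p in find_stats_files(tmp)])
```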
Environment Variables
The launch script (14.run_visualization.sh) sets required and optional environment variables:
| Variable | Required | Description |
|---|---|---|
| | Yes | Path to |
| | Yes | Path to |
| | Yes | Path to |
| | No | Path to raw perturbation library design file (TSV/CSV) |
| | No | Path to feature documentation markdown |
| | No | URL root for nginx-served static assets (deployment) |
| | No | Filesystem path corresponding to static asset URL root |
Default Cluster Resolution
The visualizer reads the mozzarellm section of config.yml to set the initial cluster analysis view:
mozzarellm:
  cell_class: Interphase
  channel_combo: DAPI_TUBULIN_GH2AX_PHALLOIDIN
  leiden_resolution: 12
Users can override these defaults via the sidebar filters.
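The relationship between config defaults and sidebar overrides can be pictured as a dictionary merge. The sketch below is only a model of that behavior: `resolve_cluster_view` is a hypothetical function, the config is assumed to be already parsed into a dict, and the visualizer's actual implementation may differ.

```python
def resolve_cluster_view(config, overrides=None):
    """Combine mozzarellm defaults with user-selected overrides.

    Hypothetical model: override values that are None are treated as
    "not selected" and fall back to the config default.
    """
    defaults = dict(config.get("mozzarellm", {}))
    chosen = {k: v for k, v in (overrides or {}).items() if v is not None}
    return {**defaults, **chosen}


if __name__ == "__main__":
    config = {
        "mozzarellm": {
            "cell_class": "Interphase",
            "channel_combo": "DAPI_TUBULIN_GH2AX_PHALLOIDIN",
            "leiden_resolution": 12,
        }
    }
    # A user picks a different resolution and leaves the rest untouched.
    print(resolve_cluster_view(config, {"leiden_resolution": 8, "cell_class": None}))
```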
Example Video
This video provides a step-by-step walkthrough of running a module in brieflow, including configuring parameters, testing with a dry run, and completing a full run using slurm scripts.
Additional Notes
- Slurm log files are output to brieflow-analysis/analysis/slurm/slurm_output/main
- For large screens, restrict loaded rules by commenting out unnecessary targets/rules in brieflow/workflow/Snakefile to optimize DAG generation. Example when running aggregate (only merge targets and aggregate rules/targets needed):
View restricted Snakefile example
# include: "targets/preprocess.smk"
# include: "rules/preprocess.smk"

# if "sbs" in config and len(sbs_wildcard_combos) > 0:
#     # Include target and rule files
#     include: "targets/sbs.smk"
#     include: "rules/sbs.smk"

# if "phenotype" in config and len(phenotype_wildcard_combos) > 0:
#     # Include target and rule files
#     include: "targets/phenotype.smk"
#     include: "rules/phenotype.smk"

if "merge" in config:
    MERGE_COMBO_FP = Path(config["merge"]["merge_combo_fp"])
    merge_wildcard_combos = pd.read_csv(MERGE_COMBO_FP, sep="\t")
    # Include target and rule files
    include: "targets/merge.smk"
    # include: "rules/merge.smk"

if "aggregate" in config:
    AGGREGATE_COMBO_FP = Path(config["aggregate"]["aggregate_combo_fp"])
    aggregate_wildcard_combos = pd.read_csv(AGGREGATE_COMBO_FP, sep="\t")
    # Include target and rule files
    include: "targets/aggregate.smk"
    include: "rules/aggregate.smk"

# if "cluster" in config:
#     CLUSTER_COMBO_FP = Path(config["cluster"]["cluster_combo_fp"])
#     cluster_wildcard_combos = pd.read_csv(CLUSTER_COMBO_FP, sep="\t")
#     # Include target and rule files
#     include: "targets/cluster.smk"
#     include: "rules/cluster.smk"
Always perform a dry run after modifying the Snakefile to verify correct job execution.
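As a quick sanity check before the dry run, the edited Snakefile can be scanned for the include lines that remain uncommented. The sketch below does this with a regular expression. It is a text-level check only: `active_includes` is a hypothetical helper, it does not evaluate the surrounding `if` conditions, and it does not replace the dry run.

```python
import re

# Matches an include directive that is not preceded by a comment marker.
INCLUDE_RE = re.compile(r'^\s*include:\s*"([^"]+)"')


def active_includes(snakefile_text):
    """List include paths on lines that are not commented out."""
    paths = []
    for line in snakefile_text.splitlines():
        match = INCLUDE_RE.match(line)
        if match:
            paths.append(match.group(1))
    return paths


if __name__ == "__main__":
    sample = (
        '# include: "targets/preprocess.smk"\n'
        'if "merge" in config:\n'
        '    include: "targets/merge.smk"\n'
        '    # include: "rules/merge.smk"\n'
        'if "aggregate" in config:\n'
        '    include: "targets/aggregate.smk"\n'
        '    include: "rules/aggregate.smk"\n'
    )
    for path in active_includes(sample):
        print(path)
```

For the aggregate example, the expected result is the merge targets plus the aggregate targets and rules, and nothing else.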