# 3. Running Modules
## Overview
Each large brieflow process is referred to as a "module".
The modules are preprocess, SBS, phenotype, merge, aggregate, and cluster.
Each module has its own config notebook (in brieflow-analysis) and targets/rules file (in brieflow).
A configuration notebook is used to configure a module's parameters, which are then used in the targets/rules.
The main `Snakefile` (at `brieflow/workflow/Snakefile`) connects each of these modules, as shown below:
## Getting Started
Navigate to the analysis directory and activate the conda environment:
```bash
cd analysis/
conda activate brieflow_SCREEN_NAME
```
**Note:** Use the `brieflow_SCREEN_NAME` conda environment for all configuration notebooks.
The module includes in the main `Snakefile`:
```python
include: "targets/preprocess.smk"
include: "rules/preprocess.smk"

if "sbs" in config and len(sbs_wildcard_combos) > 0:
    # Include target and rule files
    include: "targets/sbs.smk"
    include: "rules/sbs.smk"

if "phenotype" in config and len(phenotype_wildcard_combos) > 0:
    # Include target and rule files
    include: "targets/phenotype.smk"
    include: "rules/phenotype.smk"

if "merge" in config:
    MERGE_COMBO_FP = Path(config["merge"]["merge_combo_fp"])
    merge_wildcard_combos = pd.read_csv(MERGE_COMBO_FP, sep="\t")
    # Include target and rule files
    include: "targets/merge.smk"
    include: "rules/merge.smk"

if "aggregate" in config:
    AGGREGATE_COMBO_FP = Path(config["aggregate"]["aggregate_combo_fp"])
    aggregate_wildcard_combos = pd.read_csv(AGGREGATE_COMBO_FP, sep="\t")
    # Include target and rule files
    include: "targets/aggregate.smk"
    include: "rules/aggregate.smk"

if "cluster" in config:
    CLUSTER_COMBO_FP = Path(config["cluster"]["cluster_combo_fp"])
    cluster_wildcard_combos = pd.read_csv(CLUSTER_COMBO_FP, sep="\t")
    # Include target and rule files
    include: "targets/cluster.smk"
    include: "rules/cluster.smk"
```
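The conditions in the `Snakefile` excerpt above can be summarised as a small function. This is an illustrative sketch in plain Python, not part of brieflow: `modules_to_include` is a hypothetical helper, and its combo arguments stand in for `sbs_wildcard_combos` and `phenotype_wildcard_combos`, which the real `Snakefile` builds elsewhere. The tuple used in the example is a placeholder, not the real combo format.

```python
def modules_to_include(config, sbs_combos=(), phenotype_combos=()):
    """Return the module names whose targets/rules would be included.

    Hypothetical helper that mirrors the conditions in the Snakefile excerpt.
    """
    # Preprocess is included unconditionally.
    modules = ["preprocess"]
    # SBS and phenotype need a config section and at least one wildcard combo.
    if "sbs" in config and len(sbs_combos) > 0:
        modules.append("sbs")
    if "phenotype" in config and len(phenotype_combos) > 0:
        modules.append("phenotype")
    # Downstream modules only need their config section to exist.
    for name in ("merge", "aggregate", "cluster"):
        if name in config:
            modules.append(name)
    return modules


# Example: SBS is configured with one placeholder combo, nothing downstream yet.
example_config = {"preprocess": {}, "sbs": {}}
print(modules_to_include(example_config, sbs_combos=[("1", "A1")]))
```

In other words, a module whose configuration notebook has not been run yet is skipped, because its section is missing from `config.yml`.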
## Workflow
A typical module execution follows these steps:
1. **Configure parameters**: Run the respective configuration notebook in `brieflow-analysis/analysis` to set parameters, which are saved to `brieflow-analysis/analysis/config/config.yml`.
- Example: `0.configure_preprocess_params.ipynb`
2. **Test with dry run**: Use the local `.sh` script with the `-n` flag to perform a dry run.
- Example: `sh 1.run_preprocessing.sh` (already includes `-n` flag)
3. **Execute full run**:
- **Local**: Remove `-n` flag from `.sh` script for local compute
- **Slurm**: Use `_slurm.sh` script in a tmux session for HPC compute
- Example: `bash 1.run_preprocessing_slurm.sh`
**Note:** Preprocessing, SBS, and phenotype modules include special slurm scripts that split runs by plate for optimization.
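The difference between a dry run and a full run comes down to a single flag on the Snakemake command. The sketch below assembles such a command in Python to make that explicit. It is an assumption about what the `.sh` scripts roughly do, not a copy of them: `build_snakemake_command` is a hypothetical helper, the relative Snakefile path assumes the command is run from `analysis/`, and the real scripts may pass additional options.

```python
import shlex


def build_snakemake_command(until=None, dry_run=True):
    """Assemble a Snakemake command line (hypothetical helper, not from brieflow)."""
    cmd = [
        "snakemake",
        "--snakefile", "../brieflow/workflow/Snakefile",  # assumed relative path
        "--configfile", "config/config.yml",
        "--cores", "all",
    ]
    if until:
        # Stop at a named target rule, e.g. all_sbs.
        cmd += ["--until", until]
    if dry_run:
        # -n lists the jobs that would run without executing them.
        cmd.append("-n")
    return cmd


# Dry run first, then the same command without -n for the full run.
print(shlex.join(build_snakemake_command(until="all_sbs")))
print(shlex.join(build_snakemake_command(until="all_sbs", dry_run=False)))
```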
## HPC Integrations
The steps for running workflows currently include local and Slurm integration.
To use the Slurm integration for Brieflow, configure the Slurm resources in [analysis/slurm/config.yaml](analysis/slurm/config.yaml).
The `slurm_partition` and `slurm_account` entries in `default-resources` must be set for your cluster.
The other resource requirements have suggested values, which can be adjusted as necessary.
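Before submitting jobs, it can help to confirm that the two cluster-specific entries are filled in. The following is a minimal sketch that checks an already-parsed Slurm config. `missing_slurm_settings` is a hypothetical helper, and the mapping layout under `default-resources` is an assumption about how the file is structured; the partition name is a placeholder.

```python
REQUIRED_SLURM_KEYS = ("slurm_partition", "slurm_account")


def missing_slurm_settings(profile):
    """Return required default-resources keys that are absent or empty.

    `profile` is the Slurm config already parsed into a dict; the mapping
    layout under "default-resources" is an assumption.
    """
    defaults = profile.get("default-resources") or {}
    return [key for key in REQUIRED_SLURM_KEYS if not defaults.get(key)]


# Placeholder values; substitute your cluster's partition and account.
example_profile = {
    "default-resources": {
        "slurm_partition": "my_partition",
        "slurm_account": "",
    }
}
print(missing_slurm_settings(example_profile))  # ['slurm_account']
```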
**Note:** Other Snakemake HPC integrations can be found in the [Snakemake plugin catalog](https://snakemake.github.io/snakemake-plugin-catalog/index.html#snakemake-plugin-catalog).
Only the `slurm` plugin has been tested. These plugins assume that the Snakemake scheduler runs on the HPC head node and that *only the individual jobs* are submitted to the compute nodes. The Snakefile should therefore be run through bash on the head node (with `slurm` or another HPC configuration). We recommend starting a tmux session for this, especially for larger jobs.
## Module-by-Module Instructions
### Preprocess
**Configure**: Run `0.configure_preprocess_params.ipynb` to set preprocessing parameters.
**Note:** This step determines where ND2 data is loaded from and where results are saved (default: `analysis/brieflow_output`). Users testing only SBS or only phenotype can configure just that image type; see the notebook for details.
**Run**:
- **Local**: `sh 1.run_preprocessing.sh` (remove `-n` for actual run)
- **Slurm**: Set `NUM_PLATES` in `1.run_preprocessing_slurm.sh`, then:
```bash
tmux new-session -s preprocessing
bash 1.run_preprocessing_slurm.sh
```
### SBS
**Configure**: Run `2.configure_sbs_params.ipynb` to set SBS module parameters.
### Phenotype
**Configure**: Run `3.configure_phenotype_params.ipynb` to set phenotype module parameters.
### Run SBS/Phenotype
**Run**:
- **Local**: `sh 4.run_sbs_phenotype.sh` (remove `-n` for actual run)
- **Slurm**: Set `NUM_PLATES` in both `4a.run_sbs_slurm.sh` and `4b.run_phenotype_slurm.sh`. These can run simultaneously or separately:
```bash
tmux new-session -s sbs_phenotype
bash 4a.run_sbs_slurm.sh
bash 4b.run_phenotype_slurm.sh
```
**Note:** To run only SBS or phenotype independently:
1. Leave the sample dataframe of the module you are skipping empty in `0.configure_preprocess_params.ipynb`
2. Use the `--until all_sbs` or `--until all_phenotype` flag in the scripts
### Merge
**Configure**: Run `5.configure_merge_params.ipynb` to set merge parameters.
**Run**:
- **Local**: `sh 6.run_merge.sh` (remove `-n` for actual run)
- **Slurm**:
```bash
tmux new-session -s merge
bash 6.run_merge_slurm.sh
```
### Classify
**Configure**: Run `7.configure_classify_params.ipynb` to train a classifier for different classes of cells (optional).
### Aggregate
**Configure**: Run `8.configure_aggregate_params.ipynb` to set aggregate parameters.
**Run**:
- **Local**: `sh 9.run_aggregate.sh` (remove `-n` for actual run)
- **Slurm**:
```bash
tmux new-session -s aggregate
bash 9.run_aggregate_slurm.sh
```
### Cluster
**Configure**: Run `10.configure_cluster_params.ipynb` to set cluster parameters.
**Run**:
- **Local**: `sh 11.run_cluster.sh` (remove `-n` for actual run)
- **Slurm**:
```bash
tmux new-session -s cluster
bash 11.run_cluster_slurm.sh
```
### Analysis
Run `12.analyze.ipynb` for interactive exploration of final outputs and feature plots.
### MozzareLLM
Run LLM-based cluster annotation using [MozzareLLM](https://github.com/cheeseman-lab/mozzarellm):
```bash
sh 13.run_mozzarellm.sh
```
### Visualization
Launch the Streamlit visualizer to explore results interactively:
```bash
bash 14.run_visualization.sh
```
The visualizer opens to **Cluster Analysis** by default, using the optimal resolution from the `mozzarellm` section of `config.yml`. The sidebar provides access to:
- **Pipeline Stats** — Parsed pipeline statistics (if a `*_stats.txt` file exists in `brieflow_output/`)
- **Quality Control** — Eval plots and tables from each pipeline step
- **Screen Overview** — Screen metadata (`screen.yaml`), perturbation library, and feature descriptions
- **Analysis Overview** — Config, dependencies (`pyproject.toml`), and git info
#### Environment Variables
The launch script (`14.run_visualization.sh`) sets required and optional environment variables:
| Variable | Required | Description |
|---|---|---|
| `BRIEFLOW_OUTPUT_PATH` | Yes | Path to `brieflow_output/` directory |
| `CONFIG_PATH` | Yes | Path to `config.yml` |
| `SCREEN_PATH` | Yes | Path to `screen.yaml` |
| `PERTURBATION_LIBRARY_PATH` | No | Path to raw perturbation library design file (TSV/CSV) |
| `FEATURE_DOC_PATH` | No | Path to feature documentation markdown |
| `STATIC_ASSET_URL_ROOT` | No | URL root for nginx-served static assets (deployment) |
| `STATIC_ASSET_PATH` | No | Filesystem path corresponding to static asset URL root |
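If the visualizer fails to start, a quick check of the variables in the table above can narrow down the cause. This is a minimal sketch using only the variable names listed in the table. `check_visualizer_env` is a hypothetical helper and is not part of the launch script.

```python
import os

REQUIRED_VARS = ("BRIEFLOW_OUTPUT_PATH", "CONFIG_PATH", "SCREEN_PATH")
OPTIONAL_VARS = (
    "PERTURBATION_LIBRARY_PATH",
    "FEATURE_DOC_PATH",
    "STATIC_ASSET_URL_ROOT",
    "STATIC_ASSET_PATH",
)


def check_visualizer_env(env=None):
    """Return (missing required, unset optional) variable names."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    unset_optional = [name for name in OPTIONAL_VARS if not env.get(name)]
    return missing, unset_optional


# Check the current process environment.
missing, unset_optional = check_visualizer_env()
if missing:
    print("Missing required variables:", ", ".join(missing))
else:
    print("All required variables are set.")
```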
#### Default Cluster Resolution
The visualizer reads the `mozzarellm` section of `config.yml` to set the initial cluster analysis view:
```yaml
mozzarellm:
cell_class: Interphase
channel_combo: DAPI_TUBULIN_GH2AX_PHALLOIDIN
leiden_resolution: 12
```
Users can override these defaults via the sidebar filters.
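The relationship between the config defaults and the sidebar overrides can be pictured as a simple dictionary merge. The sketch below illustrates that idea only and is not how the visualizer is actually implemented: `resolve_cluster_view` is a hypothetical helper, and the override value `8` is an arbitrary example.

```python
def resolve_cluster_view(config, overrides=None):
    """Combine config defaults with user overrides (hypothetical helper)."""
    view = dict(config.get("mozzarellm", {}))
    # Only overrides that are actually set replace a default.
    for key, value in (overrides or {}).items():
        if value is not None:
            view[key] = value
    return view


# Defaults copied from the mozzarellm section shown above.
config = {
    "mozzarellm": {
        "cell_class": "Interphase",
        "channel_combo": "DAPI_TUBULIN_GH2AX_PHALLOIDIN",
        "leiden_resolution": 12,
    }
}
print(resolve_cluster_view(config, {"leiden_resolution": 8}))
```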
## Example Video
This video provides a step-by-step walkthrough of running a module in brieflow, including configuring parameters, testing with a dry run, and completing a full run using slurm scripts.
## Additional Notes
- Slurm log files are output to `brieflow-analysis/analysis/slurm/slurm_output/main`
- For large screens, restrict loaded rules by commenting out unnecessary targets/rules in `brieflow/workflow/Snakefile` to optimize DAG generation. Example when running aggregate (only merge targets and aggregate rules/targets needed):
Restricted `Snakefile` for an aggregate run:
```python
# include: "targets/preprocess.smk"
# include: "rules/preprocess.smk"

# if "sbs" in config and len(sbs_wildcard_combos) > 0:
#     # Include target and rule files
#     include: "targets/sbs.smk"
#     include: "rules/sbs.smk"

# if "phenotype" in config and len(phenotype_wildcard_combos) > 0:
#     # Include target and rule files
#     include: "targets/phenotype.smk"
#     include: "rules/phenotype.smk"

if "merge" in config:
    MERGE_COMBO_FP = Path(config["merge"]["merge_combo_fp"])
    merge_wildcard_combos = pd.read_csv(MERGE_COMBO_FP, sep="\t")
    # Include target and rule files
    include: "targets/merge.smk"
    # include: "rules/merge.smk"

if "aggregate" in config:
    AGGREGATE_COMBO_FP = Path(config["aggregate"]["aggregate_combo_fp"])
    aggregate_wildcard_combos = pd.read_csv(AGGREGATE_COMBO_FP, sep="\t")
    # Include target and rule files
    include: "targets/aggregate.smk"
    include: "rules/aggregate.smk"

# if "cluster" in config:
#     CLUSTER_COMBO_FP = Path(config["cluster"]["cluster_combo_fp"])
#     cluster_wildcard_combos = pd.read_csv(CLUSTER_COMBO_FP, sep="\t")
#     # Include target and rule files
#     include: "targets/cluster.smk"
#     include: "rules/cluster.smk"
```
**Always perform a dry run after modifying the Snakefile to verify correct job execution.**