# 3. Running Modules

## Overview

Each large brieflow process is referred to as a "module".
These include preprocess, SBS, phenotype, merge, aggregate, and cluster.
Each module has its own configuration notebook (in brieflow-analysis) and its own targets and rules files (in brieflow).
The configuration notebook sets the module's parameters, which the targets and rules then use.
The main `Snakefile` (at `brieflow/workflow/Snakefile`) connects these modules; its code is shown in the collapsible "View Snakefile code" block below.

## Getting Started

Navigate to the analysis directory and activate the conda environment:

```bash
cd analysis/
conda activate brieflow_SCREEN_NAME
```

**Note:** Use the `brieflow_SCREEN_NAME` conda environment for all configuration notebooks.


<details>
<summary>View Snakefile code</summary>

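```python
# Excerpt: Snakemake provides `config`; `sbs_wildcard_combos` and
# `phenotype_wildcard_combos` must already be defined when these conditions run.
from pathlib import Path

import pandas as pd

include: "targets/preprocess.smk"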
include: "rules/preprocess.smk"

if "sbs" in config and len(sbs_wildcard_combos) > 0:

    # Include target and rule files
    include: "targets/sbs.smk"
    include: "rules/sbs.smk"

if "phenotype" in config and len(phenotype_wildcard_combos) > 0:

    # Include target and rule files
    include: "targets/phenotype.smk"
    include: "rules/phenotype.smk"

if "merge" in config:
    MERGE_COMBO_FP = Path(config["merge"]["merge_combo_fp"])
    merge_wildcard_combos = pd.read_csv(MERGE_COMBO_FP, sep="\t")

    # Include target and rule files
    include: "targets/merge.smk"
    include: "rules/merge.smk"

if "aggregate" in config:
    AGGREGATE_COMBO_FP = Path(config["aggregate"]["aggregate_combo_fp"])
    aggregate_wildcard_combos = pd.read_csv(AGGREGATE_COMBO_FP, sep="\t")

    # Include target and rule files
    include: "targets/aggregate.smk"
    include: "rules/aggregate.smk"

if "cluster" in config:
    CLUSTER_COMBO_FP = Path(config["cluster"]["cluster_combo_fp"])
    cluster_wildcard_combos = pd.read_csv(CLUSTER_COMBO_FP, sep="\t")

    # Include target and rule files
    include: "targets/cluster.smk"
    include: "rules/cluster.smk"
```

</details>
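The include conditions in the Snakefile can be summarized with a short Python sketch. `modules_to_include` is a hypothetical helper, not part of brieflow: it takes the SBS and phenotype combo counts directly and ignores the combo files that merge, aggregate, and cluster read, so it only shows which modules would be loaded for a given `config`.

```python
from typing import Dict, List


def modules_to_include(config: Dict, n_sbs_combos: int, n_phenotype_combos: int) -> List[str]:
    """Mirror the Snakefile's include conditions (sketch, not brieflow code)."""
    # Preprocess targets and rules are always included.
    modules = ["preprocess"]
    # SBS and phenotype also need at least one wildcard combination.
    if "sbs" in config and n_sbs_combos > 0:
        modules.append("sbs")
    if "phenotype" in config and n_phenotype_combos > 0:
        modules.append("phenotype")
    # Downstream modules only need their section to be present in config.yml.
    for name in ("merge", "aggregate", "cluster"):
        if name in config:
            modules.append(name)
    return modules


example_config = {"preprocess": {}, "sbs": {}, "phenotype": {}, "merge": {}}
print(modules_to_include(example_config, n_sbs_combos=4, n_phenotype_combos=0))
# → ['preprocess', 'sbs', 'merge']
```

Note that a configured module is still skipped when it has no wildcard combinations, which is why an empty sample dataframe is enough to disable SBS or phenotype.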

## Workflow

A typical module execution follows these steps:

1. **Configure parameters**: Run the respective configuration notebook in `brieflow-analysis/analysis` to set parameters, which are saved to `brieflow-analysis/analysis/config/config.yml`.
   - Example: `0.configure_preprocess_params.ipynb`

2. **Test with dry run**: Run the module's local `.sh` script with Snakemake's `-n` (dry-run) flag.
   - Example: `sh 1.run_preprocessing.sh` (the script already includes the `-n` flag)

3. **Execute full run**:
   - **Local**: Remove the `-n` flag from the `.sh` script to run on local compute
   - **Slurm**: Run the corresponding `_slurm.sh` script in a tmux session to run on HPC compute
     - Example: `bash 1.run_preprocessing_slurm.sh`

**Note:** The preprocess, SBS, and phenotype modules include dedicated Slurm scripts that split runs by plate for optimization.
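The dry-run and full-run steps differ mainly in which Snakemake options are passed. The sketch below is a hypothetical reconstruction, not the contents of the actual `.sh` scripts: the Snakefile, config, and profile paths are assumptions, and the Slurm branch assumes Snakemake 8 or later with the Slurm executor plugin installed.

```python
from typing import List

# Assumed locations relative to brieflow-analysis/analysis; the real scripts may differ.
SNAKEFILE = "../brieflow/workflow/Snakefile"
CONFIGFILE = "config/config.yml"
SLURM_PROFILE_DIR = "slurm/"


def build_snakemake_cmd(dry_run: bool = True, slurm: bool = False) -> List[str]:
    """Assemble a Snakemake command line for one module run (illustrative sketch)."""
    cmd = ["snakemake", "--snakefile", SNAKEFILE, "--configfile", CONFIGFILE]
    if slurm:
        # Hand each job to the Slurm executor; resources come from slurm/config.yaml.
        cmd += ["--executor", "slurm", "--workflow-profile", SLURM_PROFILE_DIR]
    else:
        # Run jobs on the local machine using all available cores.
        cmd += ["--cores", "all"]
    if dry_run:
        # -n (--dry-run) prints the planned jobs without executing anything.
        cmd.append("-n")
    return cmd


# Step 2 (dry run), then step 3 locally or through Slurm.
for kwargs in ({"dry_run": True}, {"dry_run": False}, {"dry_run": False, "slurm": True}):
    print(" ".join(build_snakemake_cmd(**kwargs)))
```

Keeping the dry run as the default means forgetting an argument never launches real jobs.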

## HPC Integrations

Brieflow workflows can currently be run locally or through Slurm.
To use the Slurm integration, configure the Slurm resources in [analysis/slurm/config.yaml](analysis/slurm/config.yaml).
The `slurm_partition` and `slurm_account` entries in `default-resources` must be set for your cluster; the other resource requirements have suggested values that can be adjusted as necessary.

**Note:** Other Snakemake HPC integrations can be found in the [Snakemake plugin catalog](https://snakemake.github.io/snakemake-plugin-catalog/index.html#snakemake-plugin-catalog).
Only the `slurm` plugin has been tested. These plugins assume that the Snakemake scheduler runs on the HPC head node and that *only the individual jobs* are submitted to the compute nodes. Therefore, run the Snakefile through bash on the head node (with the `slurm` or other HPC configuration). We recommend doing this inside a tmux session, especially for larger jobs.
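Because a blank partition or account only surfaces once jobs are submitted, a quick check of the profile can save a failed run. The sketch below assumes `analysis/slurm/config.yaml` has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`); `missing_slurm_settings` is a hypothetical helper, not part of brieflow.

```python
from typing import Any, Dict, List

REQUIRED_KEYS = ("slurm_partition", "slurm_account")


def missing_slurm_settings(profile: Dict[str, Any]) -> List[str]:
    """Return required default-resources keys that are absent or blank (sketch)."""
    resources = profile.get("default-resources") or {}
    # Snakemake profiles also accept default-resources as a list of "key=value" strings.
    if isinstance(resources, list):
        pairs = (item.split("=", 1) for item in resources if "=" in item)
        resources = {key.strip(): value.strip() for key, value in pairs}
    missing = []
    for key in REQUIRED_KEYS:
        value = resources.get(key)
        # Treat None (a bare "key:" in YAML) and empty or quote-only strings as unset.
        if value is None or not str(value).strip().strip("'\""):
            missing.append(key)
    return missing


profile = {"default-resources": {"slurm_partition": "cpu", "slurm_account": None}}
print(missing_slurm_settings(profile))
# → ['slurm_account']
```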

## Module-by-Module Instructions

### Preprocess

**Configure**: Run `0.configure_preprocess_params.ipynb` to set preprocessing parameters.

**Note:** This step determines where ND2 data is loaded from and where results are saved (default: `analysis/brieflow_output`). Users testing only SBS or phenotype can configure just that image type; see the notebook for details.

**Run**:
- **Local**: `sh 1.run_preprocessing.sh` (remove `-n` for actual run)
- **Slurm**: Set `NUM_PLATES` in `1.run_preprocessing_slurm.sh`, then:
  ```bash
  tmux new-session -s preprocessing
  bash 1.run_preprocessing_slurm.sh
  ```

### SBS

**Configure**: Run `2.configure_sbs_params.ipynb` to set SBS module parameters.

### Phenotype

**Configure**: Run `3.configure_phenotype_params.ipynb` to set phenotype module parameters.

### Run SBS/Phenotype

**Run**:
- **Local**: `sh 4.run_sbs_phenotype.sh` (remove `-n` for actual run)
- **Slurm**: Set `NUM_PLATES` in both `4a.run_sbs_slurm.sh` and `4b.run_phenotype_slurm.sh`. The two scripts can run one after the other, as below, or simultaneously in separate tmux sessions:
  ```bash
  tmux new-session -s sbs_phenotype
  bash 4a.run_sbs_slurm.sh
  bash 4b.run_phenotype_slurm.sh
  ```

**Note:** To run only SBS or only phenotype:
1. Leave the sample dataframe for the skipped image type empty in `0.configure_preprocess_params.ipynb`
2. Add `--until all_sbs` or `--until all_phenotype` to the Snakemake command in the scripts
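The two steps are linked: the image type whose sample dataframe is non-empty determines which `--until` target to use. `until_args` is a hypothetical helper that takes the row counts of the two sample dataframes (e.g., `len(df)`); the `all_sbs` and `all_phenotype` targets come from the note above, while the base command in the example is an assumption.

```python
from typing import List


def until_args(n_sbs_samples: int, n_phenotype_samples: int) -> List[str]:
    """Return the --until option for a single-module run (hypothetical helper)."""
    if n_sbs_samples and n_phenotype_samples:
        return []  # Both image types configured: run SBS and phenotype together.
    if n_sbs_samples:
        return ["--until", "all_sbs"]  # Phenotype sample dataframe left empty.
    if n_phenotype_samples:
        return ["--until", "all_phenotype"]  # SBS sample dataframe left empty.
    raise ValueError("Both sample dataframes are empty; nothing to run.")


# Example: SBS-only dry run.
print(["snakemake", "-n"] + until_args(n_sbs_samples=12, n_phenotype_samples=0))
# → ['snakemake', '-n', '--until', 'all_sbs']
```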

### Merge

**Configure**: Run `5.configure_merge_params.ipynb` to set merge parameters.

**Run**:
- **Local**: `sh 6.run_merge.sh` (remove `-n` for actual run)
- **Slurm**: 
  ```bash
  tmux new-session -s merge
  bash 6.run_merge_slurm.sh
  ```

### Classify

**Configure** (optional): Run `7.configure_classify_params.ipynb` to train a classifier that assigns cells to different classes.

### Aggregate

**Configure**: Run `8.configure_aggregate_params.ipynb` to set aggregate parameters.

**Run**:
- **Local**: `sh 9.run_aggregate.sh` (remove `-n` for actual run)
- **Slurm**: 
  ```bash
  tmux new-session -s aggregate
  bash 9.run_aggregate_slurm.sh
  ```

### Cluster

**Configure**: Run `10.configure_cluster_params.ipynb` to set cluster parameters.

**Run**:
- **Local**: `sh 11.run_cluster.sh` (remove `-n` for actual run)
- **Slurm**: 
  ```bash
  tmux new-session -s cluster
  bash 11.run_cluster_slurm.sh
  ```

### Analysis

Run `12.analyze.ipynb` for interactive exploration of final outputs and feature plots.

### MozzareLLM

Run LLM-based cluster annotation using [MozzareLLM](https://github.com/cheeseman-lab/mozzarellm):

```bash
sh 13.run_mozzarellm.sh
```

### Visualization

Launch the Streamlit visualizer to explore results interactively:

```bash
bash 14.run_visualization.sh
```

The visualizer opens to **Cluster Analysis** by default, using the cell class, channel combination, and Leiden resolution set in the `mozzarellm` section of `config.yml`. The sidebar provides access to:

- **Pipeline Stats** — Parsed pipeline statistics (if a `*_stats.txt` file exists in `brieflow_output/`)
- **Quality Control** — Eval plots and tables from each pipeline step
- **Screen Overview** — Screen metadata (`screen.yaml`), perturbation library, and feature descriptions
- **Analysis Overview** — Config, dependencies (`pyproject.toml`), and git info
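The Pipeline Stats page depends on a `*_stats.txt` file being present, so it can help to check for one before launching. This is a minimal sketch, assuming a non-recursive match in the top level of `brieflow_output/`; the visualizer's actual discovery logic may differ, and `find_stats_files` is a hypothetical name.

```python
from pathlib import Path
from typing import List


def find_stats_files(output_dir: str = "brieflow_output") -> List[Path]:
    """Return *_stats.txt files in the top level of brieflow_output/ (sketch)."""
    # Non-recursive glob; a missing directory simply yields no matches.
    return sorted(Path(output_dir).glob("*_stats.txt"))


stats_files = find_stats_files()
print([p.name for p in stats_files] or "No *_stats.txt file found in brieflow_output/")
```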

#### Environment Variables

The launch script (`14.run_visualization.sh`) sets required and optional environment variables:

| Variable | Required | Description |
|---|---|---|
| `BRIEFLOW_OUTPUT_PATH` | Yes | Path to `brieflow_output/` directory |
| `CONFIG_PATH` | Yes | Path to `config.yml` |
| `SCREEN_PATH` | Yes | Path to `screen.yaml` |
| `PERTURBATION_LIBRARY_PATH` | No | Path to raw perturbation library design file (TSV/CSV) |
| `FEATURE_DOC_PATH` | No | Path to feature documentation markdown |
| `STATIC_ASSET_URL_ROOT` | No | URL root for nginx-served static assets (deployment) |
| `STATIC_ASSET_PATH` | No | Filesystem path corresponding to static asset URL root |
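If the visualizer is launched outside `14.run_visualization.sh`, the three required variables must be set by hand. A minimal pre-launch check in Python, assuming an unset or empty value counts as missing; `missing_required_vars` is a hypothetical helper, not part of the launch script.

```python
import os
from typing import List, Mapping

# Required variables from the table above; the optional ones may be left unset.
REQUIRED_VARS = ("BRIEFLOW_OUTPUT_PATH", "CONFIG_PATH", "SCREEN_PATH")


def missing_required_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return required visualizer variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_required_vars()
print("Missing required variables:", ", ".join(missing) if missing else "none")
```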

#### Default Cluster Resolution

The visualizer reads the `mozzarellm` section of `config.yml` to set the initial cluster analysis view:

```yaml
mozzarellm:
  cell_class: Interphase
  channel_combo: DAPI_TUBULIN_GH2AX_PHALLOIDIN
  leiden_resolution: 12
```

Users can override these defaults via the sidebar filters.
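The interaction between the `config.yml` defaults and the sidebar filters can be sketched as a simple merge. `initial_cluster_view` is hypothetical, not the visualizer's code; it assumes the config has already been parsed into a dict, and the override value of `8` is only an illustration.

```python
from typing import Any, Dict, Optional

VIEW_KEYS = ("cell_class", "channel_combo", "leiden_resolution")


def initial_cluster_view(config: Dict[str, Any],
                         overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Combine config.yml defaults with sidebar overrides (sketch, not visualizer code)."""
    defaults = config.get("mozzarellm") or {}
    view = {key: defaults.get(key) for key in VIEW_KEYS}
    # Only known keys with a selected value replace the config.yml defaults.
    for key, value in (overrides or {}).items():
        if key in VIEW_KEYS and value is not None:
            view[key] = value
    return view


config = {"mozzarellm": {"cell_class": "Interphase",
                         "channel_combo": "DAPI_TUBULIN_GH2AX_PHALLOIDIN",
                         "leiden_resolution": 12}}
print(initial_cluster_view(config, overrides={"leiden_resolution": 8}))
# → {'cell_class': 'Interphase', 'channel_combo': 'DAPI_TUBULIN_GH2AX_PHALLOIDIN', 'leiden_resolution': 8}
```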

## Example Video

This video provides a step-by-step walkthrough of running a module in brieflow, including configuring parameters, testing with a dry run, and completing a full run using slurm scripts.
<iframe width="560" height="315" src="https://www.youtube.com/embed/0L5yYa1S8g0" frameborder="0" allowfullscreen title="Example Video: Running Modules"></iframe>

## Additional Notes

- Slurm log files are output to `brieflow-analysis/analysis/slurm/slurm_output/main`
- For large screens, speed up DAG generation by commenting out unneeded targets and rules in `brieflow/workflow/Snakefile`. For example, running aggregate requires only the merge targets and the aggregate targets and rules:
<details>
<summary>View restricted Snakefile example</summary>

```python
# include: "targets/preprocess.smk"
# include: "rules/preprocess.smk"

# if "sbs" in config and len(sbs_wildcard_combos) > 0:

#     # Include target and rule files
#     include: "targets/sbs.smk"
#     include: "rules/sbs.smk"

# if "phenotype" in config and len(phenotype_wildcard_combos) > 0:

#     # Include target and rule files
#     include: "targets/phenotype.smk"
#     include: "rules/phenotype.smk"

if "merge" in config:
    MERGE_COMBO_FP = Path(config["merge"]["merge_combo_fp"])
    merge_wildcard_combos = pd.read_csv(MERGE_COMBO_FP, sep="\t")

    # Include target and rule files
    include: "targets/merge.smk"
    # include: "rules/merge.smk"

if "aggregate" in config:
    AGGREGATE_COMBO_FP = Path(config["aggregate"]["aggregate_combo_fp"])
    aggregate_wildcard_combos = pd.read_csv(AGGREGATE_COMBO_FP, sep="\t")

    # Include target and rule files
    include: "targets/aggregate.smk"
    include: "rules/aggregate.smk"

# if "cluster" in config:
#     CLUSTER_COMBO_FP = Path(config["cluster"]["cluster_combo_fp"])
#     cluster_wildcard_combos = pd.read_csv(CLUSTER_COMBO_FP, sep="\t")

#     # Include target and rule files
#     include: "targets/cluster.smk"
#     include: "rules/cluster.smk"
```
</details>

**Always perform a dry run after modifying the Snakefile to verify correct job execution.**