# 3. Running Modules

## Overview

Each large brieflow process is referred to as a "module". The modules are preprocess, SBS, phenotype, merge, aggregate, and cluster.
Each module has its own configuration notebook (in brieflow-analysis) and targets/rules files (in brieflow).
The configuration notebook sets a module's parameters, which the targets/rules then use.
The main `Snakefile` (at `brieflow/workflow/Snakefile`) connects these modules, as shown in the Snakefile code that follows the Getting Started steps.

## Getting Started

Navigate to the analysis directory and activate the conda environment:

```bash
cd analysis/
conda activate brieflow_SCREEN_NAME
```

**Note:** Use the `brieflow_SCREEN_NAME` conda environment for all configuration notebooks.

Snakefile code:

```python
include: "targets/preprocess.smk"
include: "rules/preprocess.smk"

if "sbs" in config and len(sbs_wildcard_combos) > 0:
    # Include target and rule files
    include: "targets/sbs.smk"
    include: "rules/sbs.smk"

if "phenotype" in config and len(phenotype_wildcard_combos) > 0:
    # Include target and rule files
    include: "targets/phenotype.smk"
    include: "rules/phenotype.smk"

if "merge" in config:
    MERGE_COMBO_FP = Path(config["merge"]["merge_combo_fp"])
    merge_wildcard_combos = pd.read_csv(MERGE_COMBO_FP, sep="\t")

    # Include target and rule files
    include: "targets/merge.smk"
    include: "rules/merge.smk"

if "aggregate" in config:
    AGGREGATE_COMBO_FP = Path(config["aggregate"]["aggregate_combo_fp"])
    aggregate_wildcard_combos = pd.read_csv(AGGREGATE_COMBO_FP, sep="\t")

    # Include target and rule files
    include: "targets/aggregate.smk"
    include: "rules/aggregate.smk"

if "cluster" in config:
    CLUSTER_COMBO_FP = Path(config["cluster"]["cluster_combo_fp"])
    cluster_wildcard_combos = pd.read_csv(CLUSTER_COMBO_FP, sep="\t")

    # Include target and rule files
    include: "targets/cluster.smk"
    include: "rules/cluster.smk"
```
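
The conditional includes in the Snakefile code can be easier to reason about as plain Python. The sketch below is not part of brieflow: `enabled_modules` is a hypothetical helper that mirrors the `if` conditions of the Snakefile and reports which modules would be included for a given config. It assumes the config is a dictionary keyed by module name and that the SBS and phenotype wildcard combos are passed in as plain sequences.

```python
from typing import List, Mapping, Sequence


def enabled_modules(
    config: Mapping[str, object],
    sbs_wildcard_combos: Sequence[object] = (),
    phenotype_wildcard_combos: Sequence[object] = (),
) -> List[str]:
    """Return the modules whose targets/rules the Snakefile would include.

    Hypothetical helper: it only mirrors the conditions in the Snakefile code.
    """
    # Preprocess targets and rules are included unconditionally.
    modules = ["preprocess"]

    # SBS and phenotype need a config section AND at least one wildcard combo.
    if "sbs" in config and len(sbs_wildcard_combos) > 0:
        modules.append("sbs")
    if "phenotype" in config and len(phenotype_wildcard_combos) > 0:
        modules.append("phenotype")

    # Merge, aggregate, and cluster only need a config section.
    for name in ("merge", "aggregate", "cluster"):
        if name in config:
            modules.append(name)

    return modules


# Example: SBS is configured with one combo; phenotype has no samples.
example_config = {"preprocess": {}, "sbs": {}, "phenotype": {}, "merge": {}}
print(enabled_modules(example_config, sbs_wildcard_combos=[("1", "A1")]))
# prints ['preprocess', 'sbs', 'merge']
```

In this example, phenotype is skipped even though it has a config section, because its wildcard combos are empty.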

## Workflow

A typical module execution follows these steps:

1. **Configure parameters**: Run the module's configuration notebook in `brieflow-analysis/analysis` to set parameters, which are saved to `brieflow-analysis/analysis/config/config.yml`.
   - Example: `0.configure_preprocess_params.ipynb`
2. **Test with dry run**: Use the local `.sh` script with the `-n` flag to perform a dry run.
   - Example: `sh 1.run_preprocessing.sh` (already includes the `-n` flag)
3. **Execute full run**:
   - **Local**: Remove the `-n` flag from the `.sh` script to run on local compute.
   - **Slurm**: Run the `_slurm.sh` script in a tmux session to run on HPC compute.
     - Example: `bash 1.run_preprocessing_slurm.sh`

**Note:** The preprocessing, SBS, and phenotype modules include special Slurm scripts that split runs by plate for optimization.

## HPC Integrations

Workflows can currently be run locally or through the Slurm integration.
To use the Slurm integration for Brieflow, configure the Slurm resources in [analysis/slurm/config.yaml](analysis/slurm/config.yaml).
The `slurm_partition` and `slurm_account` in `default-resources` must be configured; the other resource requirements have suggested values that can be adjusted as necessary.

**Note:** Other Snakemake HPC integrations can be found in the [Snakemake plugin catalog](https://snakemake.github.io/snakemake-plugin-catalog/index.html#snakemake-plugin-catalog). Only the `slurm` plugin has been tested.

These plugins assume that the Snakemake scheduler runs on the HPC head node and that *only the individual jobs* are submitted to the other nodes available on the HPC.
The Snakefile should therefore be run through bash on the head node (with `slurm` or other HPC configurations).
We recommend starting a tmux session for this, especially for larger jobs.

## Module-by-Module Instructions

### Preprocess

**Configure**: Run `0.configure_preprocess_params.ipynb` to set preprocessing parameters.

**Note:** This step determines where ND2 data is loaded from and where results are saved (default: `analysis/brieflow_output`). Users testing only SBS or phenotype can configure only those image types; see the notebook for details.

**Run**:
- **Local**: `sh 1.run_preprocessing.sh` (remove `-n` for actual run)
- **Slurm**: Set `NUM_PLATES` in `1.run_preprocessing_slurm.sh`, then:

  ```bash
  tmux new-session -s preprocessing
  bash 1.run_preprocessing_slurm.sh
  ```

### SBS

**Configure**: Run `2.configure_sbs_params.ipynb` to set SBS module parameters.

### Phenotype

**Configure**: Run `3.configure_phenotype_params.ipynb` to set phenotype module parameters.

### Run SBS/Phenotype

**Run**:
- **Local**: `sh 4.run_sbs_phenotype.sh` (remove `-n` for actual run)
- **Slurm**: Set `NUM_PLATES` in both `4a.run_sbs_slurm.sh` and `4b.run_phenotype_slurm.sh`. These can run simultaneously or separately:

  ```bash
  tmux new-session -s sbs_phenotype
  bash 4a.run_sbs_slurm.sh
  bash 4b.run_phenotype_slurm.sh
  ```

**Note:** To run only SBS or only phenotype:
1. Leave the sample dataframe for the module you are skipping empty in `0.configure_preprocess_params.ipynb`
2. Use the `--until all_sbs` or `--until all_phenotype` option in the scripts

### Merge

**Configure**: Run `5.configure_merge_params.ipynb` to set merge parameters.

**Run**:
- **Local**: `sh 6.run_merge.sh` (remove `-n` for actual run)
- **Slurm**:

  ```bash
  tmux new-session -s merge
  bash 6.run_merge_slurm.sh
  ```

### Classify

**Configure**: Run `7.configure_classify_params.ipynb` to train a classifier for different classes of cells (optional).

### Aggregate

**Configure**: Run `8.configure_aggregate_params.ipynb` to set aggregate parameters.

**Run**:
- **Local**: `sh 9.run_aggregate.sh` (remove `-n` for actual run)
- **Slurm**:

  ```bash
  tmux new-session -s aggregate
  bash 9.run_aggregate_slurm.sh
  ```

### Cluster

**Configure**: Run `10.configure_cluster_params.ipynb` to set cluster parameters.

**Run**:
- **Local**: `sh 11.run_cluster.sh` (remove `-n` for actual run)
- **Slurm**:

  ```bash
  tmux new-session -s cluster
  bash 11.run_cluster_slurm.sh
  ```

### Analysis

Run `12.analyze.ipynb` for interactive exploration of final outputs and feature plots.

### MozzareLLM

Run LLM-based cluster annotation using [MozzareLLM](https://github.com/cheeseman-lab/mozzarellm):

```bash
sh 13.run_mozzarellm.sh
```

### Visualization

Launch the Streamlit visualizer to explore results interactively:

```bash
bash 14.run_visualization.sh
```

The visualizer opens to **Cluster Analysis** by default, using the optimal resolution from the `mozzarellm` section of `config.yml`. The sidebar provides access to:

- **Pipeline Stats**: Parsed pipeline statistics (if a `*_stats.txt` file exists in `brieflow_output/`)
- **Quality Control**: Eval plots and tables from each pipeline step
- **Screen Overview**: Screen metadata (`screen.yaml`), perturbation library, and feature descriptions
- **Analysis Overview**: Config, dependencies (`pyproject.toml`), and git info

#### Environment Variables

The launch script (`14.run_visualization.sh`) sets required and optional environment variables:

| Variable | Required | Description |
|---|---|---|
| `BRIEFLOW_OUTPUT_PATH` | Yes | Path to `brieflow_output/` directory |
| `CONFIG_PATH` | Yes | Path to `config.yml` |
| `SCREEN_PATH` | Yes | Path to `screen.yaml` |
| `PERTURBATION_LIBRARY_PATH` | No | Path to raw perturbation library design file (TSV/CSV) |
| `FEATURE_DOC_PATH` | No | Path to feature documentation markdown |
| `STATIC_ASSET_URL_ROOT` | No | URL root for nginx-served static assets (deployment) |
| `STATIC_ASSET_PATH` | No | Filesystem path corresponding to static asset URL root |

#### Default Cluster Resolution

The visualizer reads the `mozzarellm` section of `config.yml` to set the initial cluster analysis view:

```yaml
mozzarellm:
  cell_class: Interphase
  channel_combo: DAPI_TUBULIN_GH2AX_PHALLOIDIN
  leiden_resolution: 12
```

Users can
override these defaults via the sidebar filters.

## Example Video

This video provides a step-by-step walkthrough of running a module in brieflow, including configuring parameters, testing with a dry run, and completing a full run using Slurm scripts.

## Additional Notes

- Slurm log files are written to `brieflow-analysis/analysis/slurm/slurm_output/main`
- For large screens, restrict the loaded rules by commenting out unnecessary targets/rules in `brieflow/workflow/Snakefile` to optimize DAG generation. Example when running aggregate (only merge targets and aggregate rules/targets needed):

Restricted Snakefile example:

```python
# include: "targets/preprocess.smk"
# include: "rules/preprocess.smk"

# if "sbs" in config and len(sbs_wildcard_combos) > 0:
#     # Include target and rule files
#     include: "targets/sbs.smk"
#     include: "rules/sbs.smk"

# if "phenotype" in config and len(phenotype_wildcard_combos) > 0:
#     # Include target and rule files
#     include: "targets/phenotype.smk"
#     include: "rules/phenotype.smk"

if "merge" in config:
    MERGE_COMBO_FP = Path(config["merge"]["merge_combo_fp"])
    merge_wildcard_combos = pd.read_csv(MERGE_COMBO_FP, sep="\t")

    # Include target and rule files
    include: "targets/merge.smk"
    # include: "rules/merge.smk"

if "aggregate" in config:
    AGGREGATE_COMBO_FP = Path(config["aggregate"]["aggregate_combo_fp"])
    aggregate_wildcard_combos = pd.read_csv(AGGREGATE_COMBO_FP, sep="\t")

    # Include target and rule files
    include: "targets/aggregate.smk"
    include: "rules/aggregate.smk"

# if "cluster" in config:
#     CLUSTER_COMBO_FP = Path(config["cluster"]["cluster_combo_fp"])
#     cluster_wildcard_combos = pd.read_csv(CLUSTER_COMBO_FP, sep="\t")
#     # Include target and rule files
#     include: "targets/cluster.smk"
#     include: "rules/cluster.smk"
```

**Always perform a dry run after modifying the Snakefile to verify correct job execution.**
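
The dry-run check can be sketched as a small command builder. This is a hedged illustration, not a copy of the shipped `.sh` scripts: `snakemake_command` is a hypothetical helper, and the default paths (`../brieflow/workflow/Snakefile` and `config/config.yml`, relative to `analysis/`) are assumptions based on the locations mentioned in this guide. The options used (`--snakefile`, `--configfile`, `--cores`, `--until`, `-n`) are standard Snakemake command-line options.

```python
import shlex
from typing import List, Optional


def snakemake_command(
    snakefile: str = "../brieflow/workflow/Snakefile",  # assumed relative path
    configfile: str = "config/config.yml",  # assumed relative path
    cores: str = "all",
    until: Optional[str] = None,
    dry_run: bool = True,
) -> List[str]:
    """Build a Snakemake argument list; dry run is on by default."""
    cmd = [
        "snakemake",
        "--snakefile", snakefile,
        "--configfile", configfile,
        "--cores", cores,
    ]
    if until is not None:
        # Stop at the given target, e.g. all_sbs or all_phenotype.
        cmd += ["--until", until]
    if dry_run:
        # -n plans the jobs without executing them.
        cmd.append("-n")
    return cmd


# Print the dry-run command for the SBS module.
print(shlex.join(snakemake_command(until="all_sbs")))
# prints: snakemake --snakefile ../brieflow/workflow/Snakefile --configfile config/config.yml --cores all --until all_sbs -n
```

Defaulting `dry_run` to `True` means a full run has to be requested explicitly with `dry_run=False`, mirroring how the local scripts include `-n` until it is removed.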