# 4. Visualization

## Overview

Brieflow includes a Streamlit-based visualizer for exploring your screen results.
Use it to review experiment metadata, verify analysis configuration, explore quality control metrics, and investigate clustering results interactively.

The visualizer is particularly useful for:
- Sharing results with collaborators who don't need to run the pipeline
- Quickly checking QC metrics across plates and wells
- Exploring gene clusters and their biological interpretations

**Note:** The visualizer reads from your analysis outputs.
Run it after completing the modules whose results you want to explore (e.g., run the `aggregate` and `cluster` modules before using the "Cluster Analysis" section of the visualizer).

## Getting Started

Navigate to the analysis directory and activate your conda environment:

```sh
cd analysis/
conda activate brieflow_SCREEN_NAME
```

Then launch the visualizer:

```sh
sh 14.run_visualization.sh
```

This opens the app in your browser at `http://localhost:8501`.

**Prerequisites by section:**

| Visualizer Section | Required Modules |
|--------------------|------------------|
| Experimental Overview | None (reads `screen.yaml`) |
| Analysis Overview | None (reads `config/config.yml`) |
| Quality Control | At least one of `preprocess`, `sbs`, `phenotype`, `merge`, `aggregate` |
| Cluster Analysis | `aggregate` and `cluster` |
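
To see which sections are likely to have data before launching, a quick check of the output directory can help. The sketch below is illustrative only: it assumes each module writes to a same-named subdirectory of the output directory (for example `brieflow_output/cluster`), which may not match your layout, and `has_output` and `report_sections` are hypothetical helper names, not part of Brieflow.

```sh
# Report which visualizer sections have pipeline outputs available.
# ASSUMPTION: each module writes to a same-named subdirectory of the
# output directory (e.g. brieflow_output/cluster); adjust to your layout.

has_output() {
    # Succeeds if directory $1/$2 exists and contains at least one entry.
    [ -d "$1/$2" ] && [ -n "$(ls -A "$1/$2" 2>/dev/null)" ]
}

report_sections() {
    output_dir=$1

    # Quality Control needs output from at least one of these modules.
    qc=no
    for module in preprocess sbs phenotype merge aggregate; do
        if has_output "$output_dir" "$module"; then
            qc=yes
        fi
    done
    echo "Quality Control: $qc"

    # Cluster Analysis needs both aggregate and cluster outputs.
    if has_output "$output_dir" aggregate && has_output "$output_dir" cluster; then
        echo "Cluster Analysis: yes"
    else
        echo "Cluster Analysis: no"
    fi
}

# Default to the output directory name used in this guide.
report_sections "${1:-brieflow_output}"
```

Run it from the `analysis/` directory, optionally passing a different output directory as the first argument. A "no" only means the assumed directory is missing or empty; the visualizer itself remains the authority on what it can display.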

**Remote access:** If the visualizer is running on a remote server, forward its port over SSH from your local machine, then open `http://localhost:8501` in your local browser:

```sh
ssh -L 8501:localhost:8501 user@server
```

## Environment Variables

The visualizer is configured via environment variables.
The run script sets sensible defaults automatically, but you can override them for custom setups.

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `BRIEFLOW_OUTPUT_PATH` | Yes | `brieflow_output/` | Path to pipeline output directory |
| `CONFIG_PATH` | Yes | `config/config.yml` | Path to analysis configuration file |
| `SCREEN_PATH` | Yes | `screen.yaml` | Path to screen metadata file |
| `STATIC_ASSET_URL_ROOT` | No | None | URL prefix for static assets (deployment) |
| `STATIC_ASSET_PATH` | No | None | Filesystem path for static assets (deployment) |

**Note:** The static asset variables (`STATIC_ASSET_URL_ROOT`, `STATIC_ASSET_PATH`) are only needed for deployed instances where images are served from a separate location.
For local development, leave them unset.
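
For a custom setup, it can help to set the three required variables explicitly and confirm that the paths exist before launching. The following is a minimal sketch, not the run script's actual logic: the fallback values mirror the defaults in the table above, `check_inputs` is a hypothetical helper, and it assumes the run script honors values already present in the environment.

```sh
# Pre-flight check for the visualizer's required inputs.
# Values already set in the environment are kept; otherwise fall back to
# the defaults listed in the table (relative to the current directory).
: "${BRIEFLOW_OUTPUT_PATH:=brieflow_output/}"
: "${CONFIG_PATH:=config/config.yml}"
: "${SCREEN_PATH:=screen.yaml}"
export BRIEFLOW_OUTPUT_PATH CONFIG_PATH SCREEN_PATH

check_inputs() {
    # Hypothetical helper: returns 0 only if all three paths exist.
    missing=0
    [ -d "$BRIEFLOW_OUTPUT_PATH" ] || { echo "missing directory: $BRIEFLOW_OUTPUT_PATH"; missing=1; }
    [ -f "$CONFIG_PATH" ] || { echo "missing file: $CONFIG_PATH"; missing=1; }
    [ -f "$SCREEN_PATH" ] || { echo "missing file: $SCREEN_PATH"; missing=1; }
    return "$missing"
}

if check_inputs; then
    echo "inputs found; launch with: sh 14.run_visualization.sh"
else
    echo "fix the paths above before launching"
fi
```

To point at a different screen, set the variables on the command line before running the check, for example `CONFIG_PATH=/path/to/config.yml sh check.sh`, where `check.sh` is whatever name you saved the snippet under.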

## Application Sections

The visualizer has four main sections accessible via the left sidebar navigation.

### Experimental Overview

Displays the experiment metadata from `screen.yaml` (the file that `SCREEN_PATH` points to).
This includes:

- Screen information (experiment ID, dates, notebook references)
- Cellular conditions (cell line, treatments, plate layout)
- SBS settings (cycles, library details)
- Phenotype settings (background channels)
- Imaging rounds and stains

This page provides a human-readable summary of what was screened and how.

### Analysis Overview

Shows the technical configuration used to run the pipeline, organized into three tabs:

- **Config**: The full `config.yml` file with all module parameters
- **Git**: Repository URL and current commit hash for reproducibility
- **Dependencies**: The conda environment specification

**Note:** The Git tab is useful for documenting exactly which version of Brieflow produced your results.

### Quality Control

Interactive exploration of QC metrics generated during pipeline execution.
Use the sidebar filters to navigate by:

- **Phase**: Which pipeline module (sbs, phenotype, merge, aggregate)
- **Subgroup**: Type of QC output (features, mapping, segmentation)
- **Plate/Well**: Specific samples
- **Metric**: Individual QC measures

The main panel displays distribution plots and data tables for the selected filters.

#### Interpreting QC Metrics

QC metrics help you distinguish technical artifacts from biological signal. Key things to look for:

- **Spatial patterns**: Systematic variations across the plate (edge effects, gradients) often indicate technical issues with imaging, staining, or sample preparation rather than biology.
- **Well-to-well consistency**: Large variance between replicates or adjacent wells may signal dispensing errors, uneven cell seeding, or imaging problems.
- **Mapping rates**: Low barcode mapping rates can indicate poor SBS image quality, cycle dropout, or issues with cell segmentation affecting barcode assignment.

**Tip:** Start with Phase and Subgroup filters to narrow down to the metrics you're interested in, then drill into specific plates and wells.

### Cluster Analysis

Interactive visualization of gene clustering results using PHATE dimensionality reduction.

**Sidebar filters:**
- **Channel Combo**: Feature subset used during aggregation
- **Cell Class**: Cell population (all, Mitotic, Interphase)
- **Leiden Resolution**: Clustering granularity

**Main panel:**
- **Gene/Cluster Search**: Dropdowns to locate specific genes or clusters
- **PHATE Plot**: Interactive scatter plot—click any point to select its cluster
- **Cluster Detail**: When a cluster is selected, shows:
  - LLM-generated biological interpretation (dominant process, gene classifications)
  - Data table with cluster members
  - Gene montages with UniProt annotations

The PHATE plot supports panning and zooming via controls in the top-right corner.

#### Interpreting PHATE Plots

PHATE preserves both local and global structure from high-dimensional phenotypic data. Genes that appear close together in the plot have similar morphological profiles—their knockdown produces similar cellular phenotypes. This proximity often reflects functional relationships: genes in the same pathway or protein complex tend to cluster together because disrupting them causes related cellular changes.

#### Choosing Leiden Resolution

The Leiden resolution parameter controls clustering granularity:

- **Lower resolution** (e.g., k=5–8): Produces fewer, larger clusters capturing broad biological themes (e.g., "cell cycle regulation", "membrane trafficking"). Good for initial exploration and identifying major phenotypic categories.
- **Higher resolution** (e.g., k=15–20): Produces more, smaller clusters that distinguish fine-grained processes (e.g., separating different stages of mitosis). Better for detailed investigation once you've identified regions of interest.

Start at lower resolution to understand the landscape, then increase resolution to investigate specific areas in more detail.

#### Using LLM Annotations

When available, LLM-generated annotations provide biological interpretation for each cluster:

- **Dominant process**: The LLM's prediction of what biological function unites the cluster members.
- **Confidence level**: How certain the LLM is about its annotation (High, Medium, Low). Lower confidence may indicate novel biology or heterogeneous clusters.
- **Gene classifications**: Each gene is marked as "Established" (known role in the annotated process), "Putative" (some evidence), or "Novel" (no prior association). Novel genes in high-confidence clusters are candidates for follow-up validation.

**Note:** LLM cluster analysis may only be available for certain cell class and resolution combinations.
If LLM results aren't showing, the app will display which filter combinations have LLM data available.

#### Exploration Workflow

A typical workflow for investigating clustering results:

1. **Start broad**: Use a lower Leiden resolution to see major phenotypic categories
2. **Locate genes of interest**: Use the Gene Search dropdown to find specific genes
3. **Explore neighborhoods**: Click genes in the PHATE plot to see what they cluster with
4. **Increase resolution**: Switch to higher resolution to see finer distinctions within interesting regions
5. **Review annotations**: Check LLM interpretations and gene classifications for biological insight
6. **Identify candidates**: Look for "Novel" genes in well-annotated clusters as potential discoveries

## Tips

- **Multiple screens**: To compare screens side by side, run one visualizer instance from each screen's `analysis/` directory, giving each instance its own port (e.g., `sh 14.run_visualization.sh -- --server.port=8502` for the second instance).
- **Cluster exploration**: Use the Gene Search dropdown to quickly locate a gene of interest, then click its point in the PHATE plot to see the full cluster context.