gene expression added and methods and data updated
Ssandor13 committed Sep 27, 2024
1 parent d3f35f0 commit 06d2ba8
Showing 4 changed files with 36,302 additions and 417 deletions.
115 changes: 78 additions & 37 deletions docs/pecan/expression/index.md
@@ -2,65 +2,106 @@
title: Expression
---

![Expression](./../expression.svg)
## Overview

**Overview:** The expression landscape of 3432 RNA-Seq fresh frozen tumor samples (1389 blood tumors, 888 solid tumors, and 1155 brain tumors) in St. Jude Cloud is displayed via a t-SNE plot (**Figure 1**) generated using the [St. Jude Cloud RNA-Seq Expression Analysis workflow](https://platform.stjude.cloud/workflows/rnaseq-expression-classification).
This facet comprises three tabs that let users explore the expression landscape of 3,432 RNA-Seq fresh frozen tumor samples (1,389 blood tumors, 888 solid tumors, and 1,155 brain tumors) through a t-SNE plot (**Figure 1**), gene expression violin plots organized by subtype for a gene of interest (**Figure 2**), gene expression overlaid on the t-SNE, or all of the data together in a data matrix.

![](./tSNE@2x.png)
[NEED NEW IMAGE]

**Figure 1: tSNE for Blood, Brain, and Solid Samples.** Metadata details for each sample can be accessed by mousing over the data points. This visualization is supported by D3.
**Figure 1: t-SNE for Blood, Brain, and Solid Samples.** Mouse over data points to access metadata details for each sample. Visualization powered by D3.

!!!note
- All samples use the hg38 reference genome.
- All metadata can be found by accessing our [manifest](https://platform.stjude.cloud/api/v1/manifest).
!!!
[NEED NEW IMAGE]

**Features:**
A user can explore across the 3 tSNE plots: Blood, Solid, or Brain tumor tabs and employ the features listed below:
**Figure 2: Gene Expression for TP53.** Gene expression violin plots for each sample, filtered by the gene of interest. Visualization powered by Plotly.

*Subtype categorization*- Subtypes are denoted by a specific color and a subset have been labeled on the plot.
> **Note**
> - All samples use the hg38 reference genome.
> - Full metadata can be accessed through our [manifest](https://platform.stjude.cloud/api/v1/manifest).
*Sample Summary*- A user can select a data point on the plot that opens a sample summary drawer annotating relevant metadata and information.
---

## Features for the t-SNE Plot

| Feature | Description |
|---------------------|---------------------------------------------------------------------------------------------------------------------------|
| **Subtype Categorization** | Subtypes are color-coded, and a subset is labeled on the plot. These can be turned off in the three-dot menu. |
| **Sample Summary** | Clicking a data point opens a drawer with metadata and sample details. |
| **Filters** | Filters are categorized by Tumor Sample, Patient Phenotype, and Sample Preparation. |
| **Sample Search** | Search by individual or bulk (comma-separated) sample IDs. CompBio IDs must be exact. |
| **Lasso Tool** | Select a region on the plot to retrieve a list of samples for further investigation. |
| **Pan/Zoom** | Zoom in or pan to examine specific regions of the plot. This will disable subtype labels. |
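
The bulk Sample Search behavior described in the table (comma-separated IDs, exact matches only) can be illustrated with a minimal sketch. This is not the facet's actual implementation: the function names, the whitespace trimming, the de-duplication, and the case-sensitive comparison are all assumptions made for illustration, and the sample IDs in the usage note are hypothetical.

```python
def parse_sample_ids(query: str) -> list[str]:
    """Split a comma-separated query into trimmed, de-duplicated IDs (order preserved)."""
    seen = set()
    ids = []
    for raw in query.split(","):
        sample_id = raw.strip()  # assumption: surrounding whitespace is ignored
        if sample_id and sample_id not in seen:
            seen.add(sample_id)
            ids.append(sample_id)
    return ids


def exact_match(query: str, known_ids: set[str]) -> tuple[list[str], list[str]]:
    """Return (found, missing) IDs. Matching is exact; there is no fuzzy search."""
    requested = parse_sample_ids(query)
    found = [s for s in requested if s in known_ids]
    missing = [s for s in requested if s not in known_ids]
    return found, missing
```

For example, with the hypothetical IDs `{"SJ001_D1", "SJ003_D1"}` loaded, the query `"SJ001_D1, SJ002_D1"` would return `SJ001_D1` as found and `SJ002_D1` as missing.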

[NEED NEW GIF]

> **Warning**
> Filtering by the sunburst will auto-populate the Root and Subtype filters. These can be manually edited but will not update the sunburst.
*Filters* - Filters are organized by Tumor Sample, Patient Phenotype, and Sample Preparation. Once a filter is selected, the subtype labels will be disabled. Functionality of each is further described below.
---

## Features for Gene Expression

*Sample Search* - A user can search individual sampleIDs or bulk IDs that are comma separated. The sampleIDs must be exact and cannot be fuzzy searched.
| Feature | Description |
|----------------------------|---------------------------------------------------------------------------------------------------------------------|
| **Gene Sandbox** | Violin plots for the gene of interest, filtered by root and subtypes. |
| **Plotly Functions** | Pan and zoom features on the right side of the gene sandbox do not affect filter components. |
| **Median Sort** | Sort the gene expression sandboxes by median expression across or within individual groups. |
| **Outlier Toggle** | Toggle off data points to keep outliers intact for the cohort currently being filtered. |
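
As a rough illustration of what Median Sort does conceptually, the sketch below orders groups (for example, subtypes) by the median expression of their samples. The function name, the input layout, and the default descending order are assumptions; the facet's own sorting logic is not documented here.

```python
from statistics import median


def sort_groups_by_median(
    expression_by_group: dict[str, list[float]],
    descending: bool = True,
) -> list[str]:
    """Return group names ordered by the median expression of their samples."""
    return sorted(
        expression_by_group,
        key=lambda group: median(expression_by_group[group]),
        reverse=descending,  # assumption: highest median first
    )
```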

*Lasso* - Allows a user to select a specific region on the plot to retrieve a list of samples to enable further investigation. To view the sample summary of the lassoed samples, click the "Data" icon in the top right of the subnavbar. See GIF below.
For data normalization details, refer to our [Methods and Data](https://university.stjude.cloud/docs/pecan/methods-data/) page.

*Pan/Zoom* - Allows a user to examine regions of the plot in more detail; this disables any labels.
[NEED NEW GIF]
---

![](./lasso.gif)
## Gene Expression Overlay on t-SNE

!!!warning
Filtering by the sunburst will auto-populate the diagnosis and subtype filter. A user can edit this modal, but it will not update the sunburst.
!!!
Users can overlay gene expression on the t-SNE plot by selecting genes of interest. Count data is normalized using Median of Ratios (MoR). More details can be found on the [Methods and Data](https://university.stjude.cloud/docs/pecan/methods-data/) page.
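
For readers unfamiliar with Median of Ratios, the sketch below shows the generic procedure as it is commonly described (popularized by DESeq2): compute a per-gene geometric mean across samples, take each sample's median ratio to that reference as its size factor, and divide raw counts by the size factor. It is a simplified illustration, not the St. Jude Cloud pipeline; the function name and input layout are assumptions, and the Methods and Data page remains the authoritative description.

```python
import math
from statistics import median


def median_of_ratios(
    counts: dict[str, list[float]],
) -> tuple[list[float], dict[str, list[float]]]:
    """counts maps gene -> raw counts per sample (same sample order for every gene)."""
    n_samples = len(next(iter(counts.values())))

    # Log geometric mean per gene; genes with a zero count in any sample are
    # skipped when building the reference, as in the usual MoR description.
    log_geo_means = {
        gene: sum(math.log(c) for c in row) / n_samples
        for gene, row in counts.items()
        if all(c > 0 for c in row)
    }
    if not log_geo_means:
        raise ValueError("no gene has non-zero counts in every sample")

    # Size factor per sample: median ratio of its counts to the reference.
    size_factors = []
    for j in range(n_samples):
        log_ratios = [
            math.log(counts[gene][j]) - lgm for gene, lgm in log_geo_means.items()
        ]
        size_factors.append(math.exp(median(log_ratios)))

    # Normalized counts: raw counts divided by the sample's size factor.
    normalized = {
        gene: [c / s for c, s in zip(row, size_factors)]
        for gene, row in counts.items()
    }
    return size_factors, normalized
```

In this sketch, two samples whose counts differ only by sequencing depth end up with the same normalized values, which is what makes expression comparable when it is overlaid on the t-SNE.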

**Filters Explained**
[NEED NEW GIF]

---

## Features for the Data Matrix

The data matrix displays all filtered data with sortable headers for easier exploration.

[NEED NEW GIF]

---

## Filters Explained

### Tumor Sample

1. Sample ID - A user can search individual St. Jude CompBio IDs or bulk search IDs that are comma separated. This field allows for a multi-select.
2. Subtype - This is a modal whereby a user can custom select which subtypes to view in the plot. Child nodes will automatically become enabled or disabled if a parent node is (de)selected. The number of samples and the subtype color are designated in the modal for reference.
3. Subtype Biomarker - This field allows a multi-select of subtype biomarkers to be applied to the plot. *Note: the user cannot apply a general gene like "CTNNB1" across the plot. The user must select all biomarkers they are interested in seeing from the dropdown.*
4. Sample Type - This field is a multi-select dropdown.
| Filter | Description |
|------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------|
| **Sample ID** | Search by individual or bulk St. Jude CompBio IDs (comma-separated). Allows multi-select. |
| **Subtype Root** | Custom-select a root to prompt applicable subtypes. Heme is selected by default when the facet loads, unless the sunburst is used. |
| **Subtype** | Custom-select subtypes to view on the plot. Parent node selection enables or disables child nodes. |
| **Subtype Biomarker** | Multi-select subtype biomarkers to apply on the plot. General genes like "CTNNB1" are not accepted; users must select biomarkers from the dropdown. |
| **Sample Type** | Multi-select dropdown for sample types. |

### Patient Phenotype

1. Sex - A multi-select dropdown.
2. Age at Diagnosis - A scale whereby a user can manually type in the age parameters or use the scale (in years). A user can type in any age, even past our "35+" parameter.
3. Race - This is a multi-select dropdown.
4. Ethnicity - This is a multi-select dropdown.
| Filter | Description |
|-----------------------|-----------------------------------------------------------------------------|
| **Sex** | Multi-select dropdown for biological sex. |
| **Age at Diagnosis** | Adjustable scale or manual input for age in years. |
| **Race** | Multi-select dropdown for race. |
| **Ethnicity** | Multi-select dropdown for ethnicity. |

### Sample Preparation

1. Library Selection Protocol - This is a multi-select dropdown.
2. Preservative - This is a multi-select dropdown.
| Filter | Description |
|-------------------------------|--------------------------------------------|
| **Library Selection Protocol** | Multi-select dropdown for library protocol types. |
| **Preservative** | Multi-select dropdown for sample preservative types. |

> **Warning**
> Some fields may have a "Not Available" option for samples where the data wasn't recorded (e.g., Race, Ethnicity, Sex).
!!!warning
There can be fields with a "Not Available" option for samples that did not have this value recorded (e.g., Race, Ethnicity, Sex).
!!!
> **Tip**
> For an example using a subset of this data, refer to [Figure 4f of McLeod et al.](https://cancerdiscovery.aacrjournals.org/content/11/5/1082.long).
---

!!!tip
An example with a subset of this data can be found in [Figure 4f of McLeod et al](https://cancerdiscovery.aacrjournals.org/content/11/5/1082.long).
!!!
To see how the data was calculated and normalized, visit our [Methods and Data](https://university.stjude.cloud/docs/pecan/methods-data/) page.