scRNA workbench

About

The workbench for single cell RNAseq (scRNAseq) is designed to give biologists meaningful access to single cell data, even with limited informatics training. You begin by selecting a dataset for analysis; the workbench then offers analysis tools that follow several standard pre-processing steps.

Start by choosing a dataset on the left.

Imported analysis selected

You have selected an analysis bundled with the dataset itself, usually created by the dataset author outside of this workbench. You can use the workbench to perform the actions below that are supported by the analyses the authors have uploaded, but all other analysis steps will be disabled until you create an entirely new analysis.

Compare genes / clusters

N genes

Query cluster

Compare with

Method

P-value correction method

The method dropdown menu lists the available statistical tests for your comparison. The “t-test overestimated variance” option may help when clusters contain few cells and variance is difficult to estimate directly; for robust clusters, choose “t-test” (which assumes normally distributed data). The “Wilcoxon-Rank-Sum” option is a non-parametric test and may be helpful when a dataset includes a few genes with very high expression (outliers) or when the data distribution is not known. Multiple testing correction can be performed with either the Benjamini-Hochberg or the Bonferroni method; Bonferroni is the more conservative of the two.
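To make the difference between the two correction methods concrete, here is a minimal sketch in plain Python. It uses the textbook definitions of Bonferroni and Benjamini-Hochberg adjustment; the function names and the p-values are illustrative assumptions, not the workbench's own code.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Visit p-values from largest to smallest so a running minimum
    # keeps the adjusted values monotone.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for five genes.
pvals = [0.001, 0.008, 0.02, 0.03, 0.6]
bonf = bonferroni(pvals)
bh = benjamini_hochberg(pvals)

# Count genes that stay below 0.05 after each correction.
n_bonf = sum(p < 0.05 for p in bonf)
n_bh = sum(p < 0.05 for p in bh)
```

With these made-up values, fewer genes pass the 0.05 cutoff under Bonferroni than under Benjamini-Hochberg, which is what "more conservative" means in practice.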

Name	log2FC	P-value	FDR
No data to display


Find marker genes

This option shows the marker genes within each group of cells, as defined by the clustering method used. You can change how many genes are shown for each cluster by adjusting N genes.
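As a rough illustration of what "top N genes per cluster" means, the sketch below ranks genes within each cluster by a score and keeps the first N. The scores, gene names, and the `top_markers` helper are all hypothetical; the workbench's actual ranking statistic is not described here.

```python
def top_markers(scores, n_genes=2):
    """For each cluster, return the n_genes gene names with the highest score."""
    result = {}
    for cluster, gene_scores in scores.items():
        ranked = sorted(gene_scores.items(), key=lambda kv: kv[1], reverse=True)
        result[cluster] = [gene for gene, _ in ranked[:n_genes]]
    return result


# Hypothetical per-cluster marker scores (higher = stronger marker).
scores = {
    "cluster_0": {"GeneA": 8.1, "GeneB": 2.3, "GeneC": 5.6},
    "cluster_1": {"GeneA": 0.4, "GeneB": 7.9, "GeneC": 1.2},
}

markers = top_markers(scores, n_genes=2)
```

Raising `n_genes` simply lengthens each per-cluster list; it does not change the ranking.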

Top ranked genes per cluster (click to select genes of interest)

Marker gene visualization

Select desired marker genes in the table above and/or type gene symbols (separated by commas) in the field below to visualize

Unique marker genes selected in table: 0
Unique marker genes manually entered: 0
Total unique genes selected: 0

Save New Gene Collection

Enter name to save selected genes as a genecart.

Labeled tSNE

Enter a gene of interest to see the tSNE plot colored both by that gene's expression and by cluster / cell type.

Unable to load this image. Perhaps that gene is not found in this dataset?

Clustering (Louvain)

Merge clusters with duplicate labels

Group	Num Cells	Markers	New label	Keep

Louvain clustering is used to find the most likely groups of associated cells within a network. Here the groups are color-coded. The number of neighbors affects the smallest possible size of a cluster. If you are interested in groups that are all larger than 20 cells (based on the gene coloring in the initial PCA), you can try 6, 10 or 15 neighbors. If instead this is a smaller dataset, such as regular RNA-seq with only biological triplicates, starting with two neighbors makes more sense, because the smallest ‘natural’ group should be 3 replicates. Similarly, if some of the populations in a single cell dataset are very small, 3 neighbors could be a useful starting point. Keep in mind that the smaller the number of neighbors, the larger the number of clusters.

The resolution determines how granular the clustering will be. It is set to 1.3 by default. To decrease the resolution and get fewer, larger clusters, lower the value (to 1, for example); to increase the resolution and get more, smaller clusters, raise it.
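The effect of the resolution value can be sketched with the resolution-weighted modularity score that Louvain-style methods commonly optimize. This is an assumption about the general technique, not a description of the workbench's internals; the toy graph, the candidate partitions, and the function names are hypothetical.

```python
def modularity(edges, partition, resolution=1.0):
    """Resolution-weighted modularity of a partition of an undirected graph."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    label = {node: i for i, group in enumerate(partition) for node in group}
    internal = [0] * len(partition)  # edges with both ends in the same group
    for u, v in edges:
        if label[u] == label[v]:
            internal[label[u]] += 1
    total = 0.0
    for i, group in enumerate(partition):
        d = sum(degree.get(node, 0) for node in group)
        total += internal[i] / m - resolution * (d / (2 * m)) ** 2
    return total


def best_partition(edges, candidates, resolution):
    """Name of the candidate partition with the highest modularity."""
    return max(candidates, key=lambda name: modularity(edges, candidates[name], resolution))


# Toy graph: two triangles of cells joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
candidates = {
    "one cluster": [[0, 1, 2, 3, 4, 5]],
    "two clusters": [[0, 1, 2], [3, 4, 5]],
    "six clusters": [[0], [1], [2], [3], [4], [5]],
}

low = best_partition(edges, candidates, resolution=0.2)
default = best_partition(edges, candidates, resolution=1.3)
high = best_partition(edges, candidates, resolution=3.0)
```

On this toy graph, a low resolution favors a single cluster, the default of 1.3 favors the two triangles, and a high resolution favors splitting every cell apart.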

tSNE / UMAP

Genes to colorize (comma-separated)

N neighbors

N dimensions (principal components)

Random state

Use scaled and corrected expression

Dimensionality reduction method

Choose one (or both). What's the difference?

tSNE
UMAP (faster)

Couldn't find this gene in the dataset:

This non-linear dimensionality reduction tool is used to visually group cells that are similar to each other. The number of principal components to include depends on the result of the PCA step. You can vary the number of principal components used and see how it changes the display. The default recommendation is to look for the point on the PCA variance curve beyond which additional components add minimal variation.
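One simple way to apply the "minimal added variation" rule is to keep leading components until the next one explains less than a chosen share of the total variance. The sketch below assumes you already have the variance of each principal component; the numbers, the 2% cutoff, and the function names are illustrative assumptions.

```python
def explained_variance_ratio(variances):
    """Share of the total variance explained by each principal component."""
    total = sum(variances)
    return [v / total for v in variances]


def choose_n_components(variances, min_gain=0.02):
    """Count leading PCs that each explain at least min_gain of the variance."""
    n = 0
    for ratio in explained_variance_ratio(variances):
        if ratio < min_gain:
            break
        n += 1
    return max(n, 1)  # always keep at least one component


# Hypothetical per-PC variances, sorted from PC1 downward.
variances = [40, 25, 12, 8, 1.5, 1.2, 1.0, 0.9, 0.8, 0.6]
n_pcs = choose_n_components(variances, min_gain=0.02)
```

A fixed cutoff is only a starting point; inspecting the curve and trying a few nearby values, as suggested above, is still worthwhile.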

Principal Component Analysis (PCA)

Couldn't find this gene in the dataset:

Cart name to create

The principal component analysis indicates the groups of genes that contribute most to the variability in gene expression in the dataset. For example, PC1 will have the largest effect in dividing the dataset into subtypes of cells or samples. Each principal component is composed of groups of “related” genes. Visualizing your principal component graph and knowing which genes contribute to it (listed in the table) is important for the next step. The number of principal components included will affect your tSNE (t-distributed stochastic neighbor embedding) plot.

Identify highly-variable genes

Convention

Normalized counts per cell

N top genes

Options below are ignored if N top genes is used

Min mean

Max mean

Min dispersion

The next steps use dimensionality reduction methods to look for structure within the data. Before that, the large gene list is filtered down to those genes that are more likely to represent the biologically important variability between cells. Several parameters can be adjusted here to trade sensitivity against stringency in which genes are included, and you may find that trying different parameters helps you identify new features of the dataset. In the plot, the x-axis represents the average expression of each gene across the dataset. The y-axis is a measure called dispersion, which indicates the variance of that gene across the dataset. The workbench limits the maximum number of highly variable genes to 2,000. Increasing Min mean raises the minimum expression value a gene must have to be considered highly variable. We suggest changing the parameters and observing the plot to guide your selection. When this step is complete, press ‘save these genes’.
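The filtering logic can be sketched as follows. Dispersion is computed here as variance divided by mean, which is one common definition and an assumption on our part; the threshold defaults, the gene names, and the function names are also illustrative. As in the form above, the mean and dispersion cutoffs are ignored when N top genes is given.

```python
from statistics import mean, pvariance


def gene_stats(expression):
    """Map each gene to (mean expression, dispersion = variance / mean)."""
    stats = {}
    for gene, values in expression.items():
        mu = mean(values)
        dispersion = pvariance(values) / mu if mu > 0 else 0.0
        stats[gene] = (mu, dispersion)
    return stats


def select_variable_genes(expression, min_mean=0.0125, max_mean=3.0,
                          min_disp=0.5, n_top=None):
    """Pick highly variable genes by cutoffs, or the n_top most dispersed."""
    stats = gene_stats(expression)
    if n_top is not None:
        # Cutoffs are ignored when a fixed number of genes is requested.
        ranked = sorted(stats, key=lambda g: stats[g][1], reverse=True)
        return ranked[:n_top]
    return [g for g, (mu, disp) in stats.items()
            if min_mean <= mu <= max_mean and disp >= min_disp]


# Hypothetical normalized expression values for four genes in four cells.
expression = {
    "GeneA": [0, 0, 4, 4],     # moderate mean, high dispersion
    "GeneB": [1, 1, 1, 1],     # no variability
    "GeneC": [0, 0, 0, 0.04],  # mean below min_mean
    "GeneD": [10, 0, 10, 0],   # mean above max_mean
}

by_cutoffs = select_variable_genes(expression)
by_top_n = select_variable_genes(expression, n_top=2)
```

Note how GeneD is excluded by the Max mean cutoff but included when only the top N genes by dispersion are requested.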

QC by mitochondrial content

Filtered shape: genes x obs

No mitochondrial genes with this prefix were found. This could be real, or it could be just because this prefix is case-sensitive. Common options are mt-, Mt- or MT-. (This should be handled for you automatically in a later release.)

Press the plot button to see plots of (a) the number of genes in each cell; (b) the number of read counts per cell; and (c) the percent mitochondrial content. The general recommendation is to keep mitochondrial content below 5% (a fraction of 0.05) to focus on living cells. Based on the data in these plots, you may wish to change some of the criteria in your previous step. Press ‘save these genes’ before moving to the next step.
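The mitochondrial check, including the case-sensitive prefix issue mentioned above, can be sketched in a few lines. The cutoff is expressed as a fraction, so 5% is written as 0.05; the cell data, the cutoff value, and the function names are hypothetical.

```python
def detect_prefix(gene_names, candidates=("mt-", "Mt-", "MT-")):
    """Return the first candidate prefix that matches any gene name, else None."""
    for prefix in candidates:
        if any(name.startswith(prefix) for name in gene_names):
            return prefix
    return None


def mito_fraction(cell_counts, prefix):
    """Fraction of a cell's counts that come from genes with the given prefix."""
    total = sum(cell_counts.values())
    if total == 0:
        return 0.0
    mito = sum(c for gene, c in cell_counts.items() if gene.startswith(prefix))
    return mito / total


def filter_cells(cells, prefix, max_fraction):
    """Keep the names of cells whose mitochondrial fraction is within the cutoff."""
    return [name for name, counts in cells.items()
            if mito_fraction(counts, prefix) <= max_fraction]


# Hypothetical counts per gene for two cells.
cells = {
    "cell1": {"MT-CO1": 2, "ACTB": 98},
    "cell2": {"MT-CO1": 30, "MT-ND1": 10, "ACTB": 60},
}

all_genes = {gene for counts in cells.values() for gene in counts}
prefix = detect_prefix(all_genes)
kept = filter_cells(cells, prefix, max_fraction=0.05)
```

Because the match is case-sensitive, passing "mt-" for this data would report zero mitochondrial content for every cell, which is the situation the warning above describes.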

Dataset:

Initial shape:

Filtered shape:

Apply filters as desired.

Initial composition

Loading initial gene/cell count plots

Analysis steps available

About

Imported analysis selected

Compare genes / clusters

Find marker genes

Top ranked genes per cluster (click to select genes of interest)

Marker gene visualization

Save New Gene Collection

Labeled tSNE

Clustering (Louvain)

tSNE / UMAP

Principal Component Analysis (PCA)

Identify highly-variable genes

QC by mitochondrial content

Dataset:

Initial composition

Genes with highest fraction of counts per cell

Action log
