Instructions

This is an experimental version of our dataset analysis tool. Feedback is welcome.

The gEAR workbench for single cell RNAseq (scRNAseq) is designed to give biologists meaningful access to single cell data, even with limited informatics training. You begin by selecting a dataset for analysis; the workbench then takes you through several standard pre-processing steps before offering analysis tools.

Start by choosing a dataset on the left.

Compare genes / clusters


Find marker genes

This option shows the marker genes within each group of cells as defined by the Louvain clustering. You can change how many genes are shown for each cluster by adjusting N.
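As a rough illustration of what "top N genes per cluster" means, here is a minimal sketch in plain Python. The ranking score used here (mean expression inside the cluster minus mean expression outside it) is an assumption chosen for simplicity, and the function name `top_markers` is hypothetical; the workbench may rank genes with a different statistic.

```python
def top_markers(expr, clusters, n=2):
    """Return the n highest-scoring genes for each cluster label.

    expr: {gene: [value per cell]}, clusters: [cluster label per cell].
    Score (an illustrative assumption): mean inside cluster minus mean outside.
    """
    result = {}
    for c in sorted(set(clusters)):
        inside = [i for i, lab in enumerate(clusters) if lab == c]
        outside = [i for i, lab in enumerate(clusters) if lab != c]
        scores = []
        for gene, values in expr.items():
            mean_in = sum(values[i] for i in inside) / len(inside)
            mean_out = sum(values[i] for i in outside) / len(outside) if outside else 0.0
            scores.append((mean_in - mean_out, gene))
        scores.sort(reverse=True)  # highest score first
        result[c] = [gene for _, gene in scores[:n]]
    return result

# Toy data: four cells in two clusters.
clusters = ["0", "0", "1", "1"]
expr = {
    "GeneA": [9.0, 8.0, 1.0, 0.0],
    "GeneB": [0.0, 1.0, 7.0, 8.0],
    "GeneC": [5.0, 5.0, 5.0, 5.0],
    "GeneD": [3.0, 3.0, 0.0, 1.0],
}
markers = top_markers(expr, clusters, n=2)
```

Raising `n` simply lengthens each cluster's list; it does not change the order of the genes already shown.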

Top ranked genes per cluster (click to select genes of interest)

Marker gene visualization

Select marker genes in the table above and/or type gene symbols (separated by commas) in the field below to visualize them.

  • Marker genes selected in table: 0
  • Marker genes manually entered: 0
  • Total unique genes selected: 0
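The three counters above can be thought of as follows. This is only a sketch: the helper name `count_selected` is hypothetical, and the assumption that gene symbols are compared case-sensitively is ours, not something the page states.

```python
def count_selected(table_genes, typed_text):
    """Count genes picked in the table, typed by hand, and unique overall."""
    # Split the typed field on commas and drop empty entries and stray spaces.
    typed = [g.strip() for g in typed_text.split(",") if g.strip()]
    unique = set(table_genes) | set(typed)
    return {
        "table": len(set(table_genes)),
        "typed": len(set(typed)),
        "unique": len(unique),
    }

# "Atoh1" is both selected in the table and typed, so it is counted once.
counts = count_selected(["Sox2", "Atoh1"], "Atoh1, Pou4f3 ,")
```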

Clustering (Louvain)

Group Markers New label

Louvain clustering finds the most likely groups of associated cells within a network; the groups are color-coded here. The number of neighbors affects the smallest possible cluster size. If the groups of cells you are interested in are all larger than about 20 cells (judging by the gene coloring in the initial PCA), try 6, 10 or 15 neighbors. If this is a smaller dataset, such as regular RNA-seq with only biological triplicates, starting with two neighbors makes more sense, because the smallest ‘natural’ group should be 3 replicates. Similarly, if some of the populations in a single cell dataset are very small, 3 neighbors could be a useful approach. Keep in mind that the smaller the number of neighbors, the larger the number of clusters.
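To see why the number of neighbors matters, consider a nearest-neighbor graph built on two groups of three samples each. The sketch below is plain Python with a hypothetical helper `knn_graph` and made-up 2D coordinates; it is not the workbench's own code.

```python
def knn_graph(points, k):
    """Return {index: [indices of the k nearest other points]} (Euclidean)."""
    graph = {}
    for i, p in enumerate(points):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points)
            if j != i
        )
        graph[i] = [j for _, j in dists[:k]]
    return graph

# Two well-separated triplicates (indices 0-2 and 3-5).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
g2 = knn_graph(points, k=2)  # every sample links only to its own triplicate
g3 = knn_graph(points, k=3)  # every sample is forced to link to the other group
```

With two neighbors each triplicate forms its own isolated group in the graph; with three, links between the groups appear, which makes it easier for a clustering step to merge them.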

The resolution determines how granular the clustering will be. It is set to 1.3 by default. Lower it (to 1, for example) for coarser clustering with fewer clusters, or raise it for finer clustering with more clusters.
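Louvain clustering works by maximizing a score called modularity. One common way to introduce a resolution parameter is to scale the penalty term of that score, as in the sketch below. Whether the workbench uses exactly this formulation is an assumption; the function name `modularity` and the toy graph are illustrative only.

```python
def modularity(edges, labels, resolution=1.0):
    """Modularity of a partition of an undirected, unweighted graph.

    Q = sum over clusters of [ intra_edges / m - resolution * (degree_sum / 2m)^2 ]
    """
    m = len(edges)
    intra = {}   # edges with both ends in the same cluster
    degree = {}  # summed node degree per cluster
    for u, v in edges:
        degree[labels[u]] = degree.get(labels[u], 0) + 1
        degree[labels[v]] = degree.get(labels[v], 0) + 1
        if labels[u] == labels[v]:
            intra[labels[u]] = intra.get(labels[u], 0) + 1
    return sum(
        intra.get(c, 0) / m - resolution * (d / (2 * m)) ** 2
        for c, d in degree.items()
    )

# Two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
one_cluster = {n: "a" for n in range(6)}
two_clusters = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

q_split_default = modularity(edges, two_clusters, resolution=1.3)
q_merged_default = modularity(edges, one_cluster, resolution=1.3)
q_split_low = modularity(edges, two_clusters, resolution=0.2)
q_merged_low = modularity(edges, one_cluster, resolution=0.2)
```

In this toy graph the split into two clusters scores higher at a resolution of 1.3, while a single merged cluster scores higher at a very low resolution of 0.2, which is the sense in which lower values give coarser clusterings.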

tSNE / UMAP

Choose one (or both). What's the difference?

tSNE
UMAP (faster)
This non-linear dimensionality reduction tool is used to visually group cells that are similar to each other. The number of principal components to include depends on the result of the PCA step. You can vary the number of principal components used and see how it changes the data display. The default recommendation is to look for the point in the PCA curve at which additional components add only minimal variation.
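The "point in the PCA curve" rule can be approximated with a simple threshold, as in this sketch. The function name `choose_n_pcs`, the 1% cutoff, and the variance values are all illustrative assumptions; reading the plot by eye remains the recommended approach.

```python
def choose_n_pcs(variance_ratio, min_gain=0.01):
    """Count leading components that each explain at least min_gain of the variance."""
    n = 0
    for ratio in variance_ratio:
        if ratio < min_gain:
            break  # from here on, extra components add very little
        n += 1
    return n

# Hypothetical share of variance explained by PC1, PC2, ...
variance_ratio = [0.30, 0.18, 0.10, 0.05, 0.008, 0.007, 0.006]
n_pcs = choose_n_pcs(variance_ratio)
```

Here the curve flattens after the fourth component, so four components would be a reasonable starting point to try and then vary.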

Principal Component Analysis (PCA)

Principal component analysis identifies the groups of genes that contribute most to the variability in gene expression in the dataset. For example, PC1 will have the largest effect in dividing the dataset into subtypes of cells or samples. Each principal component is composed of groups of “related” genes. Viewing your principal component graph and knowing which genes contribute to it (listed in the table) is important for the next step. The number of principal components included will affect your tSNE (t-distributed stochastic neighbor embedding) plot.
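To make the idea of genes "contributing" to a principal component concrete, the sketch below estimates the gene weights (loadings) of PC1 for a tiny table using power iteration on the covariance matrix. This is a teaching example in plain Python with a hypothetical function `first_pc` and invented values; real tools use optimized linear algebra routines.

```python
def first_pc(data, iters=200):
    """Return the unit-length loading vector of PC1.

    data: list of cells, each a list of expression values (one per gene).
    """
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    # Gene-by-gene sample covariance matrix.
    cov = [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    v = [1.0] * d
    for _ in range(iters):
        # Multiply by the covariance matrix, then rescale to unit length.
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

# Four cells, three genes: genes 0 and 1 vary together, gene 2 barely changes.
data = [
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
    [10.0, 9.0, 1.0],
    [11.0, 10.0, 2.0],
]
loadings = first_pc(data)
```

Genes 0 and 1 receive much larger weights than gene 2, so they are the ones that would appear at the top of a PC1 gene table for this toy dataset.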

Identify highly-variable genes

Next we will use dimensionality reduction methods to look for structure within the data. Before doing that, we want to filter the large gene list down to those genes that are more likely to represent the biologically important variability between cells. Several parameters can be adjusted here to balance sensitivity against stringency in which genes are included, and trying different parameters may help you identify new features of the dataset.

In the plot, the x-axis represents the average expression of each gene across the dataset. The y-axis is a measure called dispersion, which indicates the variance of that gene across the dataset. The gEAR workbench limits the maximum number of highly variable genes to 2,000. Increasing the Min mean raises the minimum expression value a gene must have to be considered highly variable. We suggest changing the parameters and observing the plot to guide your selection. When you have finished this step, press ‘save these genes’.
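The selection described above can be sketched in a few lines. In this example dispersion is taken to be variance divided by mean, which is one common definition but an assumption here; the function name `highly_variable` and the `min_disp` parameter are also hypothetical. Only the Min mean setting and the 2,000-gene cap come from the text.

```python
from statistics import mean, pvariance

def highly_variable(expr, min_mean=0.0, min_disp=0.0, max_genes=2000):
    """Return (gene, mean, dispersion) tuples, most dispersed first."""
    stats = []
    for gene, values in expr.items():
        m = mean(values)
        if m <= 0:
            continue  # dispersion is undefined for unexpressed genes
        disp = pvariance(values) / m  # assumed definition: variance / mean
        if m >= min_mean and disp >= min_disp:
            stats.append((gene, m, disp))
    stats.sort(key=lambda s: s[2], reverse=True)
    return stats[:max_genes]  # cap on the number of genes kept

expr = {
    "GeneA": [0.0, 0.0, 8.0, 8.0],  # expressed and variable
    "GeneB": [4.0, 4.0, 4.0, 4.0],  # expressed but constant
    "GeneC": [0.0, 0.0, 0.0, 1.0],  # variable but barely expressed
    "GeneD": [1.0, 3.0, 1.0, 3.0],  # moderately variable
}
hvg = highly_variable(expr, min_mean=1.0, min_disp=0.4)
selected = [gene for gene, _, _ in hvg]
```

Raising `min_mean` removes low-expression genes such as GeneC even when they look variable, which mirrors the effect of increasing Min mean in the workbench.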

QC by mitochondrial content

Filtered shape: genes x obs

No mitochondrial genes with this prefix were found. This could be real, or it could be just because this prefix is case-sensitive. Common options are mt-, Mt- or MT-. (This should be handled for you automatically in a later release.)

Press the plot button to see plots of (a) the number of genes in each cell; (b) the number of read counts per cell; and (c) the percent mitochondrial content. The general recommendation is to keep mitochondrial content below 5% (a fraction of 0.05) to focus on living cells. Based on the data in these plots you may wish to change some of the criteria in your previous step. Press ‘save these genes’ before moving to the next step.
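As an illustration of how percent mitochondrial content is computed, and of why the gene prefix is case-sensitive, here is a minimal sketch. The function name `percent_mito`, the gene counts, and the cutoff passed as `max_percent` are illustrative assumptions rather than workbench settings.

```python
def percent_mito(cell_counts, prefix="mt-"):
    """Percent of a cell's counts that come from genes starting with prefix."""
    total = sum(cell_counts.values())
    if total == 0:
        return 0.0
    mito = sum(c for gene, c in cell_counts.items() if gene.startswith(prefix))
    return 100.0 * mito / total

cells = {
    "cell_1": {"mt-Co1": 2, "mt-Nd1": 1, "Actb": 60, "Gapdh": 37},
    "cell_2": {"mt-Co1": 30, "mt-Nd1": 10, "Actb": 40, "Gapdh": 20},
}
pct = {name: percent_mito(c) for name, c in cells.items()}

max_percent = 5.0  # assumed cutoff, expressed in percent
kept = [name for name, p in pct.items() if p < max_percent]

# The match is case-sensitive: "MT-" finds nothing in data that uses "mt-".
wrong_prefix = percent_mito(cells["cell_1"], prefix="MT-")
```

A result of zero for every cell usually means the prefix does not match the dataset's gene naming, not that the cells lack mitochondrial transcripts.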

Dataset:

Initial shape:

Filtered shape:

Exclude cells with < genes
Exclude cells with > genes
Exclude genes in < cells
Exclude genes in > cells

Single cell gene expression data can be conceptualized as a large Excel spreadsheet in which each column represents an individual cell and each row a particular gene assayed in the dataset. In this box you can see the number of genes in the dataset (genes) and the number of cells assayed (obs). Together these give the overall dimensionality of the dataset. For droplet-based scRNA-Seq methods, the number of observations may be quite large before you filter out observations (cell barcodes) that likely do not represent actual cells.
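The filtering fields and the resulting shape can be illustrated with a toy spreadsheet. This is a sketch in plain Python; the function name `filter_counts` and its parameter names are hypothetical stand-ins for the four "Exclude" fields, and the order of operations (cells first, then genes) is an assumption.

```python
def filter_counts(counts, n_cells, min_genes_per_cell=1, max_genes_per_cell=None,
                  min_cells_per_gene=1, max_cells_per_gene=None):
    """Filter a genes-by-cells table; return (filtered_counts, kept_cell_indices)."""
    # Number of genes detected (count > 0) in each cell.
    genes_per_cell = [
        sum(1 for row in counts.values() if row[j] > 0) for j in range(n_cells)
    ]
    keep_cells = [
        j for j, g in enumerate(genes_per_cell)
        if g >= min_genes_per_cell
        and (max_genes_per_cell is None or g <= max_genes_per_cell)
    ]
    filtered = {}
    for gene, row in counts.items():
        kept = [row[j] for j in keep_cells]
        n_expr = sum(1 for v in kept if v > 0)  # cells expressing this gene
        if n_expr >= min_cells_per_gene and (
            max_cells_per_gene is None or n_expr <= max_cells_per_gene
        ):
            filtered[gene] = kept
    return filtered, keep_cells

# Rows are genes, columns are cells; the second column is an empty barcode.
counts = {
    "GeneA": [5, 0, 3, 0],
    "GeneB": [2, 0, 0, 0],
    "GeneC": [0, 0, 7, 1],
    "GeneD": [1, 0, 4, 0],
}
initial_shape = (len(counts), 4)
filtered, keep_cells = filter_counts(
    counts, n_cells=4, min_genes_per_cell=2, min_cells_per_gene=2
)
filtered_shape = (len(filtered), len(keep_cells))
```

In this example the shape goes from 4 genes x 4 obs to 2 genes x 2 obs: the empty barcode and a cell with a single detected gene are dropped, followed by genes seen in fewer than two of the remaining cells.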

Genes with highest fraction of counts per cell

Action log