This is an experimental version of our dataset analysis tool. Feedback is welcome.
The gEAR workbench for single-cell RNA-seq (scRNA-seq) is designed to give biologists meaningful access to single-cell data, even with limited informatics training. You begin by selecting a dataset for analysis; the workbench then offers analysis tools that follow several standard pre-processing steps.
Start by choosing a dataset on the left.
Find marker genes
Top ranked genes per cluster
Louvain clustering is used to find the most likely groups of associated cells within a network. Here the groups are color-coded. The number of neighbors affects the smallest possible size of a cluster. If you are interested in groups that are all larger than 20 cells (based on the gene coloring in the initial PCA), you can try 6, 10 or 15 neighbors. If instead this is a smaller dataset of regular RNA-seq with only biological triplicates, starting with two neighbors makes more sense, because the smallest ‘natural’ group should be 3 replicates. Similarly, if some of the populations in a single-cell dataset are very small, 3 neighbors could be a useful starting point. Keep in mind that the smaller the number of neighbors, the larger the number of clusters.
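The link between the number of neighbors and the smallest possible group can be sketched with a toy example. This is not Louvain clustering and not the workbench's own code: it only builds a nearest-neighbor graph and reports the sizes of its connected pieces, which is a much cruder grouping. The function name `knn_component_sizes` and the sample values are hypothetical. Because every sample is linked to its k nearest neighbors, no connected piece can hold fewer than k + 1 samples.

```python
def knn_component_sizes(values, k):
    """Link each sample to its k nearest neighbors and return the sorted
    sizes of the connected groups (a crude stand-in for clustering)."""
    n = len(values)
    parent = list(range(n))  # union-find forest, one tree per group

    def find(i):
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        # Rank every other sample by distance to sample i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: abs(values[i] - values[j]),
        )
        for j in others[:k]:
            parent[find(i)] = find(j)  # merge the two groups

    sizes = {}
    for i in range(n):
        root = find(i)
        sizes[root] = sizes.get(root, 0) + 1
    return sorted(sizes.values())


# Hypothetical toy data: three conditions, each in biological triplicate.
samples = [0, 1, 2, 50, 51, 52, 100, 101, 102]

print(knn_component_sizes(samples, 2))  # → [3, 3, 3]
print(knn_component_sizes(samples, 3))  # → [9]
```

With two neighbors each triplicate stays a separate group; asking for a third neighbor forces every sample to reach into another triplicate, so the groups merge.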
The resolution determines how granular the clustering will be. It is set to 1.3 by default. Lower it (to 1, for example) for coarser clustering with fewer clusters, or raise it for finer clustering with more clusters.
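To see why a higher resolution favors more, smaller clusters, here is a minimal sketch of the resolution-weighted modularity score that Louvain-style methods commonly optimize. Whether the workbench uses exactly this objective is an assumption. The function `modularity`, the toy graph and the two candidate partitions are all hypothetical. The resolution multiplies a penalty that grows with community size, so raising it makes large communities score worse.

```python
def modularity(edges, communities, resolution):
    """Resolution-weighted modularity of a partition of an undirected graph.

    For each community c: (edges inside c) / m
                          - resolution * (degree sum of c / (2 * m)) ** 2
    where m is the total number of edges.
    """
    m = len(edges)
    label = {node: idx for idx, members in enumerate(communities) for node in members}
    internal = [0] * len(communities)  # edges with both ends in the community
    degree = [0] * len(communities)    # summed degree of the community's nodes
    for u, v in edges:
        degree[label[u]] += 1
        degree[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    return sum(
        internal[c] / m - resolution * (degree[c] / (2 * m)) ** 2
        for c in range(len(communities))
    )


# Hypothetical toy graph: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
coarse = [{0, 1, 2, 3, 4, 5}]    # everything in one cluster
fine = [{0, 1, 2}, {3, 4, 5}]    # one cluster per triangle

for resolution in (0.2, 1.3):
    better = "fine" if modularity(edges, fine, resolution) > modularity(edges, coarse, resolution) else "coarse"
    print(resolution, better)
```

On this toy graph the single large cluster scores higher at a resolution of 0.2, while the two-cluster split scores higher at 1.3. The crossover value is specific to this graph and is not a guide for real datasets.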
Principal Component Analysis (PCA)
Identify highly-variable genes
QC by mitochondrial content
Filtered shape: genes x obs
No mitochondrial genes with this prefix were found. The dataset may genuinely contain none, or the prefix may simply not match because matching is case-sensitive. Common options are mt-, Mt- or MT-. (This should be handled for you automatically in a later release.)
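The case-sensitivity issue is easy to reproduce outside the workbench. The sketch below is illustrative only and is not how the tool itself matches genes; the helper names and the short gene list are hypothetical. It contrasts an exact prefix match with one that ignores case.

```python
def match_prefix_exact(gene_names, prefix):
    """Case-sensitive match: 'mt-' will not find 'MT-CO1'."""
    return [name for name in gene_names if name.startswith(prefix)]


def match_prefix_any_case(gene_names, prefix):
    """Case-insensitive match: compare lower-cased names and prefix."""
    wanted = prefix.lower()
    return [name for name in gene_names if name.lower().startswith(wanted)]


# Hypothetical gene symbols using the upper-case mitochondrial prefix.
genes = ["MT-CO1", "MT-ND1", "ACTB", "GAPDH"]

print(match_prefix_exact(genes, "mt-"))     # → []
print(match_prefix_any_case(genes, "mt-"))  # → ['MT-CO1', 'MT-ND1']
```

If the exact match returns nothing, trying the other common spellings of the prefix is usually the quickest check.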
Single-cell gene expression data can be conceptualized as a large Excel spreadsheet in which each column represents an individual cell and each row represents a particular gene assayed in the dataset. In this box you can see the number of genes in the dataset (genes) and the number of cells assayed (obs). This is the overall dimensionality of the dataset. For droplet-based scRNA-seq methods, the number of observations may be quite large before you filter out observations (cell barcodes) that likely do not represent actual cells.
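The spreadsheet picture can be made concrete with a tiny count table. This is a minimal sketch with made-up numbers; the functions `shape` and `drop_low_count_barcodes` and the threshold of 5 total counts are hypothetical and are not the workbench's filtering rule. Rows are genes and columns are cell barcodes, so the shape is reported as genes x obs.

```python
def shape(matrix):
    """Return (genes, obs) for a table with one row per gene."""
    return (len(matrix), len(matrix[0]) if matrix else 0)


def drop_low_count_barcodes(matrix, min_total):
    """Keep only the columns (barcodes) whose summed counts reach min_total."""
    if not matrix:
        return []
    keep = [
        col for col in range(len(matrix[0]))
        if sum(row[col] for row in matrix) >= min_total
    ]
    return [[row[col] for col in keep] for row in matrix]


# Hypothetical counts: 3 genes (rows) x 4 barcodes (columns).
# Barcodes 2 and 4 have almost no counts and likely are not real cells.
counts = [
    [5, 0, 3, 0],
    [2, 0, 7, 1],
    [9, 0, 4, 0],
]

print(shape(counts))                              # → (3, 4)
print(shape(drop_low_count_barcodes(counts, 5)))  # → (3, 2)
```

The number of genes is unchanged by this step; only the number of observations shrinks.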