Kim Lab Website

Software Repository
Kim Laboratory
Computational Evolutionary Biology

latest news

06.29.2019

VisCello; for visualization of single cell data.

access info ...

06.27.2018

Sample data provenance from 1,347 RNAseq samples.

access info ...

06.07.2018

ORNASEQ: Ontology for RNA sequencing.

access info ...

contact us

Biology Department
School of Arts and Sciences
University of Pennsylvania
301a Lynch Laboratory
433 S University Avenue
Philadelphia, PA 19104 USA

off: (215) 746-5187
lab: (215) 898-8395
fax: (215) 898-8780

email: junhyong@sas.upenn.edu

additional links

VisCello

Maintained by Qin Zhu

GPL-2.0-only License

Using VisCello for Single Cell Data Visualization

We developed VisCello to distribute single cell analyses and provide interactive visualizations. VisCello hosts dimensionality reductions (e.g. UMAPs), cell annotations, and marker gene tables for the different subsets of the data described in this manuscript. Users can visualize gene expression on UMAP or PCA plots, on a lineage tree diagram, or as box/violin plots grouped by cell type or lineage. The plots are interactive, allowing users to zoom in on subsets of cells, define new cell annotation groups, and run differential expression analysis and GO/KEGG enrichment with these newly defined groups. Program state can be downloaded and shared, facilitating collaboration.

Installation

You can install VisCello with the code below:

install.packages("devtools")
# install
devtools::install_github("qinzhu/VisCello")
# load
library(VisCello)
# launch with example data
cello()

Example Datasets Preprocessed for VisCello

Here's an example app with data from Paul et al. (2015). It shows the basic features offered by VisCello.
Here's another app based on VisCello for interactive exploration of C. elegans embryogenesis data: https://cello.shinyapps.io/celegans/. It is also available as an R package for download at https://github.com/qinzhu/VisCello.celegans.

You can download an example dataset for VisCello with:

git clone https://github.com/qinzhu/Celegans.L2.Cello.git

Then launch it in R:

library(VisCello)
cello("~/Downloads/Celegans.L2.Cello") # Change path if necessary

To load your own dataset into VisCello for visualization, follow the guidance below.

General Data Requirement

VisCello requires two main data objects - an ExpressionSet object and a Cello object (or a list of Cello objects) - plus one configuration file. All 3 files must be put inside the same data folder. Each data folder represents one particular study.

ExpressionSet object - The ExpressionSet object is a general class from Bioconductor.

An Introduction to Bioconductor's ExpressionSet Class (2006)

Cello object - The Cello object is an S4 class specifically designed for visualizing subsets of the single cell data - by storing dimension reduction results of (subsets of) cells that are present in the global ExpressionSet, and any local meta information about the cells, such as clustering results.

Lastly, a simple configuration file needs to be edited by the user to give VisCello general information about the study.

An example can be downloaded here. This vignette describes the preprocessing steps for inputting data into VisCello.


Prepare ExpressionSet object

The data-raw folder contains an example dataset and associated meta information from Paul et al. (2015). The files in the folder are:
- subset_GSE72857.txt : Raw gene expression matrix, with columns as cells and rows as genes. Row names must be unique.
- subset_GSE72857_log2norm.txt : Log2 normalized expression matrix, same dimension as the raw matrix. You can also input any other type of normalized data as long as it matches the dimension of the raw data.
- subset_GSE72857_pmeta.txt : Meta data for the cells.

Load the data into R and convert them into ExpressionSet using the following code:
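The listing for this step is not reproduced here. As a minimal sketch only, mirroring the ExpressionSet constructor used in the conversion examples in this document, the step might look like the following. The helper name `build_eset` and the toy matrices are hypothetical, and the commented `read.delim()` calls assume tab-delimited files with gene or cell names in the first column, which is an assumption about the file layout.

```r
library(Biobase)  # ExpressionSet, AnnotatedDataFrame, assayDataNew
library(Matrix)   # sparse matrix storage

# Hypothetical helper: wrap raw counts, normalized values and cell meta data
# into an ExpressionSet with 'exprs' and 'norm_exprs' assay elements.
build_eset <- function(raw_df, norm_df, pmeta) {
  stopifnot(identical(dim(raw_df), dim(norm_df)),          # same dimension
            !anyDuplicated(rownames(raw_df)),               # unique gene names
            identical(colnames(raw_df), rownames(pmeta)))   # cells line up
  fmeta <- data.frame(symbol = rownames(raw_df), row.names = rownames(raw_df))
  new("ExpressionSet",
      assayData = assayDataNew("environment",
                               exprs = Matrix(as.matrix(raw_df), sparse = TRUE),
                               norm_exprs = Matrix(as.matrix(norm_df), sparse = TRUE)),
      phenoData = new("AnnotatedDataFrame", data = pmeta),
      featureData = new("AnnotatedDataFrame", data = fmeta))
}

# With the real files one would presumably read them first (layout assumed):
# raw_df  <- read.delim("data-raw/subset_GSE72857.txt", row.names = 1)
# norm_df <- read.delim("data-raw/subset_GSE72857_log2norm.txt", row.names = 1)
# pmeta   <- read.delim("data-raw/subset_GSE72857_pmeta.txt", row.names = 1)

# Toy stand-in data so the sketch runs without the files.
raw_df <- matrix(c(0, 3, 1, 0, 2, 5), nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("cell1", "cell2")))
norm_df <- log2(raw_df + 1)
pmeta <- data.frame(cluster = c(1, 2), row.names = colnames(raw_df))

eset <- build_eset(raw_df, norm_df, pmeta)
```

The checks at the top of the helper encode the requirements stated for the input files: unique row names and matching dimensions between the raw and normalized matrices.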

A little additional work is needed here: some columns in pmeta, such as cell state, should be treated as factors rather than numeric values.

# Columns of the cell meta data that hold categories, not quantities
factor_cols <- c("Mouse_ID", "cluster", "State")
for (col in factor_cols) {
  pData(eset)[[col]] <- factor(pData(eset)[[col]])
}
saveRDS(eset, "your_data_folder/eset.rds")

Now the expression data and meta information required by VisCello is in place.

[Alternative] Convert common objects to ExpressionSet

Seurat object
Monocle object

fmeta <- fData(cds)
fmeta[[1]] <- fmeta$gene_short_name # Make first column gene name
FM <- exprs(cds) # FM is assumed to be the raw expression matrix of the CellDataSet
eset <- new("ExpressionSet",
            assayData = assayDataNew("environment",
                                     exprs = Matrix(as.matrix(FM), sparse = TRUE),
                                     # Note this is equivalent to the 'log' normalization method in monocle; you can use another normalization function
                                     norm_exprs = Matrix(log2(Matrix::t(Matrix::t(FM)/sizeFactors(cds))+1), sparse = TRUE)),
            phenoData = new("AnnotatedDataFrame", data = pData(cds)),
            featureData = new("AnnotatedDataFrame", data = fmeta))

SingleCellExperiment/SummarizedExperiment object

fmeta <- as.data.frame(rowData(sce))
# If rowData is empty, you need to make a new fmeta with the rownames of the matrix:
# fmeta <- data.frame(symbol = rownames(sce)); rownames(fmeta) <- fmeta[[1]]
fmeta[[1]] <- fmeta$symbol # Make sure first column is gene name
eset <- new("ExpressionSet",
            assayData = assayDataNew("environment",
                                     exprs = Matrix(assay(sce, "counts"), sparse = TRUE), # Change 'counts' to the name of your raw count matrix
                                     norm_exprs = Matrix(assay(sce, "norm_counts"), sparse = TRUE)), # Change 'norm_counts' to the name of your normalized count matrix
            phenoData = new("AnnotatedDataFrame", data = as.data.frame(colData(sce))),
            featureData = new("AnnotatedDataFrame", data = fmeta))

Prepare Cello Object

The Cello object allows embedding of multiple dimension reduction results for different subsets of cells. This allows "zoom-in" analysis on a subset of cells as well as differential expression analysis on locally defined clusters. The basic structure of Cello is as follows:
- name : Name of the cell subset
- idx : The index of the cell subset in global expression set.
- proj : Named list of projections such as PCA, t-SNE and UMAP.
- pmeta : (Optional) local meta data containing the meta data that's specific to this local cell subset. For example, in global meta data, only one "Cluster" column is allowed. But what if you have different clustering results for different cell subsets? You can store this subset-dependent result inside the local pmeta slot of cello.
- notes : (Optional) information about the cell subset to display to the user

Create a Cello for VisCello. Note that at least one dimension reduction result must be computed and put in cello@proj:
If you have already computed your dimension reduction result and want to make a Cello for it, use the following R code:
Create a list to store Cello objects and save it to the data location.
You can create multiple Cello objects hosting visualization information for different subsets of the cells. Say people want to zoom in on GMP for further analysis:
Now all the data required to run cello has been preprocessed.
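The listings for the steps above are not reproduced here. The sketch below illustrates the same idea with a stand-in S4 class that mirrors the slots listed earlier (name, idx, proj, pmeta, notes). It is not VisCello's own class definition: the class name `CelloSketch`, the helper `make_cello_sketch`, the toy matrix and the GMP indices are all hypothetical, and PCA via base R `prcomp()` is simply one way to obtain a projection.

```r
library(methods)

# Stand-in S4 class mirroring the slots described above (hypothetical name;
# this is NOT the Cello class shipped with VisCello).
setClass("CelloSketch",
         representation(name = "character", idx = "numeric", proj = "list",
                        pmeta = "data.frame", notes = "character"))

# Toy normalized expression matrix (genes x cells) standing in for the eset.
set.seed(1)
norm_mat <- matrix(rnorm(20 * 12), nrow = 20,
                   dimnames = list(paste0("g", 1:20), paste0("cell", 1:12)))

# Hypothetical helper: build a subset object holding one PCA projection.
make_cello_sketch <- function(norm_mat, name, idx, n_pc = 2) {
  pca <- prcomp(t(norm_mat[, idx, drop = FALSE]))  # cells as rows
  new("CelloSketch", name = name, idx = idx,
      proj = list("PCA" = pca$x[, seq_len(n_pc), drop = FALSE]))
}

# One object for all cells, one for a subset (toy indices standing in for GMP).
global <- make_cello_sketch(norm_mat, "Global dataset", 1:ncol(norm_mat))
gmp <- make_cello_sketch(norm_mat, "GMP", 1:5)

# Subset-specific clustering goes into the local pmeta slot.
gmp@pmeta <- data.frame(Cluster = factor(c(1, 1, 2, 2, 2)),
                        row.names = colnames(norm_mat)[gmp@idx])

# Collect the objects in a named list and save it next to eset.rds.
clist <- list("Global dataset" = global, "GMP" = gmp)
saveRDS(clist, file.path(tempdir(), "clist.rds"))
```

Keeping idx as column indices into the global expression set means each subset object stores only its projections and local meta data, not a copy of the expression values.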

Prepare Configure File and Launch

Download example configure file here
- study_name : Appears as the title of the app.
- study_description : Appears as the footer of the app.
- organism : Mouse, human and C. elegans are supported.
- feature_name_column : Important! The column name of fData(eset) that corresponds to the gene symbol. Must be specified.
- feature_id_column : The column name of fData(eset) that corresponds to the gene id. Set it to the same value as feature_name_column if you don't have a gene id.

After updating this file, put it in the same data folder as the previously saved eset.rds and clist.rds.
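Before launching, it can help to confirm that the folder and configuration are consistent. The following is a minimal sketch, not part of VisCello: the helper `check_study` is hypothetical, `cfg` is a plain R list standing in for the parsed configuration (how the file is parsed is left out), and the organism value shown is a placeholder rather than a confirmed setting.

```r
# Configuration fields as listed above.
required_fields <- c("study_name", "study_description", "organism",
                     "feature_name_column", "feature_id_column")

# Hypothetical helper: return a character vector of problems (empty if none).
check_study <- function(data_path, cfg, fdata_cols) {
  problems <- character(0)
  # All three files must sit in the same data folder.
  files <- c("eset.rds", "clist.rds", "config.yml")
  missing_files <- files[!file.exists(file.path(data_path, files))]
  if (length(missing_files) > 0)
    problems <- c(problems, paste("missing file:", missing_files))
  # Every listed field should be present in the configuration.
  missing_fields <- setdiff(required_fields, names(cfg))
  if (length(missing_fields) > 0)
    problems <- c(problems, paste("missing field:", missing_fields))
  # The feature columns must name real columns of fData(eset).
  for (f in c("feature_name_column", "feature_id_column")) {
    if (!is.null(cfg[[f]]) && !(cfg[[f]] %in% fdata_cols))
      problems <- c(problems,
                    paste0(f, " '", cfg[[f]], "' is not a column of fData(eset)"))
  }
  problems
}

# Toy usage: a folder that lacks clist.rds.
demo_dir <- file.path(tempdir(), "demo_study")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("eset.rds", "config.yml"))))
cfg <- list(study_name = "Demo study", study_description = "Toy example",
            organism = "mouse",  # placeholder value (assumed)
            feature_name_column = "symbol", feature_id_column = "symbol")
problems <- check_study(demo_dir, cfg, fdata_cols = c("symbol"))
print(problems)
```

With a real dataset, `fdata_cols` would be `colnames(fData(eset))`.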

Now VisCello is ready to go! To launch VisCello, in R:

library(VisCello)
cello(data_path = "your_data_folder")

Host VisCello on a Server

To host VisCello on a server, use either Shiny Server (https://www.rstudio.com/products/shiny/shiny-server/) or the shinyapps.io service (https://www.shinyapps.io/).
- STEP 1: Install VisCello from GitHub.

devtools::install_github("qinzhu/VisCello") # STEP 1

- STEP 2 [IMPORTANT]: Git clone VisCello from GitHub, then replace inst/app/data/eset.rds, inst/app/data/clist.rds and inst/app/data/config.yml with your own data. Also, change the first line in inst/app/global.R from viscello_DEPLOY = F to viscello_DEPLOY = T.
- STEP 3: Set the repositories to Bioconductor in R, and then deploy only the inst/app/ folder that contains your own data.

options(repos = BiocManager::repositories())
# Change account to your own account, and change app name to your own app name.
rsconnect::deployApp("inst/app/", account = "cello", appName = "base")

Please cite VisCello properly if you use VisCello to host your dataset.

Cite VisCello

Q. Zhu, J. I. Murray, K. Tan, J. Kim, qinzhu/VisCello: VisCello v1.0.0 (2019; https://zenodo.org/record/3262313)
For specific analysis, please check the citation listed in the module.

Reference

Paul, Franziska, Ya'ara Arkin, Amir Giladi, Diego Adhemar Jaitin, Ephraim Kenigsberg, Hadas Keren-Shaul, Deborah Winter, et al. 2015. "Transcriptional Heterogeneity and Lineage Commitment in Myeloid Progenitors." Cell 163 (7): 1663-77.
Cao, Junyue, et al. 2017. "Comprehensive Single-Cell Transcriptional Profiling of a Multicellular Organism." Science 357 (6352): 661-67.
