Given marker gene sets for cell types, identify cells with
high expression of the marker genes (positive examples), then use these
cells to create a reference transcriptome profile for each cell type and
identify additional cells of each type using SingleR. These
marker genes should specifically expressed a single cell type, e.g.
CD3 which is expressed by all T cell subtypes would not be suitable
for specific T cell subtypes.
SingleCellExperiment object.
Named list of character vectors with the
marker genes for each cell types. The marker genes must be a subset of
rownames(sce).
numeric vector of length 1 or the same
length as markergenes giving the fraction(s) of top scoring cells
for each cell type to pick to create the reference transcriptome profile.
Integer scalar or string indicating which assay of
sce contains the expression values.
list with additional parameters
for normGenesetExpression.
list with additional parameters
for aggregateReference.
list with additional parameters for
SingleR.
An optional BiocParallelParam
instance determining the parallel back-end to be used during evaluation.
A list of three elements named cells, refs and labels.
cells contains a list with the numerical indices of the top
scoring cells for each cell type.
refs contains the pseudo-bulk transcriptome profiles used as a
reference for label assignment, as returned by aggregateReference.
labels contains a DataFrame with the
annotation statistics for each cell (one cell per row), generated by
SingleR.
normGenesetExpression used to calculate scores for
marker gene sets; aggregateReference used to
create reference profiles; SingleR used to assign
labels to cells.
if (requireNamespace("SingleR", quietly = TRUE) &&
requireNamespace("SingleCellExperiment", quietly = TRUE) &&
requireNamespace("scrapper", quietly = TRUE)) {
# create SingleCellExperiment with cell-type specific genes
library(SingleCellExperiment)
n_types <- 3
n_per_type <- 30
n_cells <- n_types * n_per_type
n_genes <- 500
fraction_specific <- 0.1
n_specific <- round(n_genes * fraction_specific)
set.seed(42)
mu <- ceiling(runif(n = n_genes, min = 0, max = 30))
u <- do.call(rbind, lapply(mu, function(x) rpois(n_cells, lambda = x)))
rownames(u) <- paste0("g", seq.int(nrow(u)))
celltype.labels <- rep(paste0("t", seq.int(n_types)), each = n_per_type)
celltype.genes <- split(sample(rownames(u), size = n_types * n_specific),
rep(paste0("t", seq.int(n_types)), each = n_specific))
for (i in seq_along(celltype.genes)) {
j <- celltype.genes[[i]]
k <- celltype.labels == paste0("t", i)
u[j, k] <- 2 * u[j, k]
}
v <- log2(u + 1)
sce <- SingleCellExperiment(assays=list(counts=u, logcounts=v))
# define marker genes (subset of true cell-type-specific genes)
marker.genes <- lapply(celltype.genes, "[", 1:5)
marker.genes
# predict cell types
res <- labelCells(sce, marker.genes,
fraction_topscoring = 0.1,
normGenesetExpressionParams = list(R = 50))
# high-scoring cells used as references for each celltype
res$cells
# ... from these, pseudo-bulks were created:
res$refs
# ... and used to predict labels for all cells
res$labels$pruned.labels
# compare predicted to true cell types
table(true = celltype.labels, predicted = res$labels$pruned.labels)
}
#> Loading required package: SummarizedExperiment
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#>
#> Attaching package: ‘MatrixGenerics’
#> The following objects are masked from ‘package:matrixStats’:
#>
#> colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
#> colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
#> colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
#> colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
#> colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
#> colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
#> colWeightedMeans, colWeightedMedians, colWeightedSds,
#> colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
#> rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
#> rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#> rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
#> rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
#> rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
#> rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
#> rowWeightedSds, rowWeightedVars
#> Loading required package: Biobase
#> Welcome to Bioconductor
#>
#> Vignettes contain introductory material; view with
#> 'browseVignettes()'. To cite Bioconductor, see
#> 'citation("Biobase")', and for packages 'citation("pkgname")'.
#>
#> Attaching package: ‘Biobase’
#> The following object is masked from ‘package:MatrixGenerics’:
#>
#> rowMedians
#> The following objects are masked from ‘package:matrixStats’:
#>
#> anyMissing, rowMedians
#> predicted
#> true t1 t2 t3
#> t1 30 0 0
#> t2 0 29 0
#> t3 0 0 30