Given marker gene sets for cell types, identify cells with
high expression of the marker genes (positive examples), then use these
cells to create a reference transcriptome profile for each cell type and
identify additional cells of each type using SingleR
. These
marker genes should specifically expressed a single cell type, e.g.
CD3 which is expressed by all T cell subtypes would not be suitable
for specific T cell subtypes.
SingleCellExperiment
object.
Named list
of character
vectors with the
marker genes for each cell types. The marker genes must be a subset of
rownames(sce)
.
numeric
vector of length 1 or the same
length as markergenes
giving the fraction(s) of top scoring cells
for each cell type to pick to create the reference transcriptome profile.
Integer scalar or string indicating which assay of
sce
contains the expression values.
list
with additional parameters
for normGenesetExpression
.
list
with additional parameters
for aggregateReference
.
list
with additional parameters for
SingleR
.
An optional BiocParallelParam
instance determining the parallel back-end to be used during evaluation.
A list
of three elements named cells
, refs
and labels
.
cells
contains a list
with the numerical indices of the top
scoring cells for each cell type.
refs
contains the pseudo-bulk transcriptome profiles used as a
reference for label assignment, as returned by aggregateReference
.
labels
contains a DataFrame
with the
annotation statistics for each cell (one cell per row), generated by
SingleR
.
normGenesetExpression
used to calculate scores for
marker gene sets; aggregateReference
used to
create reference profiles; SingleR
used to assign
labels to cells.
if (requireNamespace("SingleR", quietly = TRUE) &&
requireNamespace("SingleCellExperiment", quietly = TRUE) &&
requireNamespace("scrapper", quietly = TRUE)) {
# create SingleCellExperiment with cell-type specific genes
library(SingleCellExperiment)
n_types <- 3
n_per_type <- 30
n_cells <- n_types * n_per_type
n_genes <- 500
fraction_specific <- 0.1
n_specific <- round(n_genes * fraction_specific)
set.seed(42)
mu <- ceiling(runif(n = n_genes, min = 0, max = 30))
u <- do.call(rbind, lapply(mu, function(x) rpois(n_cells, lambda = x)))
rownames(u) <- paste0("g", seq.int(nrow(u)))
celltype.labels <- rep(paste0("t", seq.int(n_types)), each = n_per_type)
celltype.genes <- split(sample(rownames(u), size = n_types * n_specific),
rep(paste0("t", seq.int(n_types)), each = n_specific))
for (i in seq_along(celltype.genes)) {
j <- celltype.genes[[i]]
k <- celltype.labels == paste0("t", i)
u[j, k] <- 2 * u[j, k]
}
v <- log2(u + 1)
sce <- SingleCellExperiment(assays=list(counts=u, logcounts=v))
# define marker genes (subset of true cell-type-specific genes)
marker.genes <- lapply(celltype.genes, "[", 1:5)
marker.genes
# predict cell types
res <- labelCells(sce, marker.genes,
fraction_topscoring = 0.1,
normGenesetExpressionParams = list(R = 50))
# high-scoring cells used as references for each celltype
res$cells
# ... from these, pseudo-bulks were created:
res$refs
# ... and used to predict labels for all cells
res$labels$pruned.labels
# compare predicted to true cell types
table(true = celltype.labels, predicted = res$labels$pruned.labels)
}
#> Loading required package: SummarizedExperiment
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#>
#> Attaching package: ‘MatrixGenerics’
#> The following objects are masked from ‘package:matrixStats’:
#>
#> colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
#> colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
#> colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
#> colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
#> colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
#> colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
#> colWeightedMeans, colWeightedMedians, colWeightedSds,
#> colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
#> rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
#> rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#> rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
#> rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
#> rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
#> rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
#> rowWeightedSds, rowWeightedVars
#> Loading required package: Biobase
#> Welcome to Bioconductor
#>
#> Vignettes contain introductory material; view with
#> 'browseVignettes()'. To cite Bioconductor, see
#> 'citation("Biobase")', and for packages 'citation("pkgname")'.
#>
#> Attaching package: ‘Biobase’
#> The following object is masked from ‘package:MatrixGenerics’:
#>
#> rowMedians
#> The following objects are masked from ‘package:matrixStats’:
#>
#> anyMissing, rowMedians
#> predicted
#> true t1 t2 t3
#> t1 30 0 0
#> t2 0 30 0
#> t3 0 0 30