Subset the reads from read-level assays — subsetReads • footprintR

This function takes read names or indices and uses them to subset the read-level assays of a RangedSummarizedExperiment. Subsetting samples (the columns of the SummarizedExperiment object) is straightforward (e.g. se[, 1]), but in read-level assays the reads are nested within samples. This function provides a more convenient way to subset these nested reads.

Usage

subsetReads(
  se,
  reads,
  prune = TRUE,
  invert = FALSE,
  removeAllNApos = FALSE,
  assayNameNA = "mod_prob"
)

Arguments

se: A RangedSummarizedExperiment, typically generated by readModBam or readModkitExtract.
reads: Defines reads to retain (or remove, if invert=TRUE). Either a character vector of read identifiers, or a named list in which names are samples from colnames(se) and the elements are index vectors (character, integer or logical) defining the reads for each sample.
prune: A logical scalar. If TRUE (the default), samples for which the subsetting retains none of the reads will be completely removed from the returned SummarizedExperiment (also from colData and from assays that do not store read-level data). If FALSE, such samples are retained (in the assays with read-level data as a zero-column SparseMatrix).
invert: A logical scalar. If FALSE (the default), only the reads defined by reads are retained. If invert=TRUE, all reads except the ones in reads are retained.
removeAllNApos: A logical scalar. If TRUE, remove all positions for which all values in assayNameNA are NA after the read subsetting.
assayNameNA: A character scalar giving the name of the read-level assay that is checked for NA values. Only used if removeAllNApos is TRUE.
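
The semantics of reads, invert and prune described above can be illustrated with a small base-R toy model. This is only a sketch, not the footprintR implementation: the read-level assay is represented by a plain list of matrices (one per sample, one column per read), and subsetReadsToy, the toy object and its read names are hypothetical names introduced for illustration. The toy helper also assumes that a list-valued reads names every sample.

```r
# Toy stand-in for a read-level assay: one matrix per sample,
# rows are positions, columns are reads (NA = read does not cover position).
toy <- list(
  s1 = matrix(c(0.9, NA, 0.2, 0.1, NA, 0.8), nrow = 2,
              dimnames = list(c("pos1", "pos2"), c("s1-a", "s1-b", "s1-c"))),
  s2 = matrix(c(NA, 0.5, 0.7, NA), nrow = 2,
              dimnames = list(c("pos1", "pos2"), c("s2-a", "s2-b")))
)

# Hypothetical helper (NOT part of footprintR) that mimics the documented
# meaning of 'reads', 'invert' and 'prune' on the toy structure.
subsetReadsToy <- function(x, reads, prune = TRUE, invert = FALSE) {
  # toy assumption: a list-valued 'reads' must name every sample
  if (is.list(reads)) stopifnot(all(names(x) %in% names(reads)))
  out <- lapply(names(x), function(s) {
    m <- x[[s]]
    idx <- if (is.list(reads)) reads[[s]] else reads
    # turn character, logical or numeric indices into a logical 'keep' vector
    keep <- if (is.character(idx)) {
      colnames(m) %in% idx
    } else if (is.logical(idx)) {
      rep_len(idx, ncol(m))
    } else {
      seq_len(ncol(m)) %in% idx
    }
    if (invert) keep <- !keep
    m[, keep, drop = FALSE]
  })
  names(out) <- names(x)
  # drop samples that are left without any reads
  if (prune) out <- out[vapply(out, ncol, integer(1)) > 0L]
  out
}

byName   <- subsetReadsToy(toy, c("s1-a", "s2-b"))
byIndex  <- subsetReadsToy(toy, list(s1 = c(1, 3), s2 = c(TRUE, FALSE)))
inverted <- subsetReadsToy(toy, "s1-a", invert = TRUE)
pruned   <- subsetReadsToy(toy, "s1-a")                 # s2 is dropped
unpruned <- subsetReadsToy(toy, "s1-a", prune = FALSE)  # s2 kept, zero reads
lapply(byIndex, colnames)
```

In the toy model, a character vector is matched against the read names of every sample, whereas a named list applies a separate index vector to each sample. This mirrors the two forms of the reads argument.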

Value

A subset RangedSummarizedExperiment object.

Author

Michael Stadler, Charlotte Soneson

Examples

library(SummarizedExperiment)
modbamfiles <- system.file("extdata",
                           c("6mA_1_10reads.bam", "6mA_2_10reads.bam"),
                           package = "footprintR")
se <- readModBam(modbamfiles, "chr1:6940000-6955000", "a",
                 BPPARAM = BiocParallel::SerialParam())
lapply(assay(se, "mod_prob"), colnames)
#> $s1
#> [1] "s1-233e48a7-f379-4dcf-9270-958231125563"
#> [2] "s1-d52a5f6a-a60a-4f85-913e-eada84bfbfb9"
#> [3] "s1-92e906ae-cddb-4347-a114-bf9137761a8d"
#> 
#> $s2
#> [1] "s2-034b625e-6230-4f8d-a713-3a32cd96c298"
#> [2] "s2-d03efe3b-a45b-430b-9cb6-7e5882e4faf8"
#> 

# subset by read identifiers
seSub <- subsetReads(se, c("s1-233e48a7-f379-4dcf-9270-958231125563",
                           "s2-034b625e-6230-4f8d-a713-3a32cd96c298"))
lapply(assay(seSub, "mod_prob"), colnames)
#> $s1
#> [1] "s1-233e48a7-f379-4dcf-9270-958231125563"
#> 
#> $s2
#> [1] "s2-034b625e-6230-4f8d-a713-3a32cd96c298"
#> 

# subset by a list of indices
seSub <- subsetReads(se, list(s1 = c(1, 3),
                              s2 = c(TRUE, FALSE)))
lapply(assay(seSub, "mod_prob"), colnames)
#> $s1
#> [1] "s1-233e48a7-f379-4dcf-9270-958231125563"
#> [2] "s1-92e906ae-cddb-4347-a114-bf9137761a8d"
#> 
#> $s2
#> [1] "s2-034b625e-6230-4f8d-a713-3a32cd96c298"
#>
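
The remaining arguments (invert, prune, removeAllNApos and assayNameNA) can be exercised on the same example data. The following is a sketch that uses only the arguments documented above. It assumes footprintR and its example files are installed, and no output is shown because it has not been reproduced here. The variable names (s1read, seInv, sePruned, seKept, seNoNA) are chosen for illustration.

```r
library(footprintR)
library(SummarizedExperiment)

modbamfiles <- system.file("extdata",
                           c("6mA_1_10reads.bam", "6mA_2_10reads.bam"),
                           package = "footprintR")
se <- readModBam(modbamfiles, "chr1:6940000-6955000", "a",
                 BPPARAM = BiocParallel::SerialParam())

# a single read from sample s1
s1read <- "s1-233e48a7-f379-4dcf-9270-958231125563"

# remove this read and keep all others
seInv <- subsetReads(se, s1read, invert = TRUE)
lapply(assay(seInv, "mod_prob"), colnames)

# keep only this read: s2 retains no reads and, per the description of
# 'prune', should be dropped from the result by default
sePruned <- subsetReads(se, s1read)
colnames(sePruned)

# with prune = FALSE, the empty sample s2 should be retained
seKept <- subsetReads(se, s1read, prune = FALSE)
colnames(seKept)

# additionally remove positions that are NA in all remaining reads
seNoNA <- subsetReads(se, s1read, removeAllNApos = TRUE,
                      assayNameNA = "mod_prob")
dim(seNoNA)
```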