Filter positions — filterPositions • footprintR

Filter positions based on any combination of sequence context, coverage, repetition of the same position, and presence of non-NA values. Filters are applied in the order they are specified to the filters argument. Any filter type can be repeated an arbitrary number of times.

Usage

filterPositions(
  se,
  filters = c("sequenceContext", "coverage", "all.na"),
  sequenceContext = NULL,
  assayNameCov = "Nvalid",
  minCov = 1,
  minNbrSamples = NULL,
  assayNameAmbig = "Nvalid",
  assayNameNA = "mod_prob"
)

Arguments

se: A SummarizedExperiment object.
filters: A character vector. All values must be one of "sequenceContext", "coverage", "repeated.positions" and "all.na". Filters are applied in the order specified by this vector.
sequenceContext: A character vector with sequence contexts to retain. To apply this filter, the "sequenceContext" column must be present in rowData(se) (see addSeqContext).
assayNameCov: A character scalar indicating the assay to use to define the coverage. If this is a read-level assay, coverage is first calculated using flattenReadLevelAssay(..., statistics = "Nvalid").
minCov: A numeric scalar indicating the lowest acceptable coverage in order to keep a position.
minNbrSamples: A numeric scalar, or NULL. If NULL (default), the row sum of assayNameCov (i.e., the total coverage across all samples) is used for the coverage filtering. If not NULL, a position is required to have at least minCov coverage in at least minNbrSamples to be retained.
assayNameAmbig: A character scalar indicating the assay to use to decide which row to retain if multiple rows represent the same genomic position (on different strands). The row with the largest row sum in this assay is retained.
assayNameNA: A character scalar indicating the assay to use as the basis for filtering out positions with NA values across all reads. This should be a read level assay.

Value

A filtered SummarizedExperiment.

Author

Charlotte Soneson

Examples

modbamfiles <- system.file("extdata", c("6mA_1_10reads.bam", "6mA_2_10reads.bam"),
                           package = "footprintR")
reffile <- system.file("extdata", "reference.fa.gz", package = "footprintR")

se <- readModBam(bamfiles = modbamfiles, regions = "chr1:6920000-6940000",
                 modbase = "a", verbose = FALSE,
                 BPPARAM = BiocParallel::SerialParam())
se <- flattenReadLevelAssay(se)
se <- addSeqContext(se, sequenceContextWidth = 3, sequenceReference = reffile)
sefilt <- filterPositions(se, c("sequenceContext", "coverage", "all.na"),
                          minCov = 5, sequenceContext = "TAG")