Filter positions based on any combination of sequence context,
coverage, repetition of the same position, and presence of non-NA values.
Filters are applied in the order they are specified to the filters
argument. Any filter type can be repeated an arbitrary number of times.
Usage
filterPositions(
se,
filters = c("sequenceContext", "coverage", "all.na"),
sequenceContext = NULL,
assayNameCov = "Nvalid",
minCov = 1,
minNbrSamples = NULL,
regions = NULL,
seqinfo = NULL,
assayNameAmbig = "Nvalid",
assayNameNA = "mod_prob"
)Arguments
- se
A
SummarizedExperimentobject.- filters
A character vector. All values must be one of
"sequenceContext","coverage","repeated.positions","regions"and"all.na". Filters are applied in the order specified by this vector.- sequenceContext
A character vector with sequence contexts to retain. To apply this filter, the
"sequenceContext"column must be present inrowData(se)(seeaddSeqContext).- assayNameCov
A character scalar indicating the assay to use to define the coverage. If this is a read-level assay, coverage is first calculated using
flattenReadLevelAssay(..., statistics = "Nvalid").- minCov
A numeric scalar indicating the lowest acceptable coverage in order to keep a position.
- minNbrSamples
A numeric scalar, or
NULL. IfNULL(default), the row sum ofassayNameCov(i.e., the total coverage across all samples) is used for the coverage filtering. If notNULL, a position is required to have at leastminCovcoverage in at leastminNbrSamplesto be retained.- regions
A
GRangesobject specifying the genomic regions to restrict the output to. Alternatively, regions can be specified as a character vector (e.g. "chr1:1200-1300", "chr2:-6000" or "chrM") that will be coerced into aGRangesobject. If the end coordinate is not provided (for example in "chr1:10-", which means "to the end of chr1"), it will be obtained fromseqinfoor default to a large value ifseqinfois not provided.- seqinfo
NULLor aSeqinfoobject containing information about the set of genomic sequences (chromosomes). Alternatively, a named numeric vector with genomic sequence names and lengths.- assayNameAmbig
A character scalar indicating the assay to use to decide which row to retain if multiple rows represent the same genomic position (on different strands). The row with the largest row sum in this assay is retained.
- assayNameNA
A character scalar indicating the assay to use as the basis for filtering out positions with NA values across all reads. This should be a read level assay.
Examples
modbamfiles <- system.file("extdata", c("6mA_1_10reads.bam", "6mA_2_10reads.bam"),
package = "footprintR")
reffile <- system.file("extdata", "reference.fa.gz", package = "footprintR")
se <- readModBam(bamfiles = modbamfiles, regions = "chr1:6920000-6940000",
modbase = "a", verbose = FALSE,
BPPARAM = BiocParallel::SerialParam())
se <- flattenReadLevelAssay(se)
se <- addSeqContext(se, sequenceContextWidth = 3, sequenceReference = reffile)
sefilt <- filterPositions(se, c("sequenceContext", "coverage", "all.na"),
minCov = 5, sequenceContext = "TAG")