Generate counts for sequential windows in a single region
Source:R/scanGenome.R
quantifyWindowsInRegion.Rd
Read modification data from bamfiles
for a chunk of the genome
defined by region
(by calling readModBam
and aggregate
modified and total counts for windows.
Usage
quantifyWindowsInRegion(
bamfiles,
region,
modbase,
modProbThreshold = 0.5,
sampleAnnot = NULL,
seqinfo = NULL,
sequenceContextWidth = 0,
sequenceReference = NULL,
sequenceContext = NULL,
windowMode = "fixed",
windowSize = 24,
windowStep = round(windowSize/2),
BPPARAM = MulticoreParam(4L, RNGseed = 42L),
verbose = FALSE
)
Arguments
- bamfiles
Character vector with one or several modBam file names. Note that no read-filtering will be performed on the data from these files, so possibly the files should contain only filtered alignments.
- region
A
GRanges
object specifying which genomic region to extract the reads from. Alternatively, the regions can be specified as a character scalar (e.g. "chr1:1200-1300") that can be coerced into aGRanges
object.- modbase
Character vector defining the modified base to extract.
- modProbThreshold
A numeric scalar, indicating the modification probability threshold to use to classify a base as 'modified' or 'unmodified'.
- sampleAnnot
A
data.frame
(orNULL
) providing annotations for the samples. It must contain at least one column, named"sample"
, which must contain all the values ofnames(bamfiles)
. The provided annotations will be propagated to the returnedSummarizedExperiment
object.- seqinfo
NULL
or aSeqinfo
object containing information about the set of genomic sequences (chromosomes). Alternatively, a named numeric vector with genomic sequence names and lengths. Useful to set the sorting order of sequence names.- sequenceContextWidth, sequenceReference
Define the sequence context to be extracted around modified bases. By default (
sequenceContextWidth = 0
), no sequence context will be extracted, otherwise it will be returned inrowData(x)$sequenceContext
. SeeaddSeqContext
for details.- sequenceContext
A character vector with sequence contexts to retain. To apply this filter, the arguments
"sequenceContextWidth"
and"sequenceReference"
must be set.- windowMode
Character scalar defining how windows in
region
are define. Currently supported are:- "fixed"
: The window is of fixed size (in number of bases), corresponding to the
windowSize
argument value.
- windowSize
Numeric scalar defining the size of windows (see
windowMode
argument).- windowStep
Numeric scalar defining the step (shift) between start positions of consecutive windows. For non-overlapping consecutive windows,
windowStep
should be equal towindowSize
.- BPPARAM
A
BiocParallelParam
object that controls the number of parallel CPU threads to use for some of the steps inreadModBam()
. The default value is (MulticoreParam(4L, RNGseed = 42L)
). If randomly sampling reads (nAlnsToSample > 0
), make sure to set theRNGseed
argument when constructing theBPPARAM
object for reproducible results (see alsovignette("Random_Numbers", package = "BiocParallel")
).- verbose
Logical scalar. If
TRUE
, report on progress.
Value
A SummarizedExperiment
object
with columns corresponding to samples (the elements of bamfiles
)
and rows corresponding to windows in region
. The object contains
the assays "Nmod"
and "Nvalid"
containing the number of
modified and total (valid) bases in each window and sample, respectively.
Examples
modbamfiles <- system.file("extdata",
c("6mA_1_10reads.bam", "6mA_2_10reads.bam"),
package = "footprintR")
gnmfasta <- system.file("extdata", "reference.fa.gz", package = "footprintR")
quantifyWindowsInRegion(bamfiles = modbamfiles,
region = "chr1:6940000-6955000",
modbase = "a",
sequenceContextWidth = 1,
sequenceReference = gnmfasta,
sequenceContext = "A",
BPPARAM = BiocParallel::SerialParam())
#> class: RangedSummarizedExperiment
#> dim: 134 2
#> metadata(1): readLevelData
#> assays(2): Nmod Nvalid
#> rownames(134): 1 2 ... 133 134
#> rowData names(0):
#> colnames(2): s1 s2
#> colData names(2): sample modbase
quantifyWindowsInRegion(bamfiles = modbamfiles,
region = "chr1:6940000-6955000",
modbase = "a",
BPPARAM = BiocParallel::SerialParam())
#> class: RangedSummarizedExperiment
#> dim: 134 2
#> metadata(1): readLevelData
#> assays(2): Nmod Nvalid
#> rownames(134): 1 2 ... 133 134
#> rowData names(0):
#> colnames(2): s1 s2
#> colData names(2): sample modbase