Generate counts for sequential windows in a single region
Source:R/scanGenome.R
quantifyWindowsInRegion.Rd
Read modification data from bamfiles
for a chunk of the genome
defined by region
(by calling readModBam
and aggregate
modified and total counts for windows.
Usage
quantifyWindowsInRegion(
bamfiles,
region,
modbase,
modProbThreshold = 0.5,
sampleAnnot = NULL,
seqinfo = NULL,
sequenceContextWidth = 0,
sequenceReference = NULL,
sequenceContext = NULL,
windowMode = "fixed",
windowSize = 24,
windowStep = round(windowSize/2),
windows = NULL,
quantFunction = "sumNmodNvalid",
quantFunctionArgs = list(),
BPPARAM = MulticoreParam(4L, RNGseed = 42L),
verbose = FALSE
)
Arguments
- bamfiles
Character vector with one or several modBam file names. Note that no read-filtering will be performed on the data from these files, so possibly the files should contain only filtered alignments.
- region
A
GRanges
object specifying which genomic region to extract the reads from. Alternatively, the regions can be specified as a character scalar (e.g. "chr1:1200-1300") that can be coerced into aGRanges
object.- modbase
Character vector defining the modified base to extract.
- modProbThreshold
A numeric scalar, indicating the modification probability threshold to use to classify a base as 'modified' or 'unmodified'.
- sampleAnnot
A
data.frame
(orNULL
) providing annotations for the samples. It must contain at least one column, named"sample"
, which must contain all the values ofnames(bamfiles)
. The provided annotations will be propagated to the returnedSummarizedExperiment
object.- seqinfo
NULL
or aSeqinfo
object containing information about the set of genomic sequences (chromosomes). Alternatively, a named numeric vector with genomic sequence names and lengths. Useful to set the sorting order of sequence names.- sequenceContextWidth, sequenceReference
Define the sequence context to be extracted around modified bases. By default (
sequenceContextWidth = 0
), no sequence context will be extracted, otherwise it will be returned inrowData(x)$sequenceContext
. SeeaddSeqContext
for details.- sequenceContext
A character vector with sequence contexts to retain. To apply this filter, the arguments
"sequenceContextWidth"
and"sequenceReference"
must be set.- windowMode
Character scalar defining how windows in
region
are define. Currently supported are:- "fixed"
: The window is of fixed size (in number of bases), corresponding to the
windowSize
argument value.- "predefined"
: The windows are provided by the user, as a
GRanges
object.
- windowSize
Numeric scalar defining the size of windows (see
windowMode
argument).- windowStep
Numeric scalar defining the step (shift) between start positions of consecutive windows. For non-overlapping consecutive windows,
windowStep
should be equal towindowSize
.- windows
GRanges
object with pre-defined windows (ifwindowMode = "predefined"
). Only windows where the midpoint overlapsregion
will be considered.- quantFunction
A character scalar giving the name of the window quantification function. This function should take as a first argument a
RangedSummarizedExperiment
object, as generated byreadModBam
. The second argument should be aGRanges
object defining the windows to quantify. Additional arguments may be passed toquantFunction
usingquantFunctionArgs
. The function must return aRangedSummarizedExperiment
object with one row per window defined by the second argument and the window quantifications contained in the assay(s).- quantFunctionArgs
List with additional arguments for
quantFunction
.- BPPARAM
A
BiocParallelParam
object that controls the number of parallel CPU threads to use for some of the steps inreadModBam()
. The default value is (MulticoreParam(4L, RNGseed = 42L)
). If randomly sampling reads (nAlnsToSample > 0
), make sure to set theRNGseed
argument when constructing theBPPARAM
object for reproducible results (see alsovignette("Random_Numbers", package = "BiocParallel")
).- verbose
Logical scalar. If
TRUE
, report on progress.
Value
A SummarizedExperiment
object
with columns corresponding to samples (the elements of bamfiles
)
and rows corresponding to windows in region
. The object contains
the assays "Nmod"
and "Nvalid"
containing the number of
modified and total (valid) bases in each window and sample, respectively.
Examples
modbamfiles <- system.file("extdata",
c("6mA_1_10reads.bam", "6mA_2_10reads.bam"),
package = "footprintR")
gnmfasta <- system.file("extdata", "reference.fa.gz", package = "footprintR")
quantifyWindowsInRegion(bamfiles = modbamfiles,
region = "chr1:6940000-6955000",
modbase = "a",
sequenceContextWidth = 1,
sequenceReference = gnmfasta,
sequenceContext = "A",
BPPARAM = BiocParallel::SerialParam())
#> class: RangedSummarizedExperiment
#> dim: 136 2
#> metadata(1): readLevelData
#> assays(3): Nmod Nvalid FracMod
#> rownames(136): 1 2 ... 135 136
#> rowData names(0):
#> colnames(2): s1 s2
#> colData names(2): sample modbase
quantifyWindowsInRegion(bamfiles = modbamfiles,
region = "chr1:6940000-6955000",
modbase = "a",
BPPARAM = BiocParallel::SerialParam())
#> class: RangedSummarizedExperiment
#> dim: 136 2
#> metadata(1): readLevelData
#> assays(3): Nmod Nvalid FracMod
#> rownames(136): 1 2 ... 135 136
#> rowData names(0):
#> colnames(2): s1 s2
#> colData names(2): sample modbase