Generate counts for sequential windows in a single region
Source:R/scanGenome.R
quantifyWindowsInRegion.RdRead modification data from bamfiles for a chunk of the genome
defined by region (by calling readModBam and aggregate
modified and total counts for windows.
Usage
quantifyWindowsInRegion(
bamfiles,
region,
modbase,
level = "summary",
modProbThreshold = 0.5,
sampleAnnot = NULL,
seqinfo = NULL,
sequenceContextWidth = 0,
sequenceReference = NULL,
sequenceContext = NULL,
windowMode = "fixed",
windowSize = 24,
windowStep = round(windowSize/2),
windows = NULL,
quantFunction = "sumNmodNvalid",
quantFunctionArgs = list(),
BPPARAM = MulticoreParam(4L, RNGseed = 42L),
verbose = FALSE
)Arguments
- bamfiles
Character vector with one or several modBam file names. Note that no read-filtering will be performed on the data from these files, so possibly the files should contain only filtered alignments.
- region
A
GRangesobject specifying which genomic region to extract the reads from. Alternatively, the regions can be specified as a character scalar (e.g. "chr1:1200-1300") that can be coerced into aGRangesobject.- modbase
Character vector defining the modified base to extract.
- level
A character scalar giving the level of the data to required by
quantFunction(seereadModBam).- modProbThreshold
A numeric scalar, indicating the modification probability threshold to use to classify a base as 'modified' or 'unmodified'.
- sampleAnnot
A
data.frame(orNULL) providing annotations for the samples. It must contain at least one column, named"sample", which must contain all the values ofnames(bamfiles). The provided annotations will be propagated to the returnedSummarizedExperimentobject.- seqinfo
NULLor aSeqinfoobject containing information about the set of genomic sequences (chromosomes). Alternatively, a named numeric vector with genomic sequence names and lengths. Useful to set the sorting order of sequence names.- sequenceContextWidth, sequenceReference
Define the sequence context to be extracted around modified bases. By default (
sequenceContextWidth = 0), no sequence context will be extracted, otherwise it will be returned inrowData(x)$sequenceContext. SeeaddSeqContextfor details.- sequenceContext
A character vector with sequence contexts to retain. To apply this filter, the arguments
"sequenceContextWidth"and"sequenceReference"must be set.- windowMode
Character scalar defining how windows in
regionare define. Currently supported are:- "fixed"
: The window is of fixed size (in number of bases), corresponding to the
windowSizeargument value.- "predefined"
: The windows are provided by the user, as a
GRangesobject.
- windowSize
Numeric scalar defining the size of windows (see
windowModeargument).- windowStep
Numeric scalar defining the step (shift) between start positions of consecutive windows. For non-overlapping consecutive windows,
windowStepshould be equal towindowSize.- windows
GRangesobject with pre-defined windows (ifwindowMode = "predefined"). Only windows where the midpoint overlapsregionwill be considered.- quantFunction
A character scalar giving the name of the window quantification function. This function should take as a first argument a
RangedSummarizedExperimentobject, as generated byreadModBam. The second argument should be aGRangesobject defining the windows to quantify. Additional arguments may be passed toquantFunctionusingquantFunctionArgs. The function must return aRangedSummarizedExperimentobject with one row per window defined by the second argument and the window quantifications contained in the assay(s).- quantFunctionArgs
List with additional arguments for
quantFunction.- BPPARAM
A
BiocParallelParamobject that controls the number of parallel CPU threads to use for some of the steps inreadModBam(). The default value is (MulticoreParam(4L, RNGseed = 42L)). If randomly sampling reads (nAlnsToSample > 0), make sure to set theRNGseedargument when constructing theBPPARAMobject for reproducible results (see alsovignette("Random_Numbers", package = "BiocParallel")).- verbose
Logical scalar. If
TRUE, report on progress.
Value
A SummarizedExperiment object
with columns corresponding to samples (the elements of bamfiles)
and rows corresponding to windows in region. The object contains
the assays "Nmod" and "Nvalid" containing the number of
modified and total (valid) bases in each window and sample, respectively.
Examples
modbamfiles <- system.file("extdata",
c("6mA_1_10reads.bam", "6mA_2_10reads.bam"),
package = "footprintR")
gnmfasta <- system.file("extdata", "reference.fa.gz", package = "footprintR")
quantifyWindowsInRegion(bamfiles = modbamfiles,
region = "chr1:6940000-6955000",
modbase = "a",
sequenceContextWidth = 1,
sequenceReference = gnmfasta,
sequenceContext = "A",
BPPARAM = BiocParallel::SerialParam())
#> class: RangedSummarizedExperiment
#> dim: 136 2
#> metadata(1): readLevelData
#> assays(3): Nmod Nvalid FracMod
#> rownames(136): 1 2 ... 135 136
#> rowData names(0):
#> colnames(2): s1 s2
#> colData names(2): sample modbase
quantifyWindowsInRegion(bamfiles = modbamfiles,
region = "chr1:6940000-6955000",
modbase = "a",
BPPARAM = BiocParallel::SerialParam())
#> class: RangedSummarizedExperiment
#> dim: 136 2
#> metadata(1): readLevelData
#> assays(3): Nmod Nvalid FracMod
#> rownames(136): 1 2 ... 135 136
#> rowData names(0):
#> colnames(2): s1 s2
#> colData names(2): sample modbase