Generate counts for sequential windows in a single region — quantifyWindowsInRegion • footprintR

Read modification data from bamfiles for a chunk of the genome defined by region (by calling readModBam and aggregate modified and total counts for windows.

Usage

quantifyWindowsInRegion(
  bamfiles,
  region,
  modbase,
  modProbThreshold = 0.5,
  sampleAnnot = NULL,
  seqinfo = NULL,
  sequenceContextWidth = 0,
  sequenceReference = NULL,
  sequenceContext = NULL,
  windowMode = "fixed",
  windowSize = 24,
  windowStep = round(windowSize/2),
  windows = NULL,
  quantFunction = "sumNmodNvalid",
  quantFunctionArgs = list(),
  BPPARAM = MulticoreParam(4L, RNGseed = 42L),
  verbose = FALSE
)

Arguments

bamfiles

Character vector with one or several modBam file names. Note that no read-filtering will be performed on the data from these files, so possibly the files should contain only filtered alignments.

region

A GRanges object specifying which genomic region to extract the reads from. Alternatively, the regions can be specified as a character scalar (e.g. "chr1:1200-1300") that can be coerced into a GRanges object.

modbase

Character vector defining the modified base to extract.

modProbThreshold

A numeric scalar, indicating the modification probability threshold to use to classify a base as 'modified' or 'unmodified'.

sampleAnnot

A data.frame (or NULL) providing annotations for the samples. It must contain at least one column, named "sample", which must contain all the values of names(bamfiles). The provided annotations will be propagated to the returned SummarizedExperiment object.

seqinfo

NULL or a Seqinfo object containing information about the set of genomic sequences (chromosomes). Alternatively, a named numeric vector with genomic sequence names and lengths. Useful to set the sorting order of sequence names.

sequenceContextWidth, sequenceReference

Define the sequence context to be extracted around modified bases. By default ( sequenceContextWidth = 0), no sequence context will be extracted, otherwise it will be returned in rowData(x)$sequenceContext. See addSeqContext for details.

sequenceContext

A character vector with sequence contexts to retain. To apply this filter, the arguments "sequenceContextWidth" and "sequenceReference" must be set.

windowMode

Character scalar defining how windows in region are define. Currently supported are:

"fixed": : The window is of fixed size (in number of bases), corresponding to the windowSize argument value.
"predefined": : The windows are provided by the user, as a GRanges object.

windowSize

Numeric scalar defining the size of windows (see windowMode argument).

windowStep

Numeric scalar defining the step (shift) between start positions of consecutive windows. For non-overlapping consecutive windows, windowStep should be equal to windowSize.

windows

GRanges object with pre-defined windows (if windowMode = "predefined"). Only windows where the midpoint overlaps region will be considered.

quantFunction

A character scalar giving the name of the window quantification function. This function should take as a first argument a RangedSummarizedExperiment object, as generated by readModBam. The second argument should be a GRanges object defining the windows to quantify. Additional arguments may be passed to quantFunction using quantFunctionArgs. The function must return a RangedSummarizedExperiment object with one row per window defined by the second argument and the window quantifications contained in the assay(s).

quantFunctionArgs

List with additional arguments for quantFunction.

BPPARAM

A BiocParallelParam object that controls the number of parallel CPU threads to use for some of the steps in readModBam(). The default value is (MulticoreParam(4L, RNGseed = 42L)). If randomly sampling reads (nAlnsToSample > 0), make sure to set the RNGseed argument when constructing the BPPARAM object for reproducible results (see also vignette("Random_Numbers", package = "BiocParallel")).

verbose

Logical scalar. If TRUE, report on progress.

Value

A SummarizedExperiment object with columns corresponding to samples (the elements of bamfiles) and rows corresponding to windows in region. The object contains the assays "Nmod" and "Nvalid" containing the number of modified and total (valid) bases in each window and sample, respectively.

Author

Panagiotis Papasaikas, Sebastien Smallwood, Charlotte Soneson, Michael Stadler

Examples

modbamfiles <- system.file("extdata",
                           c("6mA_1_10reads.bam", "6mA_2_10reads.bam"),
                           package = "footprintR")
gnmfasta <- system.file("extdata", "reference.fa.gz", package = "footprintR")
quantifyWindowsInRegion(bamfiles = modbamfiles,
                        region = "chr1:6940000-6955000",
                        modbase = "a",
                        sequenceContextWidth = 1,
                        sequenceReference = gnmfasta,
                        sequenceContext = "A",
                        BPPARAM = BiocParallel::SerialParam())
#> class: RangedSummarizedExperiment 
#> dim: 136 2 
#> metadata(1): readLevelData
#> assays(3): Nmod Nvalid FracMod
#> rownames(136): 1 2 ... 135 136
#> rowData names(0):
#> colnames(2): s1 s2
#> colData names(2): sample modbase
quantifyWindowsInRegion(bamfiles = modbamfiles,
                        region = "chr1:6940000-6955000",
                        modbase = "a",
                        BPPARAM = BiocParallel::SerialParam())
#> class: RangedSummarizedExperiment 
#> dim: 136 2 
#> metadata(1): readLevelData
#> assays(3): Nmod Nvalid FracMod
#> rownames(136): 1 2 ... 135 136
#> rowData names(0):
#> colnames(2): s1 s2
#> colData names(2): sample modbase