Skip to contents

Given one or several modBam files and a vector with chromosome lengths, quantify the modification data in windows, calculate a score for each window and fuse neighboring high-scoring windows. For example, the score could be a directional P value measuring differential modification between two groups of modBam files, and the identified windows would then be differentially modified regions (DMRs).

Usage

scanForHighScoringRegions(
  bamfiles,
  sampleAnnot,
  chromosomeLengths,
  scoreFunction = "getDifferentiallyModifiedWindows",
  modbase,
  modProbThreshold = 0.5,
  tileSize = 1e+06,
  seqinfo = NULL,
  sequenceContextWidth = 0,
  sequenceReference = NULL,
  sequenceContext = NULL,
  windowMode = "fixed",
  windowSize = 24,
  windowStep = round(windowSize/2),
  scoreCol = "dirNegLog10PValue",
  thresh = 3,
  minperiod = 3,
  maxGap = 50,
  BPPARAM = MulticoreParam(4L, RNGseed = 42L),
  verbose = FALSE,
  ...
)

Arguments

bamfiles

Character vector with one or several modBam file names. Note that no read-filtering will be performed on the data from these files, so possibly the files should contain only filtered alignments.

sampleAnnot

A data.frame (or NULL) providing annotations for the samples. It must contain at least one column, named "sample", which must contain all the values of names(bamfiles). The provided annotations will be propagated to the returned SummarizedExperiment object.

chromosomeLengths

A named character vector with lengths of chromosomes to be analyzed, for example the return value of seqlengths(BSgenomeObject).

scoreFunction

A character scalar giving the name of the scoring function. This function should take as a first argument a SummarizedExperiment object, as generated by quantifyWindowsInRegion. Additional arguments may be passed to scoreFunction using .... The function must return a GRanges object (typically corresponding to the rowRanges of the input) with the score added to the mcols().

modbase

Character vector defining the modified base to extract.

modProbThreshold

A numeric scalar, indicating the modification probability threshold to use to classify a base as 'modified' or 'unmodified'.

tileSize

A numeric scalar giving the approximate size of a genomic region for which the data is loaded at once. Chromosomes that are larger than tileSize will be automatically split into multiple tiles. Reduce this size if you would like to reduce the memory usage of the function.

seqinfo

NULL or a Seqinfo object containing information about the set of genomic sequences (chromosomes). Alternatively, a named numeric vector with genomic sequence names and lengths. Useful to set the sorting order of sequence names.

sequenceContextWidth, sequenceReference

Define the sequence context to be extracted around modified bases. By default ( sequenceContextWidth = 0), no sequence context will be extracted, otherwise it will be returned in rowData(x)$sequenceContext. See addSeqContext for details.

sequenceContext

A character vector with sequence contexts to retain. To apply this filter, the arguments "sequenceContextWidth" and "sequenceReference" must be set.

windowMode

Character scalar defining how windows in region are define. Currently supported are:

"fixed"

: The window is of fixed size (in number of bases), corresponding to the windowSize argument value.

windowSize

Numeric scalar defining the size of windows (see windowMode argument).

windowStep

Numeric scalar defining the step (shift) between start positions of consecutive windows. For non-overlapping consecutive windows, windowStep should be equal to windowSize.

scoreCol

Character scalar giving the column name in mcols(x) to use for the analysis.

thresh

A numeric scalar giving the minimal absolute window score (after smoothing, see minperiod argument) defining a region of interest. Higher values make the region detection more stringent.

minperiod

Numeric scalar that defines the low-pass filter parameter used to smooth the scores for segmentation. minperiod gives the minimal period (in number of windows) for the critical frequency in the score signal that should be retained. The default value is suitable to retain signals that occur in three neighboring windows. The filtering is performed using filtfilt and thus requires the signal package to be installed.

maxGap

Numeric scalar giving the maximal gap between neighboring windows, in base pairs, from the end of the first to the start of the next, to be fused into a single region of interest.

BPPARAM

A BiocParallelParam object that controls the number of parallel CPU threads to use for some of the steps in readModBam(). The default value is (MulticoreParam(4L, RNGseed = 42L)). If randomly sampling reads (nAlnsToSample > 0), make sure to set the RNGseed argument when constructing the BPPARAM object for reproducible results (see also vignette("Random_Numbers", package = "BiocParallel")).

verbose

Logical scalar. If TRUE, report on progress.

...

Additional parameters sent to scoreFunction.

Value

A GRanges object with identified regions of interest.

Author

Charlotte Soneson, Michael Stadler

Examples

modbamfiles <- system.file("extdata",
                           c("6mA_1_10reads.bam", "6mA_1_10reads.bam",
                             "6mA_2_10reads.bam", "6mA_2_10reads.bam"),
                           package = "footprintR")
gr <- scanForHighScoringRegions(bamfiles = modbamfiles,
                                sampleAnnot = data.frame(sample = c("s1","s2","s3","s4"),
                                                         group = c("A","A","B","B")),
                                chromosomeLengths = c(chr1 = 6955000),
                                modbase = "a", BPPARAM = BiocParallel::SerialParam())
gr
#> GRanges object with 38 ranges and 5 metadata columns:
#>        seqnames          ranges strand | dirNegLog10PValueThresh
#>           <Rle>       <IRanges>  <Rle> |               <numeric>
#>    [1]     chr1 6928530-6928565      * |                -3.89850
#>    [2]     chr1 6928626-6928685      * |                 4.07694
#>    [3]     chr1 6929190-6929213      * |                 3.05522
#>    [4]     chr1 6929346-6929561      * |                 3.91628
#>    [5]     chr1 6929802-6929861      * |                 4.45501
#>    ...      ...             ...    ... .                     ...
#>   [34]     chr1 6938526-6938597      * |                 5.30074
#>   [35]     chr1 6938610-6938657      * |                -4.04405
#>   [36]     chr1 6938718-6938765      * |                 3.31873
#>   [37]     chr1 6938874-6938897      * |                -3.24545
#>   [38]     chr1 6939966-6940049      * |                 5.22974
#>        numWindowsThresh direction dirNegLog10PValue numWindows
#>               <integer>  <factor>         <numeric>  <integer>
#>    [1]                2      down          -3.89850          2
#>    [2]                4      up             4.07694          4
#>    [3]                1      up             3.05522          1
#>    [4]                8      up             2.80645         17
#>    [5]                4      up             4.45501          4
#>    ...              ...       ...               ...        ...
#>   [34]                5      up             5.30074          5
#>   [35]                3      down          -4.04405          3
#>   [36]                3      up             3.31873          3
#>   [37]                1      down          -3.24545          1
#>   [38]                6      up             5.22974          6
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths