Skip to contents

Parse ML and MM tags (see https://samtools.github.io/hts-specs/SAMtags.pdf, section 1.7) and return a SummarizedExperiment object with information on modified bases. Implicitly called bases will get a modification probability of zero.

Usage

readModBam(
  bamfiles,
  regions = NULL,
  modbase,
  nAlnsToSample = 0,
  seqnamesToSampleFrom = "chr19",
  seqinfo = NULL,
  sequenceContextWidth = 0,
  sequenceReference = NULL,
  variantPositions = NULL,
  BPPARAM = bpparam(),
  verbose = FALSE
)

Arguments

bamfiles

Character vector with one or several paths of modBAM files, containing information about base modifications in MM and ML tags. If bamfiles is a named vector, the names are used as sample names and prefixes for read names. Otherwise, the prefixes will be s1, ..., sN, where N is the length of bamfiles. All bamfiles must have an index.

regions

A GRanges object specifying which genomic regions to extract the reads from. Alternatively, regions can be specified as a character vector (e.g. "chr1:1200-1300") that can be coerced into a GRanges object. Note that the reads are not trimmed to the boundaries of the specified ranges. As a result, returned positions will typically extend out of the specified regions. If nAlnsToSample is set to a non-zero value, regions is ignored.

modbase

Character vector defining the modified base for each sample. If modbase is a named vector, the names should correspond to the names of bamfiles. Otherwise, it will be assumed that the elements are in the same order as the files in bamfiles. If modbase has length 1, the same modified base will be used for all samples.

nAlnsToSample

A numeric scalar. If non-zero, regions is ignored and approximately nAlnsToSample randomly selected alignments on seqnamesToSampleFrom are read from each of the bamfiles. In order to make the results reproducible, make sure to set the RNGseed argument in the provided BPPARAM object (see below).

seqnamesToSampleFrom

A character vector with one or several sequence names (chromosomes) from which to sample alignments from (only used if nAlnsToSample is greater than zero).

seqinfo

NULL or a Seqinfo object containing information about the set of genomic sequences (chromosomes). Alternatively, a named numeric vector with genomic sequence names and lengths. Useful to set the sorting order of sequence names.

sequenceContextWidth, sequenceReference

Define the sequence context to be extracted around modified bases. By default ( sequenceContextWidth = 0), no sequence context will be extracted, otherwise it will be returned in rowData(x)$sequenceContext. See addSeqContext for details.

variantPositions

An optional GPos object with seqnames and coordinates of single nucleotide variant positions, to be used to construct read labels for allele-specific analysis. Ignored if NULL or nAlnsToSample > 0 (sampling-mode).

BPPARAM

A BiocParallelParam object that controls the number of parallel CPU threads to use for some of the steps in readModBam(). The default value (bpparam) will select an appropriate value for the current environment, or the default parallel backend registered using register. If randomly sampling reads (nAlnsToSample > 0), make sure to set the RNGseed argument when constructing the BPPARAM object for reproducible results (see also vignette("Random_Numbers", package = "BiocParallel")).

verbose

Logical scalar. If TRUE, report on progress.

Value

A SummarizedExperiment object with genomic positions in rows and samples in columns. The assay "mod_prob" contains per-read modification probabilities, with each column (sample) corresponding to a position-by-read NaMatrix.

See also

https://samtools.github.io/hts-specs/SAMtags.pdf describing the SAM ML and MM tags for base modifications.

Author

Michael Stadler, Charlotte Soneson

Examples

modbamfile <- system.file("extdata", "6mA_1_10reads.bam",
                          package = "footprintR")
readModBam(bamfiles = modbamfile, regions = "chr1:6940000-6955000",
           modbase = "a", verbose = TRUE)
#>  extracting base modifications from modBAM files
#>  opening input file /Users/runner/work/_temp/Library/footprintR/extdata/6mA_1_
#>  opening input file /Users/runner/work/_temp/Library/footprintR/extdata/6mA_1_
#> 
#>  reading alignments overlapping 1 target
#>  reading alignments overlapping 1 target [36ms]
#> 
#>  removed 1724 unaligned (e.g. soft-masked) of 25090 called bases
#>  read 3 alignments
#>  finding unique genomic positions...
#>  finding unique genomic positions... [27ms]
#> 
#>  collapsed 11300 positions to 4772 unique ones
#>  collapsed 11300 positions to 4772 unique ones [673ms]
#> 
#> class: RangedSummarizedExperiment 
#> dim: 4772 1 
#> metadata(2): readLevelData variantPositions
#> assays(1): mod_prob
#> rownames(4772): chr1:6925830:- chr1:6925834:- ... chr1:6941622:-
#>   chr1:6941631:-
#> rowData names(0):
#> colnames(1): s1
#> colData names(4): sample modbase n_reads readInfo