Skip to contents

Parse ML and MM tags (see https://samtools.github.io/hts-specs/SAMtags.pdf, section 1.7) and return a SummarizedExperiment object with information on modified bases.

Usage

readModBam(
  bamfiles,
  regions,
  modbase,
  seqinfo = NULL,
  ncpu = 1L,
  verbose = FALSE
)

Arguments

bamfiles

Character vector with one or several paths of modBAM files, containing information about base modifications in MM and ML tags. If bamfiles is a named vector, the names are used as sample names and prefixes for read names. Otherwise, the prefixes will be s1, ..., sN, where N is the length of bamfiles. All bamfiles must have an index.

regions

A GRanges object specifying which genomic regions to extract the reads from. Alternatively, regions can be specified as a character vector (e.g. "chr1:1200-1300") that can be coerced into a GRanges object. Note that the reads are not trimmed to the boundaries of the specified ranges. As a result, returned positions will typically extend out of the specified regions.

modbase

Character vector defining the modified base for each sample. If modbase is a named vector, the names should correspond to the names of bamfiles. Otherwise, it will be assumed that the elements are in the same order as the files in bamfiles. If modbase has length 1, the same modified base will be used for all samples.

seqinfo

NULL or a Seqinfo object containing information about the set of genomic sequences (chromosomes). Alternatively, a named numeric vector with genomic sequence names and lengths. Useful to set the sorting order of sequence names.

ncpu

A numeric scalar giving the number of parallel CPU threads to to use for some of the steps in readModBam.

verbose

Logical scalar. If TRUE, report on progress.

Value

A SummarizedExperiment object with genomic positions in rows and samples in columns. The assay "mod_prob" contains per-read modification probabilities, with each column (sample) corresponding to a position-by-read SparseMatrix.

See also

https://samtools.github.io/hts-specs/SAMtags.pdf describing the SAM ML and MM tags for base modifications.

Author

Michael Stadler

Examples

modbamfile <- system.file("extdata", "6mA_1_10reads.bam",
                          package = "footprintR")
readModBam(bamfiles = modbamfile, regions = "chr1:6940000-6955000",
           modbase = "a", verbose = TRUE)
#> extracting base modifications from modBAM files
#>     opening input file /Users/runner/work/_temp/Library/footprintR/extdata/6mA_1_10reads.bam
#>     reading alignments overlapping any of 1 regions
#>     removed 1724 unaligned (e.g. soft-masked) of 25090 called bases
#>     read 3 alignments
#> finding unique genomic positions...
#> collapsed 11300 positions to 4772 unique ones
#> class: RangedSummarizedExperiment 
#> dim: 4772 1 
#> metadata(0):
#> assays(1): mod_prob
#> rownames(4772): chr1:6925830:- chr1:6925834:- ... chr1:6941622:-
#>   chr1:6941631:-
#> rowData names(0):
#> colnames(1): s1
#> colData names(4): sample modbase n_reads qscore