Parse ML and MM tags (see https://samtools.github.io/hts-specs/SAMtags.pdf,
section 1.7) and return a SummarizedExperiment
object with information on modified bases.
Arguments
- bamfiles
Character vector with one or several paths of
modBAM
files, containing information about base modifications inMM
andML
tags. Ifbamfiles
is a named vector, the names are used as sample names and prefixes for read names. Otherwise, the prefixes will bes1
, ...,sN
, whereN
is the length ofbamfiles
. Allbamfiles
must have an index.- regions
A
GRanges
object specifying which genomic regions to extract the reads from. Alternatively, regions can be specified as a character vector (e.g. "chr1:1200-1300") that can be coerced into aGRanges
object. Note that the reads are not trimmed to the boundaries of the specified ranges. As a result, returned positions will typically extend out of the specified regions.- modbase
Character vector defining the modified base for each sample. If
modbase
is a named vector, the names should correspond to the names ofbamfiles
. Otherwise, it will be assumed that the elements are in the same order as the files inbamfiles
. Ifmodbase
has length 1, the same modified base will be used for all samples.- seqinfo
NULL
or aSeqinfo
object containing information about the set of genomic sequences (chromosomes). Alternatively, a named numeric vector with genomic sequence names and lengths. Useful to set the sorting order of sequence names.- ncpu
A numeric scalar giving the number of parallel CPU threads to to use for some of the steps in
readModBam
.- verbose
Logical scalar. If
TRUE
, report on progress.
Value
A SummarizedExperiment
object
with genomic positions in rows and samples in columns. The assay
"mod_prob"
contains per-read modification probabilities,
with each column (sample) corresponding to a position-by-read
SparseMatrix
.
See also
https://samtools.github.io/hts-specs/SAMtags.pdf describing the SAM ML and MM tags for base modifications.
Examples
modbamfile <- system.file("extdata", "6mA_1_10reads.bam",
package = "footprintR")
readModBam(bamfiles = modbamfile, regions = "chr1:6940000-6955000",
modbase = "a", verbose = TRUE)
#> extracting base modifications from modBAM files
#> opening input file /Users/runner/work/_temp/Library/footprintR/extdata/6mA_1_10reads.bam
#> reading alignments overlapping any of 1 regions
#> removed 1724 unaligned (e.g. soft-masked) of 25090 called bases
#> read 3 alignments
#> finding unique genomic positions...
#> collapsed 11300 positions to 4772 unique ones
#> class: RangedSummarizedExperiment
#> dim: 4772 1
#> metadata(0):
#> assays(1): mod_prob
#> rownames(4772): chr1:6925830:- chr1:6925834:- ... chr1:6941622:-
#> chr1:6941631:-
#> rowData names(0):
#> colnames(1): s1
#> colData names(4): sample modbase n_reads qscore