Parse ML and MM tags (see https://samtools.github.io/hts-specs/SAMtags.pdf,
section 1.7) and return a SummarizedExperiment
object with information on modified bases. Implicitly called bases will get
a modification probability of zero.
Usage
readModBam(
bamfiles,
regions = NULL,
modbase,
nAlnsToSample = 0,
seqnamesToSampleFrom = "chr19",
seqinfo = NULL,
sequenceContextWidth = 0,
sequenceReference = NULL,
variantPositions = NULL,
BPPARAM = bpparam(),
verbose = FALSE
)
Arguments
- bamfiles
Character vector with one or several paths of
modBAM
files, containing information about base modifications inMM
andML
tags. Ifbamfiles
is a named vector, the names are used as sample names and prefixes for read names. Otherwise, the prefixes will bes1
, ...,sN
, whereN
is the length ofbamfiles
. Allbamfiles
must have an index.- regions
A
GRanges
object specifying which genomic regions to extract the reads from. Alternatively, regions can be specified as a character vector (e.g. "chr1:1200-1300") that can be coerced into aGRanges
object. Note that the reads are not trimmed to the boundaries of the specified ranges. As a result, returned positions will typically extend out of the specified regions. IfnAlnsToSample
is set to a non-zero value,regions
is ignored.- modbase
Character vector defining the modified base for each sample. If
modbase
is a named vector, the names should correspond to the names ofbamfiles
. Otherwise, it will be assumed that the elements are in the same order as the files inbamfiles
. Ifmodbase
has length 1, the same modified base will be used for all samples.- nAlnsToSample
A numeric scalar. If non-zero,
regions
is ignored and approximatelynAlnsToSample
randomly selected alignments onseqnamesToSampleFrom
are read from each of thebamfiles
. In order to make the results reproducible, make sure to set theRNGseed
argument in the providedBPPARAM
object (see below).- seqnamesToSampleFrom
A character vector with one or several sequence names (chromosomes) from which to sample alignments from (only used if
nAlnsToSample
is greater than zero).- seqinfo
NULL
or aSeqinfo
object containing information about the set of genomic sequences (chromosomes). Alternatively, a named numeric vector with genomic sequence names and lengths. Useful to set the sorting order of sequence names.- sequenceContextWidth, sequenceReference
Define the sequence context to be extracted around modified bases. By default (
sequenceContextWidth = 0
), no sequence context will be extracted, otherwise it will be returned inrowData(x)$sequenceContext
. SeeaddSeqContext
for details.- variantPositions
An optional
GPos
object with seqnames and coordinates of single nucleotide variant positions, to be used to construct read labels for allele-specific analysis. Ignored ifNULL
ornAlnsToSample > 0
(sampling-mode).- BPPARAM
A
BiocParallelParam
object that controls the number of parallel CPU threads to use for some of the steps inreadModBam()
. The default value (bpparam
) will select an appropriate value for the current environment, or the default parallel backend registered usingregister
. If randomly sampling reads (nAlnsToSample > 0
), make sure to set theRNGseed
argument when constructing theBPPARAM
object for reproducible results (see alsovignette("Random_Numbers", package = "BiocParallel")
).- verbose
Logical scalar. If
TRUE
, report on progress.
Value
A SummarizedExperiment
object
with genomic positions in rows and samples in columns. The assay
"mod_prob"
contains per-read modification probabilities,
with each column (sample) corresponding to a position-by-read
NaMatrix
.
See also
https://samtools.github.io/hts-specs/SAMtags.pdf describing the SAM ML and MM tags for base modifications.
Examples
modbamfile <- system.file("extdata", "6mA_1_10reads.bam",
package = "footprintR")
readModBam(bamfiles = modbamfile, regions = "chr1:6940000-6955000",
modbase = "a", verbose = TRUE)
#> ℹ extracting base modifications from modBAM files
#> ℹ opening input file /Users/runner/work/_temp/Library/footprintR/extdata/6mA_1_…
#> ✔ opening input file /Users/runner/work/_temp/Library/footprintR/extdata/6mA_1_…
#>
#> ℹ reading alignments overlapping 1 target
#> ✔ reading alignments overlapping 1 target [36ms]
#>
#> ℹ removed 1724 unaligned (e.g. soft-masked) of 25090 called bases
#> ℹ read 3 alignments
#> ℹ finding unique genomic positions...
#> ✔ finding unique genomic positions... [27ms]
#>
#> ℹ collapsed 11300 positions to 4772 unique ones
#> ✔ collapsed 11300 positions to 4772 unique ones [673ms]
#>
#> class: RangedSummarizedExperiment
#> dim: 4772 1
#> metadata(2): readLevelData variantPositions
#> assays(1): mod_prob
#> rownames(4772): chr1:6925830:- chr1:6925834:- ... chr1:6941622:-
#> chr1:6941631:-
#> rowData names(0):
#> colnames(1): s1
#> colData names(4): sample modbase n_reads readInfo