Count pairs of modified bases by distance and modification state for bam files where modifications are indicated by sequence mismatches
Source:R/countMismatchStatePairs.R
countMismatchStatePairs.RdFor all pairs of bases with modification calls in a read, tabulate the number of pairs at a given distance with a given modification state.
Usage
countMismatchStatePairs(
bamfile,
bamFormat = "QuasR",
regions = ".",
sequenceContext = "GCH",
readBaseUnmod = "T",
readBaseMod = "C",
windowSize = 200,
minMapQ = 0,
minAlignedLength = 0,
seqinfo = NULL,
sequenceReference = NULL,
BPPARAM = MulticoreParam(4L),
verbose = FALSE
)Arguments
- bamfile
Character scalar giving the path to a
BAMfile, containing alignments with specific (e.g., C-to-T) mismatches. Thebamfilemust have an index.- bamFormat
A character scalar giving the format of
BAMfiles. Currently supported arBAMfiles that have been created with"QuasR"(usingqAlignin bisulfite mode) or"Bismark".- regions
A
GRangesobject specifying which genomic regions to extract the reads from. Alternatively, regions can be specified as a character vector (e.g. "chr1:1200-1300", "chr2:-6000", "chr1:10-", "chrM" or ".").- sequenceContext
A character scalar with an odd number of characters, specifying the genomic context on the plus strand, for which to report (mis-)matches. The string may contain IUPAC ambiguity codes, e.g.
sequenceContext="GCH"would correspond to all GpC dinucleotides, excluding C's that are in CpG dinucleotides. The call will be made by comparing the genomic base in the middle ofsequenceContextto the aligned base in the read and interpreted according toreadBaseUnmodandreadBaseMod.- readBaseUnmod, readBaseMod
Character scalars defining the read bases that are interpreted as unmodified or modified, respectively, when aligned to the middle base of
sequenceContext.- windowSize
Numeric scalar giving the maximum window size covering pairs of modified bases to consider in pair-counting mode. A window size of 1 corresponds to a single base, a size of 2 to directly adjacent bases, etc.
- minMapQ
Numeric scalar giving the minimal mapping quality to include alignments in pair-counting mode.
- minAlignedLength
Numeric scalar giving the minimal alignment length to include alignments in pair-counting mode.
- seqinfo
NULLor aSeqinfoobject containing information about the set of genomic sequences (chromosomes). Alternatively, a named numeric vector with genomic sequence names and lengths. Useful to set the sorting order of sequence names.- sequenceReference
A
BSgenomeobject, or a character scalar giving the path to a fasta formatted file with reference sequences, or aDNAStringSetobject. The sequence context (seesequenceContextWidthargument) will be extracted from these sequences.- BPPARAM
A
BiocParallelParamobject that controls the number of parallel CPU threads to use for decompressing bam records.- verbose
A logical scalar. If
TRUE, report on progress.
Examples
bamfile <- system.file("extdata", "BisSeq_quasr_single.bam",
package = "SingleMoleculeGenomicsIO")
reffile <- system.file("extdata", "reference.fa.gz",
package = "SingleMoleculeGenomicsIO")
tbl <- countMismatchStatePairs(bamfile = bamfile, bamFormat = "QuasR",
regions = "chr1:6940000-6955000",
sequenceContext = "C",
readBaseUnmod = "T", readBaseMod = "C",
windowSize = 200,
sequenceReference = reffile,
BPPARAM = BiocParallel::SerialParam(),
verbose = TRUE)
#> ℹ finding positions with C
#> ℹ opening input file /Users/runner/work/_temp/Library/SingleMoleculeGenomicsIO/extdata/BisSeq_quasr_single.bam using 1 thread
#> ℹ finding positions with C
#> ℹ counting state-pairs for alignments overlapping 1 region
#> ℹ finding positions with C
#> ℹ removed 0 unaligned (e.g. soft-masked) of 0 called bases
#> ℹ finding positions with C
#> ℹ read 184 alignments
#> ℹ finding positions with C
#> ✔ finding positions with C [89ms]
#>
tbl
#> DataFrame with 200 rows and 5 columns
#> S unmod_unmod unmod_mod mod_unmod mod_mod
#> <integer> <numeric> <numeric> <numeric> <numeric>
#> 1 1 3242 0 0 68
#> 2 2 691 10 5 0
#> 3 3 561 9 10 1
#> 4 4 563 12 15 0
#> 5 5 533 12 13 0
#> ... ... ... ... ... ...
#> 196 196 0 0 0 0
#> 197 197 0 0 0 0
#> 198 198 0 0 0 0
#> 199 199 0 0 0 0
#> 200 200 0 0 0 0