Given a k-mer length and the maximum number of allowed hits per k-mer, find all mappable regions in a genome.
getMappableRegions(
genome,
genomeIndex,
kmerLength = 50,
maxHits = 1,
Ncpu = 2,
quiet = TRUE
)
The genome sequence to work on. Either a BSgenome
object, a character
scalar with the name of an installed BSgenome
or with a file path and name pointing to a fasta file with the genome sequence.
character
scalar with the path to the bowtie index
and prefix to align against, in the form </path/to/index>/<prefix>
,
or the name of an installed Rbowtie
index package created by the
QuasR package for an installed BSgenome
package.
numeric
scalar specifying the k-mer length (width of
overlapping windows in genome
), usually set to the typical read
length for which to get the mappable regions.
numeric
scalar specifying the maximum number of hits
(matches) of a k-mer in the genome
to be considered mappable.
numeric
scalar specifying the number of CPU threads to use
for alignments.
logical
scalar indicating if progress information should
be printed on the console.
A GRanges
object with mappable regions.
All plus-strand sequences in genome
of length kmerLength
with their start (leftmost) position overlapping the GRanges
object
do not generate more than maxHits
hits when aligned to the genome.
Sequences of all overlapping windows are extracted from the genome
and aligned to the provided genome index using bowtie
with parameters -f -v 0 -a -B 1 -m maxHits
. If no more
than maxHits
hits are found, the window is defined mappable.
bowtie
in package Rbowtie used by
getMappableRegions
to align reads to the genome;
bowtie_build
in package Rbowtie for
indexing a genome.
if (requireNamespace("Rbowtie", quietly = TRUE)) {
library(Rbowtie)
genomefile <- system.file("extdata", "getMappableRegions", "hg19sub.fa", package = "swissknife")
indexdir <- tempfile()
indexpre <- "index"
indexname <- file.path(indexdir, indexpre)
idx <- bowtie_build(genomefile, indexdir)
mapgr <- getMappableRegions(genomefile, indexname, 50, quiet = FALSE)
print(mapgr)
}
#> Will calculate mappability for 3 chromosomes (95000 base pairs)
#> chr1...
#> done (2.1% non-mappable)
#> chr2...
#> done (0.2% non-mappable)
#> chr3...
#> done (0.1% non-mappable)
#> Total mappability: 99.2% in 27 segments
#> GRanges object with 27 ranges and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> [1] chr1 1-2158 +
#> [2] chr1 2619-2649 +
#> [3] chr1 2657-2666 +
#> [4] chr1 2693-2704 +
#> [5] chr1 2788-2818 +
#> ... ... ... ...
#> [23] chr3 23773-31758 +
#> [24] chr3 31761-33858 +
#> [25] chr3 33867-35084 +
#> [26] chr3 35099-44383 +
#> [27] chr3 44387-45000 +
#> -------
#> seqinfo: 3 sequences from an unspecified genome