Given a k-mer length and the maximum number of allowed hits per k-mer, find all mappable regions in a genome.

getMappableRegions(
  genome,
  genomeIndex,
  kmerLength = 50,
  maxHits = 1,
  Ncpu = 2,
  quiet = TRUE
)

Arguments

genome

The genome sequence to work on. Either a BSgenome object, a character scalar with the name of an installed BSgenome or with a file path and name pointing to a fasta file with the genome sequence.

genomeIndex

character scalar with the path to the bowtie index and prefix to align against, in the form </path/to/index>/<prefix>, or the name of an installed Rbowtie index package created by the QuasR package for an installed BSgenome package.

kmerLength

numeric scalar specifying the k-mer length (width of overlapping windows in genome), usually set to the typical read length for which to get the mappable regions.

maxHits

numeric scalar specifying the maximum number of hits (matches) of a k-mer in the genome to be considered mappable.

Ncpu

numeric scalar specifying the number of CPU threads to use for alignments.

quiet

logical scalar indicating if progress information should be printed on the console.

Value

A GRanges object with mappable regions. All plus-strand sequences in genome of length kmerLength with their start (leftmost) position overlapping the GRanges object do not generate more than maxHits hits when aligned to the genome.

Details

Sequences of all overlapping windows are extracted from the genome and aligned to the provided genome index using bowtie with parameters -f -v 0 -a -B 1 -m maxHits. If no more than maxHits hits are found, the window is defined mappable.

See also

bowtie in package Rbowtie used by getMappableRegions to align reads to the genome; bowtie_build in package Rbowtie for indexing a genome.

Author

Michael Stadler

Examples

if (requireNamespace("Rbowtie", quietly = TRUE)) {
    library(Rbowtie)

    genomefile <- system.file("extdata", "getMappableRegions", "hg19sub.fa", package = "swissknife")
    indexdir <- tempfile()
    indexpre <- "index"
    indexname <- file.path(indexdir, indexpre)
    idx <- bowtie_build(genomefile, indexdir)

    mapgr <- getMappableRegions(genomefile, indexname, 50, quiet = FALSE)
    print(mapgr)
}
#> Will calculate mappability for 3 chromosomes (95000 base pairs)
#>     chr1...
#> done (2.1% non-mappable)
#>     chr2...
#> done (0.2% non-mappable)
#>     chr3...
#> done (0.1% non-mappable)
#> Total mappability: 99.2% in 27 segments
#> GRanges object with 27 ranges and 0 metadata columns:
#>        seqnames      ranges strand
#>           <Rle>   <IRanges>  <Rle>
#>    [1]     chr1      1-2158      +
#>    [2]     chr1   2619-2649      +
#>    [3]     chr1   2657-2666      +
#>    [4]     chr1   2693-2704      +
#>    [5]     chr1   2788-2818      +
#>    ...      ...         ...    ...
#>   [23]     chr3 23773-31758      +
#>   [24]     chr3 31761-33858      +
#>   [25]     chr3 33867-35084      +
#>   [26]     chr3 35099-44383      +
#>   [27]     chr3 44387-45000      +
#>   -------
#>   seqinfo: 3 sequences from an unspecified genome