From a transcript database package (TxDb), extract exonic and gene body ranges for use with EISA. These regions can be used to quantify RNA-seq alignments in exons and gene bodies, respectively. Intronic counts can then be obtained from the difference between gene bodies and exonic region counts.

getRegionsFromTxDb(txdb, exonExt = 10L, strandedData = TRUE)

Arguments

txdb

a TxDb or an EnsDb object with the transcript annotations.

exonExt

numeric (default = 10L). Exonic ranges will be extended on either side by this many nucleotides, in order to avoid "bleed-over" of exonic alignments into adjacent intronic regions.

strandedData

logical(1). If TRUE, the RNA-seq data is assumed to be strand-specific, and therefore only overlapping genes that are on the same strand will be filtered out. If FALSE, also genes overlapping on opposite strands will be filtered out.

Value

a list with elements "exons" and "genebodies", containing named GenomicRanges objects with ranges for exons and gene bodies, respectively.

Details

The exonic regions are generated as follows:

  1. extract exons by gene from the txdb

  2. extend each exon by exonExt

  3. combine overlapping exons within each gene

  4. create gene body ranges from the most extreme exonic coordinates

  5. filter out genes that have only a single exon (no intron), have exons on more than a single chromosome or on both strands, or that overlap other genes

See also

TxDb for details on TxDb objects and the txdbmaker package for how to create them, e.g. from .gtf files.

Author

Michael Stadler

Examples

if (requireNamespace("AnnotationDbi", quietly = TRUE)) {
    txdb <- AnnotationDbi::loadDb(system.file("extdata", "hg19sub.sqlite", package = "eisaR"))
    regL <- getRegionsFromTxDb(txdb)
    lengths(regL)
}
#> Loading required package: GenomicFeatures
#> Loading required package: AnnotationDbi
#> Loading required package: Biobase
#> Welcome to Bioconductor
#> 
#>     Vignettes contain introductory material; view with
#>     'browseVignettes()'. To cite Bioconductor, see
#>     'citation("Biobase")', and for packages 'citation("pkgname")'.
#> extracting exon coordinates
#> total number of genes/exons: 12/32
#> removing overlapping/single-exon/ambiguous genes (8)
#> creating filtered regions for 4 genes (33.3%) with 20 exons (62.5%)
#>      exons genebodies 
#>         20          4