This function reads a GTF file and prepares it for plotting with Gviz via the plotGeneRegion function. It renames the transcript ID and gene ID columns to "transcript" and "gene". It removes version tags from the transcript and gene IDs (for example, ENSG00000139618.15 becomes ENSG00000139618). It retains only the "exon" entries.
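The three processing steps can be illustrated on a small data frame that stands in for the attributes of an imported GTF file. This is a minimal base R sketch, not the package's implementation. The helper `prepare_gtf_df` and the made-up IDs are hypothetical, and it assumes a version tag is a trailing "." followed by digits.

```r
# Made-up stand-in for the attribute table of an imported GTF file
gtf_df <- data.frame(
  type          = c("gene", "transcript", "exon", "exon"),
  gene_id       = rep("ENSMUSG00000000001.4", 4),
  transcript_id = c(NA, rep("ENSMUST00000000001.4", 3)),
  stringsAsFactors = FALSE
)

# Hypothetical helper mirroring the documented steps (not the swissknife source)
prepare_gtf_df <- function(df, transcriptIdColumn = "transcript_id",
                           geneIdColumn = "gene_id") {
  # Keep only the "exon" entries
  df <- df[df$type == "exon", , drop = FALSE]
  # Rename the ID columns to "transcript" and "gene"
  names(df)[names(df) == transcriptIdColumn] <- "transcript"
  names(df)[names(df) == geneIdColumn] <- "gene"
  # Strip a trailing ".<digits>" version tag (assumed version format)
  strip_version <- function(x) sub("\\.[0-9]+$", "", x)
  df$transcript <- strip_version(df$transcript)
  df$gene <- strip_version(df$gene)
  df
}

res <- prepare_gtf_df(gtf_df)
print(res)
```

The sketch works on a plain data frame to stay self-contained. The actual function takes a file path, so it also has to import the GTF file before applying these steps.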

prepareGTF(
  gtf,
  transcriptIdColumn = "transcript_id",
  geneIdColumn = "gene_id",
  geneSymbolColumn = "gene_name"
)

Arguments

gtf

Character scalar, path to the GTF file (tested with Ensembl and GENCODE files).

transcriptIdColumn

Character scalar, the name of the column in the GTF file that contains the transcript ID.

geneIdColumn

Character scalar, the name of the column in the GTF file that contains the gene ID.

geneSymbolColumn

Character scalar, the name of the column in the GTF file that contains the gene symbol, if available. Set it to "" if the file has no such column. The gene IDs are then used in place of gene symbols.
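The fallback for a missing gene symbol column can be sketched in base R. The helper `add_symbol`, its output column `symbol`, and the placeholder gene names are hypothetical illustrations. They are not the names used by swissknife.

```r
# Hypothetical helper (not the swissknife source): take gene symbols from
# geneSymbolColumn, or fall back to the gene IDs when it is set to ""
add_symbol <- function(df, geneSymbolColumn = "gene_name",
                       geneIdColumn = "gene_id") {
  if (geneSymbolColumn == "") {
    df$symbol <- df[[geneIdColumn]]
  } else {
    df$symbol <- df[[geneSymbolColumn]]
  }
  df
}

# Made-up example data
ex <- data.frame(gene_id   = c("ENSMUSG00000000001", "ENSMUSG00000000003"),
                 gene_name = c("GeneA", "GeneB"),
                 stringsAsFactors = FALSE)

with_names    <- add_symbol(ex)                         # symbols from gene_name
without_names <- add_symbol(ex, geneSymbolColumn = "")  # symbols = gene IDs
```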

Author

Charlotte Soneson

Examples

library(swissknife)
# Prepare the example GTF file bundled with the package
gtf <- prepareGTF(gtf = system.file("extdata/plotGeneRegion/mm10_ensembl98.gtf",
                                    package = "swissknife"))