The qProject class is a container for the meta-data (e.g. sample names, paths and names of sequence and alignment files) associated with a high-throughput sequencing experiment analyzed with QuasR.

Details

A qProject object is returned by qAlign and stores all information on a high-throughput sequencing experiment analyzed with QuasR. qProject objects can be conveniently passed to ‘q’-functions (functions whose names start with the letter ‘q’). The information is stored in the following slots:

reads

a 'data.frame' with sequence read files.

reads_md5subsum

a 'data.frame' with fingerprints for sequence read files.

alignments

a 'data.frame' with alignment files.

samplesFormat

a 'character(1)' specifying the format of input files.

genome

a 'character(1)' specifying the reference genome.

genomeFormat

a 'character(1)' specifying the format of the reference genome.

aux

a 'data.frame' with auxiliary reference sequence files.

auxAlignments

a 'data.frame' with alignment files for auxiliary reference sequence files.

aligner

a 'character(1)' specifying the aligner.

maxHits

a 'numeric(1)' specifying the maximum number of alignments per sequence.

paired

a 'character(1)' specifying the paired-type; one of "no", "fr", "rf", "ff".

splicedAlignment

a 'logical(1)'; TRUE when performing spliced-alignments.

snpFile

a 'character(1)' with a file name containing SNP information.

bisulfite

a 'character(1)' defining the bisulfite type; one of "no", "dir", "undir".

alignmentParameter

a 'character(1)' with aligner command line parameters.

projectName

a 'character(1)' with the project name.

alignmentsDir

a 'character(1)' with the directory to be used to store alignment files.

lib.loc

a 'character(1)' with the library directory to use for installing alignment index packages.

cacheDir

a 'character(1)' with a directory to use for temporary files.

alnModeID

a 'character(1)' used internally to indicate the alignment mode.
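
The slots listed above can be inspected from R with the introspection functions of the methods package. The following is a minimal sketch that assumes only that QuasR is installed; the vector 'documented' is a hypothetical name introduced here for illustration. In everyday use, the accessors are preferable to reading slots directly.

```r
library(methods)
library(QuasR)

# named character vector: slot name -> declared class of the slot
slots <- getSlots("qProject")
print(slots)

# a few of the slots documented above (illustrative selection)
documented <- c("reads", "alignments", "genome", "aligner", "maxHits", "paired")

# TRUE for every documented slot that the class actually defines
print(documented %in% names(slots))
```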

Accessors

In the following code snippets, x is a qProject object.

length(x)

: Gets the number of input files.

genome(x)

: Gets the reference genome as a 'character(1)'. The type of genome is stored as an attribute in attr(genome(x),"genomeFormat"): "BSgenome" indicates that genome(x) refers to the name of a BSgenome package, "file" indicates that it contains the path and filename of a genome in FASTA format.

auxiliaries(x)

: Gets a data.frame with auxiliary target sequences, with one row per auxiliary target, and columns "FileName" and "AuxName".

alignments(x)

: Gets a list with two elements "genome" and "aux". alignments(x)$genome contains a data.frame with length(x) rows and the columns "FileName" (containing the path to bam files with genomic alignments) and "SampleName". alignments(x)$aux contains a data.frame with one row per auxiliary target sequence (with auxiliary names as row names), and length(x) columns.
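
As a hedged illustration of the accessors above, the sketch below builds a project from the example data shipped with QuasR (the same files used in the Examples section) and then queries it. It assumes QuasR is installed and that the current working directory is writable; the variable names 'proj' and 'aln' are chosen here for illustration.

```r
library(QuasR)

# copy the example data into the working directory and create the project
file.copy(system.file(package = "QuasR", "extdata"), ".", recursive = TRUE)
proj <- qAlign("extdata/samples_chip_single.txt", "extdata/hg19sub.fa",
               auxiliaryFile = "extdata/auxiliaries.txt")

# number of input files
length(proj)

# reference genome and its type ("BSgenome" or "file")
genome(proj)
attr(genome(proj), "genomeFormat")

# auxiliary targets: one row per target, columns "FileName" and "AuxName"
auxiliaries(proj)

# alignments: a list with the elements "genome" and "aux"
aln <- alignments(proj)
names(aln)
aln$genome$FileName   # paths to the bam files with genomic alignments
dim(aln$aux)          # one row per auxiliary target, length(proj) columns
```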

Subsetting

In the following code snippets, x is a qProject object.

x[i]

: Gets a qProject object containing only the input files selected by i, where i can be an NA-free logical, numeric, or character vector.
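
The sketch below illustrates subsetting with a numeric and a logical index, again using the example data shipped with QuasR. It assumes QuasR is installed and the working directory is writable; 'first' and 'second' are hypothetical variable names. Character indexing is not shown because this page does not state which names the index is matched against.

```r
library(QuasR)

# build the two-sample example project (same files as in the Examples section)
file.copy(system.file(package = "QuasR", "extdata"), ".", recursive = TRUE)
proj <- qAlign("extdata/samples_chip_single.txt", "extdata/hg19sub.fa",
               auxiliaryFile = "extdata/auxiliaries.txt")

# numeric index: keep only the first input file
first <- proj[1]
length(first)

# logical index: keep only the second input file
second <- proj[c(FALSE, TRUE)]
length(second)

# the subset is itself a qProject and can be passed to 'q'-functions
alignments(first)$genome
```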

See also

qAlign

Author

Anita Lerch, Dimos Gaidatzis and Michael Stadler

Examples

# copy example data to current working directory
file.copy(system.file(package="QuasR", "extdata"), ".", recursive=TRUE)
#> [1] TRUE

# create alignments
sampleFile <- "extdata/samples_chip_single.txt"
genomeFile <- "extdata/hg19sub.fa"
auxFile <- "extdata/auxiliaries.txt"

proj <- qAlign(sampleFile, genomeFile, auxiliaryFile=auxFile)
#> alignment files missing - need to:
#>     create 2 auxiliary alignment(s)
#> Creating an Rbowtie index for /private/var/folders/2s/h6hvv9ps03xgz_krkkstvq_r0000gn/T/RtmpHtA1jk/file5ddc16c1b3df/reference/extdata/NC_001422.1.fa
#> Finished creating index
#> Testing the compute nodes...
#> OK
#> Loading QuasR on the compute nodes...
#> preparing to run on 1 nodes...
#> done
#> Available cores:
#> nodeNames
#> Mac-1740133007481.local 
#>                       1 
#> Performing auxiliary alignments for 2 samples. See progress in the log file:
#> /private/var/folders/2s/h6hvv9ps03xgz_krkkstvq_r0000gn/T/RtmpHtA1jk/file5ddc16c1b3df/reference/QuasR_log_5ddc8051f7f.txt
#> Auxiliary alignments have been created successfully
#> 
proj
#> Project: qProject
#>  Options   : maxHits         : 1
#>              paired          : no
#>              splicedAlignment: FALSE
#>              bisulfite       : no
#>              snpFile         : none
#>              geneAnnotation  : none
#>  Aligner   : Rbowtie v1.47.0 (parameters: -m 1 --best --strata)
#>  Genome    : /private/var/folders/2s/h6hvv9ps03xgz_krkkstvq.../hg19sub.fa (file)
#> 
#>  Reads     : 2 files, 2 samples (fastq format):
#>    1. chip_1_1.fq.bz2  Sample1 (phred33)
#>    2. chip_2_1.fq.bz2  Sample2 (phred33)
#> 
#>  Genome alignments: directory: same as reads
#>    1. chip_1_1_5ddc61d5c573.bam
#>    2. chip_2_1_5ddc19963333.bam
#> 
#>  Aux. alignments: 1 file, directory: same as reads
#>    a. /private/var/folders/2s/h6hvv9ps03xgz_krkkstvq_.../NC_001422.1.fa  phiX174
#>      1. chip_1_1_5ddc856cef6.bam 
#>      2. chip_2_1_5ddc7b2d7cc0.bam
#> 

# alignment statistics using a qProject
alignmentStats(proj)
#>                 seqlength mapped unmapped
#> Sample1:genome      95000   2339      258
#> Sample2:genome      95000   3609      505
#> Sample1:phiX174      5386    251        7
#> Sample2:phiX174      5386    493       12

# alignment statistics using bam files
alignmentStats(alignments(proj)$genome$FileName)
#>                           seqlength mapped unmapped
#> chip_1_1_5ddc61d5c573.bam     95000   2339      258
#> chip_2_1_5ddc19963333.bam     95000   3609      505
alignmentStats(unlist(alignments(proj)$aux))
#>                           seqlength mapped unmapped
#> chip_1_1_5ddc856cef6.bam       5386    251        0
#> chip_2_1_5ddc7b2d7cc0.bam      5386    493        0