Create read alignments against a reference genome and optional auxiliary targets, if they do not yet exist. If necessary, also build target indices for the aligner.
qAlign(
  sampleFile,
  genome,
  auxiliaryFile = NULL,
  aligner = "Rbowtie",
  maxHits = 1,
  paired = NULL,
  splicedAlignment = FALSE,
  snpFile = NULL,
  bisulfite = "no",
  alignmentParameter = NULL,
  projectName = "qProject",
  alignmentsDir = NULL,
  lib.loc = NULL,
  cacheDir = NULL,
  clObj = NULL,
  checkOnly = FALSE,
  geneAnnotation = NULL
)
The name of a text file listing input sequence files and sample names (see ‘Details’).
The reference genome for primary alignments, one of:
a string referring to a “BSgenome” package (e.g. "BSgenome.Hsapiens.UCSC.hg19"), which will be downloaded automatically from Bioconductor if not present
the name of a fasta sequence file containing one or several sequences (chromosomes) to be used as a reference. The aligner index will be created when necessary and stored in a default location (see ‘Details’).
The name of a text file listing sequences to be used as additional targets for alignment of reads not mapping to the reference genome (see ‘Details’).
selects the aligner program to be used for aligning the
reads. Currently, only “Rbowtie” and “Rhisat2” are supported,
which are R wrapper packages for ‘bowtie’ / ‘SpliceMap’ and
‘hisat2’, respectively (see Rbowtie-package and Rhisat2-package).
sets the maximal number of allowed mapping positions
per read (default: 1). If a read produces more than maxHits
alignments, no alignments will be reported for it. In case of a
multi-mapping read, a single alignment is randomly selected.
defines the type of paired-end library and can be set to
one of “no” (single read experiment, default), “fr” (fw/rev),
“ff” (fw/fw) or “rf” (rev/fw).
If TRUE, reads will be aligned using a spliced aligner, depending on
the value of aligner described above:
aligner="Rhisat2": This is the recommended setting for spliced
alignments and will use hisat2 from the Rhisat2-package. See also the
geneAnnotation argument below for providing known exon-exon junctions.
aligner="Rbowtie": This is not recommended and only available for
legacy reasons. It will use SpliceMap to produce spliced alignments
(without using a database of known exon-exon junctions). Compared to
the alternative alignment modes (non-spliced, or spliced using Rhisat2
as aligner), this alignment mode is about ten-fold slower and also
less sensitive. Furthermore, SpliceMap can only be used for reads with
a minimal length of 50 nt; SpliceMap ignores reads that are shorter,
and these reads will not be contained in the BAM file, neither as
mapped nor as unmapped reads.
The name of a text file listing single nucleotide polymorphisms to be used for allele-specific alignment and quantification (see ‘Details’).
For bisulfite-converted samples (Bis-seq), the type of bisulfite library (“dir” for directional libraries, “undir” for undirectional libraries).
An optional string containing command line parameters to be used for
the aligner, to overrule the default alignment parameters used by
QuasR. Please use with caution; some alignment parameters may break
assumptions made by QuasR. Default parameters are listed in ‘Details’.
An optional name for the alignment project.
The directory to be used for storing alignments (bam files). If set to
NULL (default), bam files will be generated at the location of the
input sequence files.
can be used to change the default library path of R. The library path
is used by QuasR to store aligner index packages created from BSgenome
reference genomes.
specifies the location to store (potentially huge) temporary files. If
set to NULL (default), the temporary directory of the current R
session as returned by tempdir() will be used.
A cluster object, created by the package parallel, to enable parallel processing and speed up the alignment process.
If TRUE, prevents the automatic creation of alignments or aligner
indices. This makes it possible to quickly check for missing alignment
files without starting the potentially long process of their creation.
In the case of missing alignments or indices, an exception is thrown.
Only used if aligner is "Rhisat2".
The path to either a gtf file or an SQLite database generated by
exporting a TxDb object. This file is used to generate a splice site
file for Rhisat2 that will be used to guide the spliced alignment.
Please note that when using an SQLite database file, you should not
use the one contained in the installed package folder of a TxDb
package. QuasR (through Rhisat2) creates additional files in that
folder, which would interfere with the loading of the TxDb package.
A qProject object.
Before generating new alignments, qAlign looks for previously
generated alignments as well as for an aligner index. If no aligner
index exists, it will be automatically created and stored in the same
directory as the provided fasta file, or as an R package in the case
of a BSgenome reference. The name of this R package will be the same
as the BSgenome package name, with an additional suffix from the
aligner (e.g. BSgenome.Hsapiens.UCSC.hg19.Rbowtie). The generated bam
files contain both aligned and unaligned reads. For paired-end
samples, by default no alignments will be reported for read pairs
where only one of the reads could be aligned.
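The lookup behaviour described above can be probed without triggering a long alignment run. The following is a minimal sketch, assuming QuasR is installed; `alignments_complete` and `index_package_name` are hypothetical helper names introduced here for illustration, not part of QuasR. Note that the `tryCatch` also catches unrelated errors (for example a missing sample file), so the message should be inspected.

```r
# Hypothetical helper: TRUE if qAlign(checkOnly = TRUE) finds every alignment
# and index, FALSE if it signals an error (the "exception" mentioned above).
alignments_complete <- function(sampleFile, genome, ...) {
  if (!requireNamespace("QuasR", quietly = TRUE)) {
    stop("QuasR is not installed")
  }
  tryCatch({
    QuasR::qAlign(sampleFile, genome, checkOnly = TRUE, ...)
    TRUE
  }, error = function(e) {
    message("qAlign reported: ", conditionMessage(e))
    FALSE
  })
}

# Hypothetical helper: the index package name described above for a BSgenome
# reference, i.e. the BSgenome package name plus the aligner as suffix.
index_package_name <- function(genome, aligner = "Rbowtie") {
  paste(genome, aligner, sep = ".")
}

index_package_name("BSgenome.Hsapiens.UCSC.hg19")
```

Neither helper creates any files; the first one only calls qAlign in its check-only mode.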
sampleFile is a tab-delimited text file listing all the input
sequences to be included in a given analysis. The file has either two
(single-end) or three columns (paired-end). The first row contains the
column names, and additional rows contain the relative or absolute
path and name of the input sequence file(s), as well as the
corresponding sample name. Three input file formats are supported
(fastq, fasta and bam). All input files in one sampleFile need to be
in the same format, and are recognized by their extension (.fq,
.fastq, .fa, .fasta, .fna, .bam), in raw or compressed form (e.g.
.fastq.gz). If bam files are provided, then no alignments are
generated by qAlign, and the alignments contained in the bam files
will be used instead.
The column names in sampleFile have to match those in the examples
below, for a single-read experiment:
FileName | SampleName |
chip_1_1.fq.bz2 | Sample1 |
chip_2_1.fq.bz2 | Sample2 |
and for a paired-end experiment:
FileName1 | FileName2 | SampleName |
rna_1_1.fq.bz2 | rna_1_2.fq.bz2 | Sample1 |
rna_2_1.fq.bz2 | rna_2_2.fq.bz2 | Sample2 |
The “SampleName” column is the human-readable name for each
sample that will be used as sample labels. Multiple sequence files may
be associated with the same sample name, which instructs QuasR to
combine those files.
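A sample file with the layout described above can be written from R with base functions. This is a minimal sketch; the file names are the placeholders from the paired-end example and do not refer to real data, and the output location in tempdir() is an arbitrary choice.

```r
# Paired-end sample sheet with the required column names.
samples <- data.frame(
  FileName1  = c("rna_1_1.fq.bz2", "rna_2_1.fq.bz2"),
  FileName2  = c("rna_1_2.fq.bz2", "rna_2_2.fq.bz2"),
  SampleName = c("Sample1", "Sample2"),
  stringsAsFactors = FALSE
)

# Tab-delimited, header row included, no quoting and no row names.
sampleFile <- file.path(tempdir(), "samples_paired.txt")
write.table(samples, file = sampleFile, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = TRUE)

# Read the file back to confirm the header and the number of samples.
check <- read.delim(sampleFile, stringsAsFactors = FALSE)
```

For a single-read experiment the same approach applies with the two columns FileName and SampleName.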
auxiliaryFile is a tab-delimited text file listing one or several
additional target sequence files in fasta format. Reads that do not
map against the reference genome will be aligned against each of these
target sequence files. The first row contains the column names, which
have to match those in the example below:
FileName | AuxName |
NC_001422.1.fa | phiX174 |
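The auxiliary file can be created in the same way as the sample file. A minimal sketch, reusing the single entry from the example above; the fasta file name is a placeholder and the output path is an arbitrary choice.

```r
# One auxiliary target with the required column names.
aux <- data.frame(
  FileName = "NC_001422.1.fa",
  AuxName  = "phiX174",
  stringsAsFactors = FALSE
)

# Tab-delimited with a header row, as for the sample file.
auxiliaryFile <- file.path(tempdir(), "auxiliaries.txt")
write.table(aux, file = auxiliaryFile, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = TRUE)

# First line of the written file: the two column names separated by a tab.
auxHeader <- readLines(auxiliaryFile, n = 1)
```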
snpFile is a tab-delimited text file without a header and contains
four columns with chromosome name, position, reference allele and
alternative allele, as in the example below:
chr1 | 8596 | G | A |
chr1 | 18443 | G | A |
chr1 | 18981 | C | T |
chr1 | 19341 | G | A |
The reference and alternative alleles will be injected into the
reference genome, resulting in two separate genomes. All reads will be
aligned separately to both of these genomes, and the alignments will
be combined, only retaining the best alignment for each read. In the
final alignment, each read will be marked with a tag that classifies
it as reference (R), alternative (A) or unknown (U); the latter is
assigned if the read maps equally well to both genomes.
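Because the SNP file has no header, it needs slightly different write and read settings than the sample file. The sketch below writes the first three SNPs from the example; the explicit colClasses when reading back is a precaution chosen here, since a column consisting only of "T" alleles would otherwise be parsed by R as logical TRUE.

```r
# Four columns: chromosome, position, reference allele, alternative allele.
snps <- data.frame(
  chrom = c("chr1", "chr1", "chr1"),
  pos   = c(8596L, 18443L, 18981L),
  ref   = c("G", "G", "C"),
  alt   = c("A", "A", "T"),
  stringsAsFactors = FALSE
)

# Tab-delimited and, unlike the sample file, without a header row.
snpFile <- file.path(tempdir(), "snps.txt")
write.table(snps, file = snpFile, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)

# Read back with explicit column types so alleles stay character.
snpCheck <- read.delim(snpFile, header = FALSE,
                       colClasses = c("character", "integer",
                                      "character", "character"))
```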
If bisulfite is set to “dir” or “undir”, reads will be C-to-T
converted and aligned to a similarly converted genome.
If alignmentParameter is NULL (recommended), qAlign will select
default parameters that are suitable for the experiment type. Please
note that for bisulfite or allele-specific experiments, each read is
aligned multiple times, and the resulting alignments need to be
combined. This requires special settings for the alignment parameters,
and changing them is not recommended. For ‘simple’ experiments
(neither bisulfite, allele-specific, nor spliced), alignments are
generated using the parameters -m maxHits --best --strata. This will
align reads with up to “maxHits” best hits in the genome and select
one of them randomly.
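To make the default for ‘simple’ experiments concrete, the sketch below spells out the parameter string described above for a given maxHits. `simple_params` and `align_simple` are hypothetical names used for illustration; this is not QuasR's internal code, and the wrapper assumes QuasR is installed.

```r
# Illustration only: the parameter string described above for 'simple'
# experiments, with maxHits substituted.
simple_params <- function(maxHits = 1L) {
  stopifnot(is.numeric(maxHits), length(maxHits) == 1L, maxHits >= 1)
  paste("-m", maxHits, "--best --strata")
}

simple_params(1)
simple_params(5)

# Hypothetical wrapper: leaving alignmentParameter = NULL (the recommended
# setting) lets qAlign choose its defaults for the experiment type.
align_simple <- function(sampleFile, genome, maxHits = 1) {
  QuasR::qAlign(sampleFile, genome, maxHits = maxHits,
                alignmentParameter = NULL)
}
```

In normal use only maxHits needs to be set; the string is shown purely to clarify what the default corresponds to.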
qProject, makeCluster from package parallel, Rbowtie-package,
Rhisat2-package