Skip to contents

For each bam file in infiles, parse alignments, calculate read statistics and write the alignment to the corresponding output file from outfiles if the read passes all criteria defined by the filtering arguments. Filters are processed hierarchically: If a read does not pass a given filter, the remaining filters will not be examined and the processing continues with the next read. The filters are examined in this order: keepUnmapped, keepSecondary, keepSupplementary, minReadLength, minAlignedLength, minAlignedFraction, minQscore, maxFracLowConf, maxEntropy.

Usage

filterReadsBam(
  infiles,
  outfiles,
  modbase,
  indexOutfiles = TRUE,
  overwriteOutfiles = FALSE,
  keepUnmapped = TRUE,
  keepSecondary = TRUE,
  keepSupplementary = TRUE,
  minReadLength = 0,
  minAlignedLength = 0,
  minAlignedFraction = 0,
  minQscore = 0,
  maxFracLowConf = 1,
  maxEntropy = Inf,
  LowConf = 0.7,
  BPPARAM = MulticoreParam(4L, RNGseed = 42L),
  verbose = FALSE
)

Arguments

infiles

Character vector with name(s) of the input bam file(s).

outfiles

Character vector with name(s) of the output bam file(s). Needs to have the same length as infiles.

modbase

Character scalar defining the modified base to analyze (used by maxEntropy and maxFracLowConf).

indexOutfiles

Logical scalar. If TRUE (the default) create a bam index file (.bai file) for each of the generated outfiles.

overwriteOutfiles

Logical scalar. If FALSE (the default), existing outfiles will not be overwritten and the function will abort with an error message.

keepUnmapped, keepSecondary, keepSupplementary

Logical scalars indicating whether to keep unmapped, secondary or supplementary alignments.

minReadLength

A numeric scalar representing the smallest acceptable read length. Reads that are shorter than this value will be filtered out.

minAlignedLength

A numeric scalar representing the smallest acceptable aligned length. Reads with aligned length shorter than this value will be filtered out.

minAlignedFraction

A numeric scalar representing the smallest acceptable aligned fraction of a read. Reads where the aligned fraction is smaller than this value will be filtered out.

minQscore

A numeric scalar representing the smallest acceptable read-level Qscore. Reads with Qscore below this value will be filtered out.

maxFracLowConf

A numeric scalar representing the maximally acceptable fraction of low-confidence modified base calls in a read. Reads with a fraction of low confidence calls greater than this value will be filtered out.

maxEntropy

A numeric scalar representing the largest acceptable read-level entropy. Reads with entropy above this value will be filtered out. A value of Inf deactivates the entropy filter.

LowConf

A numeric scalar with the minimum call confidence below which calls are considered "low confidence".

BPPARAM

A BiocParallelParam object that controls the number of parallel CPU threads to use for some of the steps in filterReadsBam(). The default value is (MulticoreParam(4L, RNGseed = 42L)).

verbose

Logical scalar. If TRUE, report on progress.

Value

filterReadsBam is called for its side effect of generating new bam files containing the subset of bam records from input bam files that pass all filtering criteria. In addition, it returns a data.frame with one row per infiles giving the numbers of bam records that were read in total, that were retained in the outfiles and that were filtered-out by reason of exclusion.

Author

Michael Stadler

Examples

modbamfiles <- system.file("extdata", c("6mA_1_10reads.bam", "6mA_2_10reads.bam"),
                           package = "footprintR")
filtbamfiles <- tempfile(fileext = rep(".bam", length(modbamfiles)))
res <- filterReadsBam(infiles = modbamfiles, outfiles = filtbamfiles,
                      modbase = "a", indexOutfiles = FALSE, minReadLength = 6746,
                      minAlignedLength = 6896, minAlignedFraction = 0.56,
                      minQscore = 9.7, maxFracLowConf = 0.11, maxEntropy = 0.29,
                      BPPARAM = BiocParallel::SerialParam(), verbose = TRUE)
#>  opening input file /Users/runner/work/_temp/Library/footprintR/extdata/6mA_1_10reads.bam using 1 thread
#> ⠙ 0.000 Mio. genomic positions processed (0.001 Mio./s) [2ms]

#>  done filtering: retained 7 of 10 records (70%)
#> ⠙ 0.000 Mio. genomic positions processed (0.001 Mio./s) [2ms]

#>  opening input file /Users/runner/work/_temp/Library/footprintR/extdata/6mA_2_10reads.bam using 1 thread
#> ⠙ 0.000 Mio. genomic positions processed (0.001 Mio./s) [2ms]

#> ⠙ 0.0 thousand records processed (16.6 /s) [61ms]
#>  done filtering: retained 7 of 10 records (70%)
#> ⠙ 0.0 thousand records processed (16.6 /s) [61ms]

res
#>   sample                                                                infile
#> 1     s1 /Users/runner/work/_temp/Library/footprintR/extdata/6mA_1_10reads.bam
#> 2     s2 /Users/runner/work/_temp/Library/footprintR/extdata/6mA_2_10reads.bam
#>                                                                             outfile
#> 1 /var/folders/2s/h6hvv9ps03xgz_krkkstvq_r0000gn/T//RtmpEgw7vA/file388c4244e3f3.bam
#> 2 /var/folders/2s/h6hvv9ps03xgz_krkkstvq_r0000gn/T//RtmpEgw7vA/file388c38d18882.bam
#>   total retained filtered_unmapped filtered_secondary filtered_supplementary
#> 1    10        7                 0                  0                      0
#> 2    10        7                 0                  0                      0
#>   filtered_minReadLength filtered_minAlignedLength filtered_minAlignedFraction
#> 1                      0                         1                           1
#> 2                      1                         0                           0
#>   filtered_minQscore filtered_maxFracLowConf filtered_maxEntropy
#> 1                  0                       0                   1
#> 2                  1                       1                   0
unlink(filtbamfiles)