Perform subsampling using the scSampler Python package.

scsampler(mat, N, random_split = 1, seed = 0)

Arguments

mat

An m x n matrix, with samples (the dimension along which to subsample) in the rows and features in the columns.
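Single-cell expression matrices are usually stored with genes in rows and cells in columns, which is the opposite of what `mat` expects. The sketch below uses simulated data to show how such a matrix could be transposed. It also optionally reduces the cells to a few principal components with base R's `prcomp()` before subsampling. The data are hypothetical, and the PCA step is an assumed preprocessing choice, not something `scsampler()` requires.

```r
# Hypothetical example data: 200 genes (rows) x 500 cells (columns),
# the layout commonly used for single-cell expression matrices.
set.seed(1)
counts <- matrix(rpois(200 * 500, lambda = 2), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("cell", 1:500)))
logcounts <- log1p(counts)

# scsampler() expects samples (here cells) in rows, so transpose.
mat <- t(logcounts)

# Optional: summarize each cell by its first 10 principal components
# (an assumed preprocessing choice, not a requirement of scsampler()).
pcs <- prcomp(mat, rank. = 10)$x

dim(pcs)  # 500 cells x 10 PCs
# idx <- scsampler(mat = pcs, N = 50)
```

Either `mat` or `pcs` has cells in rows and could be passed as `mat`.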

N

Numeric scalar, the number of samples to retain.

random_split

Numeric scalar, the number of parts into which the data are randomly split before subsampling is performed separately within each part. Larger values speed up computation but may yield a less diverse (less optimal) subsample.
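The sketch below illustrates only the random partitioning step described above: the row indices of a matrix are split at random into `random_split` groups of (nearly) equal size. It is an illustration in base R, not scSampler's internal code. How scSampler distributes the `N` retained samples across the parts is not shown here.

```r
# Illustration only: randomly partition the rows of a 100-row matrix
# into `random_split` groups. This is NOT scSampler's implementation.
set.seed(0)
n_rows <- 100
random_split <- 4

# Assign each row to one of the parts, balancing group sizes,
# then shuffle the assignment.
part_id <- sample(rep_len(seq_len(random_split), n_rows))
parts <- split(seq_len(n_rows), part_id)

lengths(parts)  # 25 rows in each of the 4 parts
```

Because each part is processed on its own, the subsampling step only compares samples within a part. This is why larger values are faster but consider less of the data at once.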

seed

Numeric scalar, passed to scSampler to seed its random number generator.

Value

A numeric vector with the indices of the samples (rows of mat) to retain.
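The returned indices can be used to subset the input. In the sketch below, randomly drawn row indices stand in for the output of `scsampler()` so the example runs without the Python environment. It assumes that the returned values are 1-based row positions, as is usual for indices in R.

```r
set.seed(1)
x <- matrix(rnorm(500), nrow = 100)  # 100 samples x 5 features

# Hypothetical stand-in for idx <- scsampler(mat = x, N = 10);
# random indices are used here purely for illustration.
idx <- sort(sample(nrow(x), 10))

# Keep the selected samples (rows); drop = FALSE keeps a matrix.
x_sub <- x[idx, , drop = FALSE]
dim(x_sub)  # 10 x 5

# For an object with samples in columns (e.g. genes x cells),
# the same indices would select columns instead: obj[, idx]
```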

Details

The first time this function is run, it creates a conda environment containing the scSampler package. This is done via the basilisk R/Bioconductor package; see the basilisk documentation for troubleshooting.

References

Song et al. (2022). scSampler: fast diversity-preserving subsampling of large-scale single-cell transcriptomic data. bioRxiv. doi:10.1101/2022.01.15.476407

Authors

Charlotte Soneson, Michael Stadler

Examples

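# The example is skipped on macOS running on Apple Silicon (arm64)
if (!(Sys.info()["sysname"] == "Darwin" && Sys.info()["machine"] == "arm64")) {
    # Simulate 100 samples (rows) x 5 features (columns)
    x <- matrix(rnorm(500), nrow = 100)
    # Return the indices of 10 samples to retain
    scsampler(mat = x, N = 10)
}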