Perform subsampling with the scSampler Python package.
scsampler(mat, N, random_split = 1, seed = 0)
mat: m x n matrix. Samples (the dimension along which to subsample) should be in the rows, features in the columns.
N: Numeric scalar, the number of samples to retain.
random_split: Numeric scalar, the number of parts to randomly split the data into before subsampling within each part. A larger value speeds up computation but gives less optimal results.
seed: Numeric scalar, passed to scsampler to seed the random number generator.
A numeric vector with the indices of the samples (rows of mat) to retain.
The first time this function is run, it will create a conda environment containing the scSampler package. This is done via the basilisk R/Bioconductor package; see the documentation for that package for troubleshooting.
Song et al. (2022): scSampler: fast diversity-preserving subsampling of large-scale single-cell transcriptomic data. bioRxiv, doi:10.1101/2022.01.15.476407
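
The random_split argument describes a split-then-subsample flow: the samples are first divided at random into parts, and subsampling then happens within each part. The base R sketch below illustrates only that flow. The helper split_then_subsample is hypothetical and is not part of scSampler. It uses plain uniform sampling as a stand-in for the diversity-preserving step, and it assumes the N retained samples are spread evenly over the parts, which this page does not state.

```r
# Illustrative stand-in only: this is NOT the scSampler algorithm.
# It mimics the split-then-subsample flow described for random_split,
# using uniform random sampling inside each part.
split_then_subsample <- function(mat, N, random_split = 1, seed = 0) {
    set.seed(seed)
    n <- nrow(mat)
    # Randomly assign each row to one of random_split parts of near-equal size.
    part <- sample(rep_len(seq_len(random_split), n))
    # Assumption: the N retained samples are spread evenly over the parts.
    quota <- rep(N %/% random_split, random_split)
    extra <- N %% random_split
    quota[seq_len(extra)] <- quota[seq_len(extra)] + 1
    keep <- unlist(lapply(seq_len(random_split), function(p) {
        rows <- which(part == p)
        # Placeholder for the per-part subsampling step.
        rows[sample.int(length(rows), quota[p])]
    }))
    sort(keep)
}

# Samples in rows, features in columns, as required for mat.
mat <- matrix(rnorm(1000 * 5), nrow = 1000, ncol = 5)
idx <- split_then_subsample(mat, N = 100, random_split = 4, seed = 42)
sub <- mat[idx, , drop = FALSE]  # rows retained by the sketch
```

With the real function, the equivalent call would follow the signature given above, for example scsampler(mat, N = 100, random_split = 4, seed = 42), and the returned indices could be used to subset mat in the same way. Because each part is subsampled without seeing the others, a larger random_split trades some quality of the retained set for speed.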