Downsamples ("sketches") high-dimensional single-cell RNA expression data within specified groups or categories. Sketching per group improves scalability for large datasets while maintaining representation across cell types or conditions.
Usage
SketchDataByGroup(
  object,
  group.by = NULL,
  assay = NULL,
  ncells = 1000L,
  sketched.assay = "sketch",
  method = c("Uniform", "LeverageScore"),
  var.name = "leverage.score",
  cells = NULL,
  over.write = FALSE,
  seed = 123L,
  cast = "dgCMatrix",
  verbose = TRUE,
  features = NULL,
  min.cells.per.group = 10L,
  ...
)
Arguments
- object
A Seurat object.
- group.by
A metadata column name to group cells by before sketching. If NULL, falls back to standard sketching without grouping. Default is NULL.
- assay
Assay name. Default is NULL, in which case the default assay of the object is used.
- ncells
A positive integer, named vector, or named list specifying the number of cells to sample.
If a single integer: the same number of cells is sampled from each group and layer.
If a named vector keyed by group name: a specific number of cells per group.
If a named list keyed by group name, with each element a vector over layers: a specific number of cells per group per layer.
Default is 1000.
- sketched.assay
Sketched assay name. A sketch assay is created or overwritten with the sketch data. Default is 'sketch'.
- method
Sketching method to use. Can be 'LeverageScore' or 'Uniform'. Default is 'Uniform'.
- var.name
A metadata column name to store the leverage scores. Default is 'leverage.score'.
- cells
A vector of cell IDs to keep. If set, these user-defined cells are used directly to generate the sketch, and the group.by parameter is ignored.
- over.write
Whether to overwrite an existing column in the metadata. Default is FALSE.
- seed
A positive integer for the seed of the random number generator. Default is 123.
- cast
The type to cast the resulting assay to. Default is 'dgCMatrix'.
- verbose
Print progress and diagnostic messages. Default is TRUE.
- features
A character vector of feature names to include in the sketched assay.
- min.cells.per.group
Minimum number of cells required per group to perform sketching. Groups with fewer cells will be included entirely. Default is 10.
- ...
Arguments passed to other methods.
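The interaction between ncells and min.cells.per.group can be illustrated with a small base R sketch. The helper resolve_ncells below is hypothetical and not part of the package, and capping each target at the group size is an assumption about sensible behaviour rather than a documented guarantee. It covers only the single-integer and named-vector forms of ncells, not the per-layer list form.

```r
# Hypothetical helper (not part of the package): expand `ncells` into one
# target per group, then adjust each target to the group's actual size.
resolve_ncells <- function(ncells, group_sizes, min_cells_per_group = 10L) {
  groups <- names(group_sizes)
  if (length(ncells) == 1L && is.null(names(ncells))) {
    # Single integer: the same target for every group
    target <- setNames(rep(as.integer(ncells), length(groups)), groups)
  } else {
    # Named vector: one target per group; every group must have an entry
    missing_groups <- setdiff(groups, names(ncells))
    if (length(missing_groups) > 0L) {
      stop("No ncells entry for group(s): ",
           paste(missing_groups, collapse = ", "))
    }
    target <- setNames(as.integer(ncells[groups]), groups)
  }
  # Groups below the minimum are kept whole
  small <- group_sizes < min_cells_per_group
  target[small] <- group_sizes[small]
  # Assumption: a group cannot contribute more cells than it contains
  pmin(target, group_sizes)
}

group_sizes <- c(control = 5000L, treatment = 600L, rare = 8L)

# Same target for all groups; the 8-cell group is kept whole
resolve_ncells(500L, group_sizes)

# Per-group targets; "treatment" is capped at its 600 available cells
resolve_ncells(c(control = 1000, treatment = 800, rare = 100), group_sizes)
```

In this sketch a group that is smaller than min.cells.per.group always contributes all of its cells, regardless of the value requested for it in ncells.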
Value
A Seurat object with the sketched data added as a new assay. The metadata will contain information about which groups each sketched cell belongs to.
Details
When group.by is specified, the function performs the following steps:
1. Splits cells into groups based on the metadata column.
2. Calculates leverage scores (if method = 'LeverageScore') within each group.
3. Samples the specified number of cells from each group.
4. Combines all sampled cells into the sketched assay.
Because each group is sampled separately, this approach helps ensure that rare cell types or conditions are not underrepresented in the final sketched dataset.
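The steps above can be sketched in base R on a toy matrix. This is an illustration under stated assumptions, not the package's implementation: sketch_by_group and leverage_scores are hypothetical names, the leverage scores are computed exactly from a singular value decomposition (large datasets would typically need an approximation), and sampling with probability proportional to the leverage score is assumed.

```r
# Toy data: 200 cells x 5 features, with one rare group of 8 cells
set.seed(123)
n_cells <- 200
expr <- matrix(rnorm(n_cells * 5), nrow = n_cells,
               dimnames = list(paste0("cell", seq_len(n_cells)), NULL))
group <- rep(c("A", "B", "rare"), times = c(120, 72, 8))

# Exact leverage scores: squared row norms of the left singular vectors
leverage_scores <- function(x) rowSums(svd(x)$u^2)

# Hypothetical stand-in for the grouped sketching steps
sketch_by_group <- function(expr, group, ncells, method = "Uniform",
                            min_cells_per_group = 10L) {
  # Step 1: split cell names by group
  cells_by_group <- split(rownames(expr), group)
  sampled <- lapply(cells_by_group, function(cells) {
    # Small groups, and groups no larger than the target, are kept whole
    if (length(cells) < min_cells_per_group || length(cells) <= ncells) {
      return(cells)
    }
    # Step 2: leverage scores within the group act as sampling weights
    prob <- if (method == "LeverageScore") {
      leverage_scores(expr[cells, , drop = FALSE])
    } else {
      NULL  # NULL means uniform sampling
    }
    # Step 3: sample without replacement
    sample(cells, size = ncells, prob = prob)
  })
  # Step 4: combine the sampled cells from all groups
  unlist(sampled, use.names = FALSE)
}

kept <- sketch_by_group(expr, group, ncells = 50, method = "LeverageScore")

# Number of sketched cells per group
table(group[match(kept, rownames(expr))])
```

With these toy sizes, groups A and B are each reduced to 50 cells while the 8-cell rare group is retained in full, which is the behaviour that keeps small populations visible in the sketch.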
Examples
if (FALSE) { # \dontrun{
  # Basic usage with grouping by cell type
  sketched_obj <- SketchDataByGroup(
    object = seurat_obj,
    group.by = "cell_type",
    ncells = 500
  )

  # Different number of cells per group
  sketched_obj <- SketchDataByGroup(
    object = seurat_obj,
    group.by = "condition",
    ncells = c("control" = 1000, "treatment" = 800)
  )

  # Without grouping (falls back to original behavior)
  sketched_obj <- SketchDataByGroup(
    object = seurat_obj,
    ncells = 5000
  )
} # }