This function uses sketching methods to downsample high-dimensional single-cell RNA expression data within specified groups or categories. Sketching within groups improves scalability for large datasets while maintaining representation across cell types or conditions.

Usage

SketchDataByGroup(
  object,
  group.by = NULL,
  assay = NULL,
  ncells = 1000L,
  sketched.assay = "sketch",
  method = c("Uniform", "LeverageScore"),
  var.name = "leverage.score",
  cells = NULL,
  over.write = FALSE,
  seed = 123L,
  cast = "dgCMatrix",
  verbose = TRUE,
  features = NULL,
  min.cells.per.group = 10L,
  ...
)

Arguments

object

A Seurat object.

group.by

A metadata column name to group cells by before sketching. If NULL, falls back to standard sketching without grouping. Default is NULL.

assay

Assay name. Default is NULL, in which case the default assay of the object is used.

ncells

A positive integer, named vector, or list specifying the number of cells to sample.

  • If a single integer: the same number of cells is sampled from each group and layer.

  • If a named vector with group names: a specific number of cells per group.

  • If a named list with group names containing layer vectors: a specific number of cells per group per layer.

Default is 1000.
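The three accepted shapes of ncells can be illustrated as plain R values. This is only a sketch: the group names and layer names are hypothetical, and it assumes the inner vectors of the list form are named by layer, which the description above does not spell out.

```r
# Hypothetical group and layer names, for illustration only.

# 1. Single integer: the same count for every group and layer.
ncells_scalar <- 500L

# 2. Named vector: one count per group.
ncells_by_group <- c(control = 1000L, treatment = 800L)

# 3. Named list of vectors: one count per layer within each group
#    (assumption: inner vectors are named by layer).
ncells_by_layer <- list(
  control   = c(counts.batch1 = 600L, counts.batch2 = 400L),
  treatment = c(counts.batch1 = 500L, counts.batch2 = 300L)
)
```

Any of these values would then be supplied as the ncells argument together with a matching group.by column.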

sketched.assay

Sketched assay name. A sketch assay is created or overwritten with the sketch data. Default is 'sketch'.

method

Sketching method to use. Can be 'LeverageScore' or 'Uniform'. Default is 'Uniform'.
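In R, a default written as a character vector is commonly resolved with match.arg(), which selects the first element when the argument is not supplied. That convention is consistent with 'Uniform' being the default here. The sketch below uses a hypothetical helper, pick_method(), and assumes rather than confirms that SketchDataByGroup resolves method the same way.

```r
# pick_method() is a hypothetical stand-in, not part of the package.
pick_method <- function(method = c("Uniform", "LeverageScore")) {
  # match.arg() returns the first choice when `method` is left at its default
  match.arg(method)
}

default_method  <- pick_method()
explicit_method <- pick_method("LeverageScore")
```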

var.name

A metadata column name to store the leverage scores. Default is 'leverage.score'.

cells

A vector of IDs for the cells to keep. If this is set, these user-defined cells are used directly to generate the sketch, and the group.by parameter is ignored.

over.write

Whether to overwrite an existing column in the metadata. Default is FALSE.

seed

A positive integer for the seed of the random number generator. Default is 123.

cast

The type to cast the resulting assay to. Default is 'dgCMatrix'.

verbose

Print progress and diagnostic messages. Default is TRUE.

features

A character vector of feature names to include in the sketched assay. Default is NULL.

min.cells.per.group

Minimum number of cells required per group to perform sketching. Groups with fewer cells will be included entirely. Default is 10.

...

Arguments passed to other methods.

Value

A Seurat object with the sketched data added as a new assay. The metadata records which group each sketched cell belongs to.

Details

When group.by is specified, the function performs the following steps:

  1. Splits cells into groups based on the metadata column

  2. Calculates leverage scores (if method = 'LeverageScore') within each group

  3. Samples the specified number of cells from each group

  4. Combines all sampled cells into the sketched assay

Because sampling happens within each group, this approach helps keep rare cell types or conditions from being underrepresented in the final sketched dataset.
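The grouping logic can be sketched in base R on toy metadata. This is an illustration under stated assumptions, not the package's implementation: it covers uniform sampling only (the leverage-score step is omitted), the variable names are hypothetical, and it assumes that a group no larger than the requested count is kept whole.

```r
set.seed(123)

# Toy metadata: one rare group and two common groups (hypothetical names).
meta <- data.frame(
  cell = paste0("cell", 1:1000),
  cell_type = c(rep("rare", 8), rep("T", 492), rep("B", 500))
)

ncells <- 100L              # target number of cells per group
min_cells_per_group <- 10L  # groups smaller than this are kept whole

# Step 1: split cell IDs by the grouping column.
cells_by_group <- split(meta$cell, meta$cell_type)

# Step 3: sample uniformly within each group.
sampled <- lapply(cells_by_group, function(ids) {
  if (length(ids) < min_cells_per_group || length(ids) <= ncells) {
    return(ids)  # small groups are included entirely
  }
  sample(ids, size = ncells)
})

# Step 4: combine the sampled cells from all groups.
sketch_cells <- unlist(sampled, use.names = FALSE)
group_counts <- table(meta$cell_type[match(sketch_cells, meta$cell)])
```

In this toy case all 8 cells of the rare group survive, whereas a single uniform draw across all 1000 cells would be expected to retain only a small fraction of them.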

Examples

if (FALSE) { # \dontrun{
# Basic usage with grouping by cell type
sketched_obj <- SketchDataByGroup(
  object = seurat_obj,
  group.by = "cell_type",
  ncells = 500
)

# Different number of cells per group
sketched_obj <- SketchDataByGroup(
  object = seurat_obj,
  group.by = "condition",
  ncells = c("control" = 1000, "treatment" = 800)
)

# Without grouping (falls back to standard sketching)
sketched_obj <- SketchDataByGroup(
  object = seurat_obj,
  ncells = 5000
)
} # }