Perform CellAnova Batch correction — cellanova_calc

CellAnova (Zhaojun Zhang et al., 2024, Nat Biotech) is a method that removes or mitigates batch effects in single-cell data and returns a "corrected" data matrix rather than only a low-dimensional embedding. This function implements the second step of CellAnova: it is an R implementation of the calc_BE() function from the CellAnova Python package (https://github.com/Janezjz/cellanova).

Usage

cellanova_calc_BE(
  object = NULL,
  assay = NULL,
  layer = "scale.data",
  integrate_key = NULL,
  features = NULL,
  control_dict = NULL,
  reduction = NULL,
  var_cutoff = 0.9,
  k_max = 1500,
  k_select = NULL,
  new.assay.name = "CORRECTED",
  verbose = TRUE
)

Arguments

object: A Seurat object
assay: Name of the assay on which to perform the CellAnova correction.
layer: Name of the layer used to correct for the batch effect. Should be "scale.data".
integrate_key: A string naming the meta-data column that holds the smallest batch unit (e.g., library, donor), which will be used for integration later.
features: Features for which to compute corrected expression. Defaults to the variable features of the specified assay.
control_dict: A list giving the control-group assignment of the control samples. The name of each element should correspond to a batch name in the 'integrate_key' column.
reduction: Name of the DimReduc object used as the integrated embedding. It should come from a method such as Harmony or a Seurat integration method (e.g., CCA).
var_cutoff: Fraction of explained variance used to determine the value of k in the truncated SVD when calculating the basis of the batch effect. Default is 0.9.
k_max: Maximum number of singular values and vectors to compute. Default is 1500.
k_select: User-defined number of singular values and vectors to compute (overrides var_cutoff and k_max). Default is NULL.
new.assay.name: Name of the new assay that stores the corrected expression matrix. Default is "CORRECTED".
verbose: Whether to display progress and messages. Default is TRUE.
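
The interaction between var_cutoff, k_max, and k_select can be illustrated with a small base-R sketch. The helper select_k() below is hypothetical and is not part of this package; it assumes that explained variance is measured as the cumulative share of squared singular values among the first k_max values, which may differ from what the function does internally.

```r
# Hypothetical helper (not part of the package): pick the SVD rank k
# from a vector of singular values sorted in decreasing order.
select_k <- function(d, var_cutoff = 0.9, k_max = 1500, k_select = NULL) {
  # A user-supplied k overrides both var_cutoff and k_max.
  if (!is.null(k_select)) {
    return(min(k_select, length(d)))
  }
  # Keep at most k_max singular values.
  d <- d[seq_len(min(length(d), k_max))]
  # Cumulative fraction of variance explained by the first k components
  # (assumption: variance is proportional to the squared singular values).
  explained <- cumsum(d^2) / sum(d^2)
  # Smallest k whose cumulative fraction reaches the cutoff.
  which(explained >= var_cutoff)[1]
}

d <- c(3, 2, 1)                    # toy singular values
select_k(d, var_cutoff = 0.9)      # 2, since (9 + 4) / 14 is about 0.93
select_k(d, k_select = 1)          # 1, user-defined value takes precedence
```

In this toy case the first component explains only about 64% of the variance, so two components are needed to pass the 0.9 cutoff.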

Value

Returns a Seurat object with a new assay (named by new.assay.name) containing the batch-corrected expression matrix.

Details

This function takes as input a Seurat object with a pre-computed integrated embedding (from a method such as Harmony or Seurat CCA), a batch index, and a case-control index. It estimates the batch effect from the control samples and removes it from the full original expression data. Most of the procedure is kept the same as in the Python implementation, with the following modifications:

Currently, only one control group is supported.
We additionally use future_lapply() and a more efficient regression framework to improve efficiency.
The procedure can be run on "sketched" data and the result later projected onto the whole dataset (for efficiency and data balance).
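
The idea of correcting the expression data once a batch-effect basis is available can be sketched in base R as follows. This is a toy illustration of removing a subspace by orthogonal projection, under the assumption that the basis is orthonormal; the variable names are hypothetical, the basis here is random rather than estimated from control samples, and the exact computation inside the function may differ.

```r
set.seed(42)
n_cells <- 100
n_genes <- 20
k <- 3

# Toy scaled expression matrix, cells x genes (transposed relative to
# Seurat's features x cells layout, purely for readability).
expr <- matrix(rnorm(n_cells * n_genes), nrow = n_cells)

# Hypothetical orthonormal batch-effect basis (genes x k). In practice it
# would be estimated from the control samples; here it is random.
basis <- svd(matrix(rnorm(n_genes * k), nrow = n_genes))$u

# Subtract the part of each cell's profile that lies in the span of the basis.
corrected <- expr - expr %*% basis %*% t(basis)

# After correction, nothing is left along the basis directions
# (the value is zero up to floating-point error).
max(abs(corrected %*% basis))
```

The corrected matrix keeps the original dimensions, which is consistent with storing it as a full expression matrix in a new assay rather than as a low-dimensional embedding.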