Perform preprocessing procedures for a Seurat object to prepare for the sample-level analysis

This wrapper function applies landmark sketching, optional training-data sketching, PLS learning, and the weighted nearest neighbor (WNN) workflow to a single-cell Seurat object, producing the components needed for later sample-level analyses. The output object can be passed directly to GenerateSampleObject to obtain a sample-level matrix.

Usage

PrepareSampleObject(
  object = NULL,
  assay = NULL,
  add.hvg = TRUE,
  group.by.CorTest = NULL,
  Y = NULL,
  sketch.training = TRUE,
  group.by.Sketch = NULL,
  ncells.per.group = 1000,
  training.assay.name = "TRAINING",
  training.sketch.method = "Uniform",
  ncells.landmark = 2000,
  landmark.assay.name = "LANDMARK",
  landmark.sketch.method = "LeverageScore",
  ncomp = 10,
  pls.function = c("plsr", "spls", "cppls"),
  pls.reduction.name = "pls",
  k.nn = 5,
  name.reduction.1 = "pca",
  dims.reduction.1 = 1:30,
  weighted.nn.name = "weighted.nn",
  fix.wnn.weights = c(0.5, 0.5),
  rm.training.assay = FALSE,
  max_core = 1,
  future.memory.per.core = 2000,
  verbose = TRUE,
  ...
)

Arguments

object: a Seurat object.
assay: the assay to perform the analyses on
add.hvg: A boolean specifying whether to run an additional correlation test between features and the response (Y), so that more response-relevant features are included in VariableFeatures and used for PLS learning. See QuickCorTest for details. Default is TRUE.
group.by.CorTest: A metadata column name to group cells by before QuickCorTest. If NULL, falls back to QuickCorTest without grouping. Default is NULL.
Y: A metadata column name of responses. Will also be used as the responses for PLS learning.
sketch.training: A boolean specifying whether to sketch the Seurat object to obtain a training subset for PLS learning (the PLS model learned from the subset is then projected onto the full data). This is recommended for efficiency when the Seurat object is very large. If the 'assay' specified for this Seurat object is an on-disk assay, this parameter must be set to TRUE. Default is TRUE.
group.by.Sketch: A metadata column name to group cells by before sketching. If NULL, falls back to standard sketching without grouping. Default is NULL.
ncells.per.group: A positive integer, named vector, or list specifying the number of cells to sample. Default is 1000 per category in 'group.by.Sketch'. See SketchDataByGroup for detailed usage.
training.assay.name: The assay name of the training data.
training.sketch.method: Sketching method to use for the training subset. Can be 'LeverageScore' or 'Uniform'. Default is 'Uniform' because for training we want to obtain an unbiased subset of the original data. See LeverageScore for details.
ncells.landmark: A positive integer specifying the number of 'landmark cells'. The function will generate a new assay for these 'landmark cells' (please do not confuse it with the training subset). The landmark cells should be a subset that covers the diverse cell types and cell states of the data. Default is 2000.
landmark.assay.name: The assay name of the landmark cells. Default is "LANDMARK".
landmark.sketch.method: Sketching method to use for the landmark cells. Can be 'LeverageScore' or 'Uniform'. Default is 'LeverageScore' because we want the landmark cells to be as diverse/representative as possible. See LeverageScore for details.
ncomp: Number of components to compute
pls.function: PLS function from pls package to run (options: plsr, spls, cppls)
pls.reduction.name: PLS dimensional reduction name
k.nn: The number of nearest neighbors to compute for each modality
name.reduction.1: The name of the DimReduc to use as the 1st embedding (the 2nd is the PLS embedding) in the WNN process.
dims.reduction.1: The dimensions for reduction.1 to use during the WNN process.
weighted.nn.name: Multimodal neighbor object name
fix.wnn.weights: Pre-specified modality weights. If provided, the weight calculation is skipped and these weights are used directly. Should contain one weight per reduction used in the WNN process (here two: name.reduction.1 and the PLS reduction). Default is c(0.5, 0.5).
rm.training.assay: Whether to remove the training assay after running PrepareSampleObject(). This reduces memory usage when the Seurat object is large (e.g., a large on-disk object). Default is FALSE.
max_core: The number of cores to use for parallelization when running the WNN process (but not other processes). Note that if a future plan has already been set with future::plan() before running the function, this parameter is ignored and the user's plan is respected (that plan also applies to the other functions being called, including ScaleData()). Default is 1 for sequential processing.
future.memory.per.core: The memory allocated per core, in MB, used to set options(future.globals.maxSize = ...). The value is computed as future.globals.maxSize = max_core × future.memory.per.core × 1024 × 1024 bytes. Default is 2000.
verbose: Print progress and diagnostic messages
...: Arguments passed to other methods
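
The formula given for future.memory.per.core can be checked with a few lines of base R. This is only a sketch of the arithmetic: the helper name globals_max_size is hypothetical and not part of the package, and the options() call shows how such a value could be applied, not necessarily how the function does it internally.

```r
# Hypothetical helper (not part of the package): reproduces the documented
# formula max_core x future.memory.per.core x 1024 x 1024 bytes.
globals_max_size <- function(max_core, future.memory.per.core) {
  max_core * future.memory.per.core * 1024 * 1024
}

# Defaults from the Usage block: 1 core, 2000 MB per core.
default_size <- globals_max_size(max_core = 1, future.memory.per.core = 2000)
print(default_size)  # 2097152000 bytes

# With 4 cores the limit scales linearly.
four_core_size <- globals_max_size(max_core = 4, future.memory.per.core = 2000)
print(four_core_size)  # 8388608000 bytes

# Assumption: the limit is applied through the standard 'future' option.
options(future.globals.maxSize = default_size)
```

If parallel workers fail because exported objects exceed this limit, increasing future.memory.per.core is the documented way to raise it.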

Value

Returns a Seurat object that contains a weighted.nn Neighbor object between the landmark cells and all other cells. The Seurat object also contains a landmark assay (and a training assay if sketch.training == TRUE).
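
A call might look roughly like the sketch below, which only uses arguments listed in the Usage block. The object name seurat_obj, the assay name "RNA", and the metadata columns "disease" and "donor_id" are placeholders chosen for illustration, not names taken from this documentation. The call itself is guarded so that the snippet does nothing unless the function and a suitable object are available in the session.

```r
# Placeholder values: "RNA", "disease" and "donor_id" are assumptions made
# for illustration; replace them with names from your own object.
prep_args <- list(
  assay = "RNA",                  # assay to analyse (assumed name)
  Y = "disease",                  # metadata column holding the response
  group.by.Sketch = "donor_id",   # group cells before sketching
  sketch.training = TRUE,         # sketch a training subset for PLS
  ncells.per.group = 1000,
  ncells.landmark = 2000,
  ncomp = 10,
  pls.function = "plsr",
  fix.wnn.weights = c(0.5, 0.5),  # one weight per reduction (pca, pls)
  max_core = 1
)

# Run only if PrepareSampleObject is available and a Seurat object named
# `seurat_obj` (hypothetical) exists in the current session.
if (exists("PrepareSampleObject") && exists("seurat_obj")) {
  prepared <- do.call(
    PrepareSampleObject,
    c(list(object = seurat_obj), prep_args)
  )
}
```

Collecting the arguments in a list first makes it easy to inspect or reuse the settings across several objects before the comparatively expensive call is made.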