Perform preprocessing procedures for a Seurat object to prepare for the sample-level analysis
Source: R/sample_level_object.R
This is a wrapper function to apply landmark sketching, training data sketching (optional), PLS learning,
and weighted nearest neighbor (WNN) workflow to a single-cell Seurat object to get the necessary components for
sample-level analyses later. The output object can be directly used by GenerateSampleObject to obtain
a sample-level matrix.
Usage
PrepareSampleObject(
object = NULL,
assay = NULL,
add.hvg = TRUE,
group.by.CorTest = NULL,
Y = NULL,
sketch.training = TRUE,
group.by.Sketch = NULL,
ncells.per.group = 1000,
training.assay.name = "TRAINING",
training.sketch.method = "Uniform",
ncells.landmark = 2000,
landmark.assay.name = "LANDMARK",
landmark.sketch.method = "LeverageScore",
ncomp = 10,
pls.function = c("plsr", "spls", "cppls"),
pls.reduction.name = "pls",
k.nn = 5,
name.reduction.1 = "pca",
dims.reduction.1 = 1:30,
weighted.nn.name = "weighted.nn",
fix.wnn.weights = c(0.5, 0.5),
rm.training.assay = FALSE,
max_core = 1,
future.memory.per.core = 2000,
verbose = TRUE,
...
)
Arguments
- object
a Seurat object.
- assay
the assay to perform the analyses on
- add.hvg
a boolean specifying whether the function runs an additional correlation test between features and the response (Y) to obtain more response-relevant features, which are included in VariableFeatures and used for PLS learning. See QuickCorTest for details. Default is TRUE.
- group.by.CorTest
A metadata column name to group cells by before QuickCorTest. If NULL, falls back to QuickCorTest without grouping. Default is NULL.
- Y
A metadata column name of responses. Will also be used as the responses for PLS learning.
- sketch.training
a boolean specifying whether the function sketches the Seurat object to obtain a training subset for PLS learning (the PLS model learned from the subset is then projected onto the full data). For efficiency, this is recommended when the Seurat object is very large. If the 'assay' specified for this Seurat object is an on-disk assay, this parameter must be set to TRUE. Default is TRUE, as shown in Usage.
- group.by.Sketch
A metadata column name to group cells by before sketching. If NULL, falls back to standard sketching without grouping. Default is NULL.
- ncells.per.group
A positive integer, named vector, or list specifying the number of cells to sample. Default is 1000 per category in 'group.by.Sketch'. See SketchDataByGroup for detailed usage.
- training.assay.name
The assay name of the training data. Default is "TRAINING".
- training.sketch.method
Sketching method to use for the training subset. Can be 'LeverageScore' or 'Uniform'. Default is 'Uniform' because for training we want to obtain an unbiased subset of the original data. See LeverageScore for details.
- ncells.landmark
A positive integer specifying the number of 'landmark cells'. The function will generate a new assay for these 'landmark cells' (please do not confuse it with the training subset). The landmark cells should be a subset that covers the diverse cell types and cell states of the data. Default is 2000.
- landmark.assay.name
The assay name of the landmark cells. Default is "LANDMARK".
- landmark.sketch.method
Sketching method to use for the landmark cells. Can be 'LeverageScore' or 'Uniform'. Default is 'LeverageScore' because we want the landmark cells to be as diverse/representative as possible. See LeverageScore for details.
- ncomp
Number of components to compute
- pls.function
PLS function from pls package to run (options: plsr, spls, cppls)
- pls.reduction.name
PLS dimensional reduction name
- k.nn
The number of nearest neighbors to compute for each modality
- name.reduction.1
The name of the DimReduc to use as the 1st embedding (the 2nd is the PLS embedding) in the WNN process.
- dims.reduction.1
The dimensions for reduction.1 to use during the WNN process.
- weighted.nn.name
Multimodal neighbor object name
- fix.wnn.weights
Pre-specified modality weights. If provided, skips the weight calculation and uses these weights directly. Should contain one weight per reduction used in the WNN process (here two: name.reduction.1 and the PLS reduction). Default is c(0.5, 0.5).
- rm.training.assay
Whether to remove the training assay after running PrepareSampleObject(). This reduces memory usage when the Seurat object is large (e.g., a large on-disk object). Default is FALSE.
- max_core
The number of cores to use for parallelization when running the WNN process (but not other processes). If a plan has already been set with future::plan() before calling the function, this parameter is ignored and the existing plan is respected; that plan also applies to other functions called internally, including ScaleData(). Default is 1 (sequential processing).
- future.memory.per.core
The memory allocation per core, in MB, used to set options(future.globals.maxSize = ...). The value is computed as future.globals.maxSize = max_core × future.memory.per.core × 1024 × 1024 bytes. Default is 2000.
- verbose
Print progress and diagnostic messages
- ...
Arguments passed to other methods
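Examples
The sketch below works through the future.globals.maxSize formula given under future.memory.per.core, then shows what a call might look like. It is illustrative only: globals_max_size is a hypothetical helper that is not part of the package, and seurat_obj, "response", and "sample_id" are placeholder names for a user's own object and metadata columns. The call is guarded so that it only runs when the function and object actually exist in the session.

```r
# Hypothetical helper (not part of the package) reproducing the documented formula:
# future.globals.maxSize = max_core x future.memory.per.core x 1024 x 1024 bytes.
globals_max_size <- function(max_core = 1, future.memory.per.core = 2000) {
  stopifnot(max_core >= 1, future.memory.per.core > 0)
  max_core * future.memory.per.core * 1024 * 1024
}

# Defaults from Usage: 1 core x 2000 MB.
default_limit <- globals_max_size()

# Four cores at 2000 MB each.
four_core_limit <- globals_max_size(max_core = 4)

# Setting the option by hand, shown only to illustrate the units (bytes).
options(future.globals.maxSize = four_core_limit)

# Placeholder call: `seurat_obj`, "response", and "sample_id" are assumptions.
# Skipped unless both the function and the object are available.
if (exists("PrepareSampleObject") && exists("seurat_obj")) {
  prepared <- PrepareSampleObject(
    object = seurat_obj,
    Y = "response",                 # metadata column holding the responses
    group.by.Sketch = "sample_id",  # group cells before sketching
    max_core = 4,
    future.memory.per.core = 2000
  )
}
```

With the defaults the limit is 2000 MB expressed in bytes; raising max_core scales it linearly, so it may be worth checking that the machine has that much memory available before increasing the core count.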