Model evaluation

Extract latent representations

scVI.get_latent_representationFunction
get_latent_representation(m::scVAE, countmatrix::Matrix; 
    cellindices=nothing, give_mean::Bool=true
)

Computes the latent representation of an scVAE model on input count data by applying the scVAE encoder.

Returns the mean (default) or a sample of the latent representation (can be controlled by give_mean keyword argument).

Arguments

  • m::scVAE: scVAE model from which the encoder is applied to get the latent representation
  • countmatrix::Matrix: matrix of counts (e.g., countmatrix field of an AnnData object), which is to be embedded with the scVAE model encoder. Is assumed to be in a (cell x gene) format.

Keyword arguments

  • cellindices=nothing: optional; indices of cells (=rows) on which to subset the countmatrix before embedding it
  • give_mean::Bool=true: optional; if true, returns the mean of the latent representation, else returns a sample.

Returns

  • z: latent representation of the countmatrix data, either the mean or a sample (controlled by give_mean keyword argument)
source
scVI.register_latent_representation!Function
register_latent_representation!(adata::AnnData, m::scVAE; name_latent::String="scVI_latent")

Calculates the latent representation obtained from encoding the countmatrix of the AnnData object with a trained scVAE model by applying the function get_latent_representation(m, adata.X). Stored the latent representation in the obsm field of the input AnnData object as name_latent.

Arguments

  • adata::AnnData: AnnData object to which to add the latent representation
  • m::scVAE: trained scVAE model to use for encoding the data

Keyword arguments

  • name_latent::String="scVI_latent": name of the field in adata.obsm where the latent representation is stored

Returns

  • the modified AnnData object.
source
scVI.get_loadingsFunction
get_loadings(dec::scLinearDecoder)

Extracts the loadings of a scLinearDecoder, specifically corresponding to the weight matrix of the linear scLinearDecoder.factor_regressor layer. If batch normalisation is applied, the weight matrix is re-scaled according to the accumulated statistics in the batch norm layer (for details, see ?Flux.BatchNorm).

Arguments

  • dec: scLinearDecoder object

Returns

  • the matrix of loadings
source

Dimension reduction and plotting

scVI.register_umap_on_latent!Function
register_umap_on_latent!(adata::AnnData, m::scVAE; name_latent::String="scVI_latent", name_umap::String="scVI_latent_umap")

Calculates a UMAP (Uniform Manifold Projection and Embedding, McInnes et al. 2018) embedding of the latent representation obtained from encoding the countmatrix of the AnnData object with a trained scVAE model. If a latent representation is already stored in adata.name_latent, this is used for calculating the UMAP, if not, a latent representation is calculated and registered by calling register_latent_representation!(adata, m).

The UMAP is calculated using the Julia package UMAP.jl with default parameters. It is then stored in the name_umap field of the input AnnData object.

Arguments

  • adata::AnnData: AnnData object to which to add the UMAP representation
  • m::scVAE: trained scVAE model to use for encoding the data

Keyword arguments

  • name_latent::String="scVI_latent": name of the field in adata.obsm where the latent representation is stored
  • name_umap::String="scVI_latent_umap": name of the field in adata.obsm where the UMAP representation is stored

Returns

  • the modified AnnData object.
source
scVI.plot_umap_on_latentFunction
function plot_umap_on_latent(
    m::scVAE, adata::AnnData; 
    name_latent::String="scVI_latent",
    name_latent_umap::String="scVI_latent_umap",    
    save_plot::Bool=false, 
    seed::Int=987, 
    filename::String="UMAP_on_latent.pdf"
)

Plots a UMAP embedding of the latent representation obtained from encoding the countmatrix of the AnnData object with the scVAE model. If no UMAP representation is stored in adata.scVI_latent_umap, it is calculated and registered by calling register_umap_on_latent(adata, m).

By default, the cells are color-coded according to the celltypes field of the AnnData object.

For plotting, the VegaLite.jl package is used.

Arguments

  • m::scVAE: trained scVAE model to use for embedding the data with the model encoder
  • adata:AnnData: data to embed with the model; adata.X is encoded with m

Keyword arguments

  • name_latent::String="scVI_latent": name of the field in adata.obsm where the latent representation is stored
  • name_latent_umap::String="scVI_latent_umap": name of the field in adata.obsm where the UMAP representation is stored
  • save_plot::Bool=true: whether or not to save the plot
  • filename::String="UMAP_on_latent.pdf: filename under which to save the plot. Has no effect if save_plot==false.
  • seed::Int=987: which random seed to use for calculating UMAP (to ensure reproducibility)

Returns

  • the plot
source
scVI.plot_pca_on_latentFunction
plot_pca_on_latent(
    m::scVAE, adata::AnnData; 
    save_plot::Bool=false, 
    filename::String="PCA_on_latent.pdf"
)

Plots a PCA embedding of the latent representation obtained from encoding the countmatrix of the AnnData object with the scVAE model. If no latent representation is stored in adata.scVI_latent, it is calculated and registered by calling register_latent_representation(adata, m).

PCA is calculated using the singular value decomposition implementation in LinearAlgebra.jl, see ?LinearAlgebra.svd. For details on the PCA implementation, see the source code in the prcomps function in src/Evaluate.jl.

By default, the cells are color-coded according to the celltypes field of the AnnData object.

For plotting, the VegaLite.jl package is used.

Arguments

  • m::scVAE: trained scVAE model to use for embedding the data with the model encoder
  • adata:AnnData: data to embed with the model; adata.X is encoded with m

Keyword arguments

  • save_plot::Bool=true: whether or not to save the plot
  • filename::String="UMAP_on_latent.pdf: filename under which to save the plot. Has no effect if save_plot==false.

Returns

  • the plot
source
scVI.plot_latent_representationFunction
function plot_latent_representation(
    m::scVAE, adata::AnnData; 
    name_latent::String="scVI_latent",
    plot_title::String="scVI latent representation",
    save_plot::Bool=false, 
    seed::Int=987, 
    filename::String="UMAP_on_latent.pdf"
)

Plots the latent representation obtained from encoding the countmatrix of the AnnData object with the scVAE model. If the dimension of the latent space according to m.n_latent is > 2, it calculates a UMAP embedding first. In this case, if no UMAP representation is stored in adata.scVI_latent_umap, it is calculated and registered by calling register_umap_on_latent(adata, m).

By default, the cells are color-coded according to the celltypes column in adata.obs, if present.

For plotting, the VegaLite.jl package is used.

Arguments

  • m::scVAE: trained scVAE model to use for embedding the data with the model encoder
  • adata:AnnData: data to embed with the model; adata.X is encoded with m

Keyword arguments

  • name_latent::String="scVI_latent": name of the field in adata.obsm where the latent representation is stored
  • plot_title::String="scVI latent representation": title of the plot
  • save_plot::Bool=true: whether or not to save the plot
  • filename::String="UMAP_on_latent.pdf: filename under which to save the plot. Has no effect if save_plot==false.
  • seed::Int=987: which random seed to use for calculating UMAP (to ensure reproducibility)

Returns

  • the plot
source

Sampling from the trained model

scVI.sample_from_priorFunction
sample_from_prior(m::scVAE, adata::AnnData, n_samples::Int; sample_library_size::Bool=false)

Samples from the prior N(0,1) distribution of the latent representation of a trained scVAE model. Calculates the library size based on the countmatrix of the input AnnData object and either samples from it or uses the mean. Subsequently draws n_samples from the generative distribution defined by the decoder based on the samples from the prior and the library size.

Returns the samples from the model.

Arguments

  • m::scVAE: trained scVAE model from which to sample
  • adata::AnnData: AnnData object based on which to calculate the library size
  • n_samples::Int: number of samples to draw

Keyword arguments

  • sample_library_size::Bool=false: whether or not to sample from the library size. If false, the mean of the observed library size is used.

Returns

  • matrix of prior samples from the model
source
scVI.sample_from_posteriorFunction
sample_from_posterior(m::scVAE, adata::AnnData)

Samples from the posterior distribution of the latent representation of a trained scVAE model. Calculates the latent posterior mean and variance and the library size based on the countmatrix of the input AnnData object and samples from the posterior. Subsequently samples from the generative distribution defined by the decoder based on the samples of the latent representation and the library size.

Returns the samples from the model.

Arguments

  • m::scVAE: trained scVAE model from which to sample
  • adata::AnnData: AnnData object based on which to calculate the latent posterior

Returns

  • matrix of posterior samples from the model
source

Both prior and posterior sampling are based on the following more low-level function, which is not exported but can be called as scVI.decodersample:

scVI.decodersampleFunction
decodersample(m::scVAE, z::AbstractMatrix{S}, library::AbstractMatrix{S}) where S <: Real

Samples from the generative distribution defined by the decoder of the scVAE model based on values of the latent variable z. Depending on whether z is sampled from the prior or posterior, the function can be used to realise both prior and posterior sampling, see sample_from_posterior() and sample_from_prior for details.

The distribution ((zero-inflated) negative binomial or Poisson) is parametrised by mu, theta and zi (logits of dropout parameter). The implementation is adapted from the corresponding scvi tools function

Arguments

  • m::scVAE: scVAE model from which the decoder is used for sampling
  • z::AbstractMatrix: values of the latent representation to use as input for the decoder
  • library::AbstractMatrix: library size values that are used for scaling in the decoder (either corresponding to the observed or the model-encoded library size)

Returns

  • matrix of samples from the generative distribution defined by the decoder of the scVAE model
source