# The scVAE model

## Encoder

The implementation is based on the Python implementation of the scvi-tools encoder.

### `scVI.scEncoder` — Type

```julia
mutable struct scEncoder
```

Julia implementation of the encoder of a single-cell VAE model, corresponding to the scvi-tools encoder. Collects all information on the encoder parameters and stores the basic encoder as well as the mean and variance encoders. Can be constructed using keywords.

**Fields for construction**

- `encoder`: `Flux.Chain` of fully connected layers realising the first part of the encoder (before the split into mean and variance). For details, see the source code of `FC_layers` in `src/Utils`.
- `mean_encoder`: `Flux.Dense` fully connected layer realising the latent mean encoder
- `n_input`: input dimension = number of genes/features
- `n_hidden`: number of hidden units to use in each hidden layer
- `n_output`: output dimension of the encoder = dimension of latent space
- `n_layers`: number of hidden layers in encoder and decoder
- `var_activation`: whether or not to use an activation function for the variance layer in the encoder
- `var_encoder`: `Flux.Dense` fully connected layer realising the latent variance encoder
- `var_eps`: numerical stability constant to add to the variance in the reparameterisation of the latent representation
- `z_transformation`: whether to apply a `softmax` transformation to the latent z if assuming a lognormal instead of a normal distribution
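To make the role of these fields concrete, here is a dependency-free sketch of the forward pass they implement. Plain matrices stand in for the `Flux` layers; all names and sizes are chosen for illustration and are not part of the package API:

```julia
# Illustrative stand-ins for `encoder`, `mean_encoder` and `var_encoder`
n_input, n_hidden, n_latent = 4, 8, 2
var_eps = 1f-4

W_h  = randn(Float32, n_hidden, n_input)
W_mu = randn(Float32, n_latent, n_hidden)
W_v  = randn(Float32, n_latent, n_hidden)

x  = rand(Float32, n_input, 3)                    # minibatch of 3 cells
h  = max.(W_h * x, 0f0)                           # `encoder`: Dense layer(s) + relu
mu = W_mu * h                                     # `mean_encoder`
v  = exp.(W_v * h) .+ var_eps                     # `var_encoder`; var_eps keeps the variance away from 0

z  = mu .+ sqrt.(v) .* randn(Float32, size(mu))   # reparameterised latent sample

# with distribution = :ln, `z_transformation` maps z through a column-wise softmax
function softmax_cols(m)
    e = exp.(m .- maximum(m; dims=1))
    e ./ sum(e; dims=1)
end
z_ln = softmax_cols(z)
```

The softmax at the end is only applied for the lognormal latent distribution; for `:normal`, `z` is used as-is.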
### `scVI.scEncoder` — Method

```julia
scEncoder(
    n_input::Int,
    n_output::Int;
    activation_fn::Function=relu, # to use in FC_layers
    bias::Bool=true,
    n_hidden::Union{Int,Vector{Int}}=128,
    n_layers::Int=1,
    distribution::Symbol=:normal,
    dropout_rate::Float32=0.1f0,
    use_activation::Bool=true,
    use_batch_norm::Bool=true,
    use_layer_norm::Bool=false,
    var_activation=nothing,
    var_eps::Float32=Float32(1e-4)
)
```

Constructor for an `scVAE` encoder. Initialises an `scEncoder` object according to the input parameters. Julia implementation of the scvi-tools encoder.
**Arguments**

- `n_input`: input dimension = number of genes/features
- `n_output`: output dimension of the encoder = latent space dimension

**Keyword arguments**

- `activation_fn`: function to use as activation in all encoder neural network layers
- `bias`: whether or not to use bias parameters in the encoder neural network layers
- `n_hidden`: number of hidden units to use in each hidden layer (if an `Int` is passed, this number is used in all hidden layers; alternatively, a vector of `Int`s can be passed, in which case the kth element corresponds to the number of units in the kth layer)
- `n_layers`: number of hidden layers in encoder
- `distribution`: whether to use a normal (`:normal`) or lognormal (`:ln`) distribution for the latent z
- `dropout_rate`: dropout to use in all encoder layers. Setting the rate to 0.0 corresponds to no dropout.
- `use_activation`: whether or not to use an activation function in the encoder neural network layers; if `false`, overrides the choice in `activation_fn`
- `use_batch_norm`: whether or not to apply batch normalization in the encoder layers
- `use_layer_norm`: whether or not to apply layer normalization in the encoder layers
- `var_activation`: whether or not to use an activation function for the variance layer in the encoder
- `var_eps`: numerical stability constant to add to the variance in the reparameterisation of the latent representation

**Returns**

An `scEncoder` object.
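Assuming the scVI.jl package is installed and loaded, an encoder could be constructed like this (the dimensions here are hypothetical):

```julia
using scVI # assumes the scVI.jl package is available

# encoder from 1200 genes to a 10-dimensional latent space,
# two hidden layers with 128 and 64 units, lognormal latent distribution
enc = scEncoder(1200, 10;
    n_hidden=[128, 64],
    n_layers=2,
    distribution=:ln,
    dropout_rate=0.1f0
)
```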
## Decoder

The implementation is based on the Python implementation of the scvi-tools decoder.

There are several different distributions that can be parameterized by the decoder, which the user can set via the `gene_likelihood` argument. The following distributions are available:

- `:zinb`: zero-inflated negative binomial distribution
- `:nb`: negative binomial distribution
- `:poisson`: Poisson distribution
- `:gaussian`: Gaussian distribution (for log-transformed data)
- `:bernoulli`: Bernoulli distribution (for binarized data)

Further, there are different ways of calculating the dispersion parameter of the distribution, which can be set via the `dispersion` argument. The following options are available:

- `:gene`: the dispersion parameter is calculated separately for each gene, across all cells
- `:gene_cell`: the dispersion parameter is calculated for each gene in each cell
- `:gene_batch`: the dispersion parameter is calculated for each gene in each experimental batch
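To see what the zero-inflation in `:zinb` adds over a plain `:nb` likelihood, one can compare the probability of observing a zero count under both models. A small sketch in plain Julia (no package code; `pi` here denotes the dropout probability, not the constant):

```julia
# P(x = 0) under NB(mu, theta) is (theta / (theta + mu))^theta;
# ZINB mixes in an extra point mass at zero with dropout probability p_drop.
nb_zero(mu, theta)           = (theta / (theta + mu))^theta
zinb_zero(mu, theta, p_drop) = p_drop + (1 - p_drop) * nb_zero(mu, theta)

mu, theta = 2.0, 1.0
p_nb   = nb_zero(mu, theta)          # ≈ 0.333
p_zinb = zinb_zero(mu, theta, 0.3)   # ≈ 0.533: more zeros than NB alone
```

This extra point mass at zero is what makes `:zinb` a common choice for sparse single-cell count data.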
### `scVI.scDecoder` — Type

```julia
mutable struct scDecoder <: AbstractDecoder
```

Julia implementation of the decoder for a single-cell VAE model, corresponding to the scvi-tools decoder. Collects all information on the decoder parameters and stores the decoder parts. Can be constructed using keywords.

**Fields for construction**

- `n_input`: input dimension = dimension of latent space
- `n_hidden`: number of hidden units to use in each hidden layer (if an `Int` is passed, this number is used in all hidden layers; alternatively, a vector of `Int`s can be passed, in which case the kth element corresponds to the number of units in the kth layer)
- `n_output`: output dimension of the decoder = number of genes/features
- `n_layers`: number of hidden layers in decoder
- `px_decoder`: `Flux.Chain` of fully connected layers realising the first part of the decoder (before the split into mean, dispersion and dropout decoder). For details, see the source code of `FC_layers` in `src/Utils`.
- `px_dropout_decoder`: if the generative distribution is zero-inflated negative binomial (`gene_likelihood = :zinb` in the `scVAE` model construction): `Flux.Dense` layer, else `nothing`
- `px_r_decoder`: decoder for the dispersion parameter. If the generative distribution is not some (zero-inflated) negative binomial, it is `nothing`. Else, it is a parameter vector or a `Flux.Dense` layer, depending on whether the dispersion is estimated per gene (`dispersion = :gene`) or per gene and cell (`dispersion = :gene_cell`).
- `px_scale_decoder`: decoder for the mean of the reconstruction: `Flux.Chain` of a `Dense` layer followed by a `softmax` activation
- `use_batch_norm`: whether or not to apply batch normalization in the decoder layers
- `use_layer_norm`: whether or not to apply layer normalization in the decoder layers
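The decoder split can be sketched without any package dependencies. The matrices below are illustrative stand-ins for the `Flux` layers, for the case `gene_likelihood = :zinb` and `dispersion = :gene`; scaling the softmax output by the library size to obtain the mean follows the scvi-tools design:

```julia
# Stand-ins for `px_decoder`, `px_scale_decoder`, `px_dropout_decoder`, `px_r_decoder`
n_latent, n_hidden, n_genes = 2, 8, 5
W_h     = randn(n_hidden, n_latent)
W_scale = randn(n_genes, n_hidden)
W_drop  = randn(n_genes, n_hidden)

z = randn(n_latent, 3)                        # latent minibatch, 3 cells
h = max.(W_h * z, 0.0)                        # `px_decoder`: Dense layer(s) + relu

logits = W_scale * h                          # `px_scale_decoder`: Dense + softmax,
e = exp.(logits .- maximum(logits; dims=1))   # so each cell's gene proportions sum to 1
px_scale = e ./ sum(e; dims=1)

px_dropout = W_drop * h                       # `px_dropout_decoder`: zero-inflation logits (:zinb only)
px_r = ones(n_genes)                          # `px_r_decoder` with dispersion = :gene: one parameter per gene

library = 1.0e4                               # the NB mean is px_scale rescaled by the library size
px_rate = library .* px_scale
```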
### `scVI.scDecoder` — Method

```julia
scDecoder(n_input, n_output;
    activation_fn::Function=relu,
    bias::Bool=true,
    dispersion::Symbol=:gene,
    dropout_rate::Float32=0.0f0,
    gene_likelihood::Symbol=:zinb,
    n_hidden::Union{Int,Vector{Int}}=128,
    n_layers::Int=1,
    use_activation::Bool=true,
    use_batch_norm::Bool=true,
    use_layer_norm::Bool=false
)
```

Constructor for an `scVAE` decoder. Initialises an `scDecoder` object according to the input parameters. Julia implementation of the scvi-tools decoder.

**Arguments**

- `n_input`: input dimension of the decoder = latent space dimension
- `n_output`: output dimension = number of genes/features in the data

**Keyword arguments**

- `activation_fn`: function to use as activation in all decoder neural network layers
- `bias`: whether or not to use bias parameters in the decoder neural network layers
- `dispersion`: whether to estimate the dispersion parameter for the (zero-inflated) negative binomial generative distribution per gene (`:gene`) or per gene and cell (`:gene_cell`)
- `dropout_rate`: dropout to use in all decoder layers. Setting the rate to 0.0 corresponds to no dropout.
- `n_hidden`: number of hidden units to use in each hidden layer (if an `Int` is passed, this number is used in all hidden layers; alternatively, a vector of `Int`s can be passed, in which case the kth element corresponds to the number of units in the kth layer)
- `n_layers`: number of hidden layers in decoder
- `use_activation`: whether or not to use an activation function in the decoder neural network layers; if `false`, overrides the choice in `activation_fn`
- `use_batch_norm`: whether or not to apply batch normalization in the decoder layers
- `use_layer_norm`: whether or not to apply layer normalization in the decoder layers

**Returns**

An `scDecoder` object.
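Assuming the scVI.jl package is installed and loaded, a decoder could be constructed like this (hypothetical dimensions):

```julia
using scVI # assumes the scVI.jl package is available

# decoder from a 10-dimensional latent space back to 1200 genes,
# negative binomial likelihood with a dispersion parameter per gene and cell
dec = scDecoder(10, 1200;
    gene_likelihood=:nb,
    dispersion=:gene_cell,
    n_hidden=128,
    n_layers=1
)
```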
## Generative distribution functions

### `scVI.log_zinb_positive` — Function

```julia
log_zinb_positive(x::AbstractMatrix{S}, mu::AbstractMatrix{S}, theta::AbstractVecOrMat{S}, zi::AbstractMatrix{S}, eps::S=S(1e-8)) where S <: Real
```

Log likelihood (scalar) of a minibatch according to a zero-inflated negative binomial generative model.

**Arguments**

- `x`: data
- `mu`: mean of the negative binomial (has to have positive support) (shape: minibatch x vars)
- `theta`: inverse dispersion parameter (has to have positive support) (shape: minibatch x vars)
- `zi`: logit of the dropout parameter (real support) (shape: minibatch x vars)
- `eps`: numerical stability constant

**Notes**

We parametrize the Bernoulli using the logits, hence the softplus functions appearing.
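The logit/softplus parameterization can be checked on the zero-count case, where the ZINB likelihood has a closed mixture form. The following sketch (plain Julia, `eps` omitted) shows that the direct mixture and the softplus form agree:

```julia
softplus(x) = log1p(exp(x))
sigmoid(x)  = 1 / (1 + exp(-x))

mu, theta, zi = 2.0, 1.5, 0.4   # zi is the *logit* of the dropout probability

# direct mixture form of log P(x = 0) under ZINB:
direct = log(sigmoid(zi) + (1 - sigmoid(zi)) * (theta / (theta + mu))^theta)

# equivalent logit/softplus form (works on logits, avoids explicit probabilities)
via_softplus = softplus(theta * (log(theta) - log(theta + mu)) - zi) - softplus(-zi)
```

Working on logits rather than probabilities avoids taking `log` of quantities that can underflow to zero.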
### `scVI.log_nb_positive` — Function

```julia
log_nb_positive(x::AbstractMatrix{S}, mu::AbstractMatrix{S}, theta::AbstractVecOrMat{S}, eps::S=S(1e-8)) where S <: Real
```

Log likelihood (scalar) of a minibatch according to a negative binomial generative model.

**Arguments**

- `x`: data
- `mu`: mean of the negative binomial (has to have positive support) (shape: minibatch x vars)
- `theta`: inverse dispersion parameter (has to have positive support) (shape: minibatch x vars)
- `eps`: numerical stability constant
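For integer counts, the negative binomial log pmf in this mean/inverse-dispersion parameterization can be written with Base functions only, since log Γ(x + θ) - log Γ(θ) reduces to a finite sum. A scalar sketch (`eps` omitted for clarity; not the package implementation, which works on matrices):

```julia
# NB log pmf for a single integer count x with mean mu and inverse dispersion theta
function log_nb(x::Int, mu::Real, theta::Real)
    sum((log(theta + i) for i in 0:x-1); init=0.0) -  # log Γ(x+θ) - log Γ(θ)
        sum(log.(1:x); init=0.0) +                    # - log(x!)
        theta * log(theta / (theta + mu)) +
        x * log(mu / (theta + mu))
end

# with theta = 1 the NB is geometric with success probability theta/(theta+mu) = 0.5,
# so P(0) = 0.5 and P(1) = 0.25
exp(log_nb(1, 1.0, 1.0))
```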
### `scVI.log_poisson` — Function

```julia
log_poisson(x::AbstractMatrix{S}, mu::AbstractMatrix{S}, eps::S=S(1e-8)) where S <: Real
```

Log likelihood (scalar) of a minibatch according to a Poisson generative model.

**Arguments**

- `x`: data
- `mu`: mean = variance of the Poisson distribution (has to have positive support) (shape: minibatch x vars)
- `eps`: numerical stability constant
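The underlying scalar formula is the Poisson log pmf. A minimal Base-only sketch (`eps` omitted; the package version operates on matrices):

```julia
# Poisson log pmf for a single integer count: x*log(mu) - mu - log(x!)
log_poisson_pmf(x::Int, mu::Real) = x * log(mu) - mu - sum(log.(1:x); init=0.0)

# P(0) under Poisson(1) is e^{-1}
exp(log_poisson_pmf(0, 1.0))
```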
### `scVI.log_normal` — Function

```julia
log_normal(x::AbstractMatrix{S}, μ::AbstractMatrix{S}, logσ::AbstractVecOrMat{S}) where S <: Real
```

Log likelihood (scalar) of a minibatch according to a Gaussian generative model.

**Arguments**

- `x`: data
- `μ`: mean of the Gaussian distribution (shape: minibatch x vars)
- `logσ`: log standard deviation parameter (real support) (shape: minibatch x vars)
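Parameterising by the log standard deviation keeps σ positive without constraints. The scalar formula is just the Gaussian log density; a Base-only sketch:

```julia
# Gaussian log density with mean mu and log standard deviation logsigma
log_normal_density(x, mu, logsigma) =
    -0.5 * log(2π) - logsigma - (x - mu)^2 / (2 * exp(2 * logsigma))

# at x = mu with sigma = 1 (logsigma = 0) the log density is -0.5*log(2π)
log_normal_density(0.0, 0.0, 0.0)
```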
### `scVI.log_binary` — Function

```julia
log_binary(x::AbstractMatrix{S}, dec_z::AbstractMatrix{S}) where S <: Real
```

Log likelihood (scalar) of a minibatch according to a Bernoulli generative model.

**Arguments**

- `x`: data
- `dec_z`: decoder output, transformed to the success probability of the Bernoulli distribution (shape: minibatch x vars)
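A scalar sketch of the Bernoulli log likelihood, with the raw decoder output squashed to a success probability (the transformation mentioned above; here a sigmoid is assumed for illustration):

```julia
sigmoid(z) = 1 / (1 + exp(-z))

# Bernoulli log likelihood of a binarized observation x given raw decoder output dec_z
function log_bernoulli(x::Real, dec_z::Real)
    p = sigmoid(dec_z)                     # decoder output -> success probability
    x * log(p) + (1 - x) * log(1 - p)
end

# dec_z = 0 gives p = 0.5, so both outcomes have log likelihood log(0.5)
log_bernoulli(1.0, 0.0)
```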
## VAE model

The implementation is a basic version of the scvi-tools VAE object.

### `scVI.scVAE` — Type

```julia
mutable struct scVAE
```

Julia implementation of the single-cell Variational Autoencoder model corresponding to the scvi-tools VAE object. Collects all information on the model parameters such as distribution choices and stores the model encoder and decoder. Can be constructed using keywords.

**Fields for construction**

- `n_input::Int`: input dimension = number of genes/features
- `n_batch::Int=0`: number of batches in the data
- `n_hidden::Int=128`: number of hidden units to use in each hidden layer
- `n_latent::Int=10`: dimension of latent space
- `n_layers::Int=1`: number of hidden layers in encoder and decoder
- `dispersion::Symbol=:gene`: can be either `:gene` or `:gene_cell`. The Python scvi-tools options `:gene_batch` and `:gene_label` are planned, but not supported yet.
- `is_trained::Bool=false`: indicates whether the model has been trained
- `dropout_rate`: dropout to use in the encoder and decoder layers. Setting the rate to 0.0 corresponds to no dropout.
- `gene_likelihood::Symbol=:zinb`: which generative distribution to parameterize in the decoder. Can be one of `:nb` (negative binomial), `:zinb` (zero-inflated negative binomial), or `:poisson` (Poisson).
- `latent_distribution::Symbol=:normal`: whether to use a normal (`:normal`) or lognormal (`:ln`) distribution for the latent z
- `library_log_means::Union{Nothing, Vector{Float32}}`: log-transformed means of library size; has to be provided when not using the observed library size, but encoding it
- `library_log_vars::Union{Nothing, Vector{Float32}}`: log-transformed variances of library size; has to be provided when not using the observed library size, but encoding it
- `log_variational`: whether or not to log-transform the input data in the encoder (for numerical stability)
- `loss_registry::Dict=Dict()`: dictionary in which to record the values of the different loss components (reconstruction error, KL divergence(s)) during training
- `use_observed_lib_size::Bool=true`: whether or not to use the observed library size (if `false`, the library size is calculated by a dedicated encoder)
- `z_encoder::scEncoder`: encoder struct of the VAE model for the latent representation; see `scEncoder`
- `l_encoder::Union{Nothing, scEncoder}`: encoder struct of the VAE model for the library size (if `use_observed_lib_size == false`); see `scEncoder`
- `decoder::AbstractDecoder`: decoder struct of the VAE model; see `scDecoder`
### `scVI.scVAE` — Method

```julia
scVAE(n_input::Int;
    activation_fn::Function=relu, # to be used in all FC_layers instances
    bias::Symbol=:both, # whether to use bias in all linear layers of all FC instances
    dispersion::Symbol=:gene,
    dropout_rate::Float32=0.1f0,
    gene_likelihood::Symbol=:zinb,
    latent_distribution::Symbol=:normal,
    library_log_means=nothing,
    library_log_vars=nothing,
    log_variational::Bool=true,
    n_batch::Int=1,
    n_hidden::Union{Int,Vector{Int}}=128,
    n_latent::Int=10,
    n_layers::Int=1,
    use_activation::Symbol=:both,
    use_batch_norm::Symbol=:both,
    use_layer_norm::Symbol=:none,
    use_observed_lib_size::Bool=true,
    var_activation=nothing,
    var_eps::Float32=Float32(1e-4),
    seed::Int=1234
)
```

Constructor for the `scVAE` model struct. Initialises an `scVAE` model with the parameters specified in the input arguments. Basic Julia implementation of the scvi-tools VAE object.
**Arguments**

- `n_input`: input dimension = number of genes/features

**Keyword arguments**

- `activation_fn`: function to use as activation in all neural network layers of encoder and decoder
- `bias`: whether or not to use bias parameters in the neural network layers of encoder and decoder
- `dispersion`: can be either `:gene` or `:gene_cell`. The Python scvi-tools options `:gene_batch` and `:gene_label` are planned, but not supported yet.
- `dropout_rate`: dropout to use in the encoder and decoder layers. Setting the rate to 0.0 corresponds to no dropout.
- `gene_likelihood`: which generative distribution to parameterize in the decoder. Can be one of `:nb` (negative binomial), `:zinb` (zero-inflated negative binomial), or `:poisson` (Poisson).
- `library_log_means`: log-transformed means of library size; has to be provided when not using the observed library size, but encoding it
- `library_log_vars`: log-transformed variances of library size; has to be provided when not using the observed library size, but encoding it
- `log_variational`: whether or not to log-transform the input data in the encoder (for numerical stability)
- `n_batch`: number of batches in the data
- `n_hidden`: number of hidden units to use in each hidden layer (if an `Int` is passed, this number is used in all hidden layers; alternatively, a vector of `Int`s can be passed, in which case the kth element corresponds to the number of units in the kth layer)
- `n_latent`: dimension of latent space
- `n_layers`: number of hidden layers in encoder and decoder
- `use_activation`: whether or not to use an activation function in the neural network layers of encoder and decoder; if `false`, overrides the choice in `activation_fn`
- `use_batch_norm`: whether to apply batch normalization in the encoder/decoder layers; can be one of `:encoder`, `:decoder`, `:both`, `:none`
- `use_layer_norm`: whether to apply layer normalization in the encoder/decoder layers; can be one of `:encoder`, `:decoder`, `:both`, `:none`
- `use_observed_lib_size`: whether or not to use the observed library size (if `false`, the library size is calculated by a dedicated encoder)
- `var_activation`: whether or not to use an activation function for the variance layer in the encoder
- `var_eps`: numerical stability constant to add to the variance in the reparameterisation of the latent representation
- `seed`: random seed to use for the initialization of model parameters, to ensure reproducibility

**Returns**

An `scVAE` object.
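Assuming the scVI.jl package is installed and loaded, a full model could be constructed like this (hypothetical gene count; the model's encoder and decoder are set up internally from these keyword arguments):

```julia
using scVI # assumes the scVI.jl package is available

# VAE for a dataset with 1200 genes: 10-dimensional latent space,
# negative binomial likelihood, per-gene dispersion, observed library size
m = scVAE(1200;
    n_latent=10,
    gene_likelihood=:nb,
    dispersion=:gene,
    use_observed_lib_size=true,
    seed=42
)
```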