# The scVAE model

## Encoder

The implementation is based on the Python implementation of the scvi-tools encoder.

### `scVI.scEncoder` — Type

```julia
mutable struct scEncoder
```

Julia implementation of the encoder of a single-cell VAE model, corresponding to the scvi-tools encoder. Collects all information on the encoder parameters and stores the basic encoder as well as the mean and variance encoders. Can be constructed using keywords.

**Fields for construction**

- `encoder`: `Flux.Chain` of fully connected layers realising the first part of the encoder (before the split into mean and variance). For details, see the source code of `FC_layers` in `src/Utils`.
- `mean_encoder`: `Flux.Dense` fully connected layer realising the latent mean encoder
- `n_input`: input dimension = number of genes/features
- `n_hidden`: number of hidden units to use in each hidden layer
- `n_output`: output dimension of the encoder = dimension of latent space
- `n_layers`: number of hidden layers in encoder and decoder
- `var_activation`: whether or not to use an activation function for the variance layer in the encoder
- `var_encoder`: `Flux.Dense` fully connected layer realising the latent variance encoder
- `var_eps`: numerical stability constant to add to the variance in the reparameterisation of the latent representation
- `z_transformation`: whether to apply a `softmax` transformation to the latent z if assuming a lognormal instead of a normal distribution
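To make the role of these fields concrete, here is a dependency-free sketch of the forward pass they implement. Plain matrices stand in for the `Flux` layers; all names and sizes are chosen for illustration and are not part of the package API:

```julia
# Illustrative stand-ins for `encoder`, `mean_encoder` and `var_encoder`
n_input, n_hidden, n_latent = 4, 8, 2
var_eps = 1f-4

W_h  = randn(Float32, n_hidden, n_input)
W_mu = randn(Float32, n_latent, n_hidden)
W_v  = randn(Float32, n_latent, n_hidden)

x  = rand(Float32, n_input, 3)                    # minibatch of 3 cells
h  = max.(W_h * x, 0f0)                           # `encoder`: Dense layer(s) + relu
mu = W_mu * h                                     # `mean_encoder`
v  = exp.(W_v * h) .+ var_eps                     # `var_encoder`; var_eps keeps the variance away from 0

z  = mu .+ sqrt.(v) .* randn(Float32, size(mu))   # reparameterised latent sample

# with distribution = :ln, `z_transformation` maps z through a column-wise softmax
function softmax_cols(m)
    e = exp.(m .- maximum(m; dims=1))
    e ./ sum(e; dims=1)
end
z_ln = softmax_cols(z)
```

The softmax at the end is only applied for the lognormal latent distribution; for `:normal`, `z` is used as-is.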
### `scVI.scEncoder` — Method

```julia
scEncoder(
    n_input::Int,
    n_output::Int;
    activation_fn::Function=relu, # to use in FC_layers
    bias::Bool=true,
    n_hidden::Union{Int,Vector{Int}}=128,
    n_layers::Int=1,
    distribution::Symbol=:normal,
    dropout_rate::Float32=0.1f0,
    use_activation::Bool=true,
    use_batch_norm::Bool=true,
    use_layer_norm::Bool=false,
    var_activation=nothing,
    var_eps::Float32=Float32(1e-4)
)
```

Constructor for an `scVAE` encoder. Initialises an `scEncoder` object according to the input parameters. Julia implementation of the scvi-tools encoder.
**Arguments**

- `n_input`: input dimension = number of genes/features
- `n_output`: output dimension of the encoder = latent space dimension

**Keyword arguments**

- `activation_fn`: function to use as activation in all encoder neural network layers
- `bias`: whether or not to use bias parameters in the encoder neural network layers
- `n_hidden`: number of hidden units to use in each hidden layer (if an `Int` is passed, this number is used in all hidden layers; alternatively, a vector of `Int`s can be passed, in which case the kth element corresponds to the number of units in the kth layer)
- `n_layers`: number of hidden layers in encoder
- `distribution`: whether to use a normal (`:normal`) or lognormal (`:ln`) distribution for the latent z
- `dropout_rate`: dropout to use in all encoder layers. Setting the rate to 0.0 corresponds to no dropout.
- `use_activation`: whether or not to use an activation function in the encoder neural network layers; if `false`, overrides the choice in `activation_fn`
- `use_batch_norm`: whether or not to apply batch normalization in the encoder layers
- `use_layer_norm`: whether or not to apply layer normalization in the encoder layers
- `var_activation`: whether or not to use an activation function for the variance layer in the encoder
- `var_eps`: numerical stability constant to add to the variance in the reparameterisation of the latent representation

**Returns**

An `scEncoder` object.
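Assuming the scVI.jl package is installed and loaded, an encoder could be constructed like this (the dimensions here are hypothetical):

```julia
using scVI # assumes the scVI.jl package is available

# encoder from 1200 genes to a 10-dimensional latent space,
# two hidden layers with 128 and 64 units, lognormal latent distribution
enc = scEncoder(1200, 10;
    n_hidden=[128, 64],
    n_layers=2,
    distribution=:ln,
    dropout_rate=0.1f0
)
```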
## Decoder

The implementation is based on the Python implementation of the scvi-tools decoder.

There are several different distributions that can be parameterized by the decoder, which the user can set via the `gene_likelihood` argument. The following distributions are available:

- `:zinb`: zero-inflated negative binomial distribution
- `:nb`: negative binomial distribution
- `:poisson`: Poisson distribution
- `:gaussian`: Gaussian distribution (for log-transformed data)
- `:bernoulli`: Bernoulli distribution (for binarized data)

Further, there are different ways of calculating the dispersion parameter of the distribution, which can be set via the `dispersion` argument. The following options are available:

- `:gene`: the dispersion parameter is calculated separately for each gene, across all cells
- `:gene_cell`: the dispersion parameter is calculated for each gene in each cell
- `:gene_batch`: the dispersion parameter is calculated for each gene in each experimental batch
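To see what the zero-inflation in `:zinb` adds over a plain `:nb` likelihood, one can compare the probability of observing a zero count under both models. A small sketch in plain Julia (no package code; `pi` here denotes the dropout probability, not the constant):

```julia
# P(x = 0) under NB(mu, theta) is (theta / (theta + mu))^theta;
# ZINB mixes in an extra point mass at zero with dropout probability p_drop.
nb_zero(mu, theta)           = (theta / (theta + mu))^theta
zinb_zero(mu, theta, p_drop) = p_drop + (1 - p_drop) * nb_zero(mu, theta)

mu, theta = 2.0, 1.0
p_nb   = nb_zero(mu, theta)          # ≈ 0.333
p_zinb = zinb_zero(mu, theta, 0.3)   # ≈ 0.533: more zeros than NB alone
```

This extra point mass at zero is what makes `:zinb` a common choice for sparse single-cell count data.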
### `scVI.scDecoder` — Type

```julia
mutable struct scDecoder <: AbstractDecoder
```

Julia implementation of the decoder for a single-cell VAE model, corresponding to the scvi-tools decoder. Collects all information on the decoder parameters and stores the decoder parts. Can be constructed using keywords.

**Fields for construction**

- `n_input`: input dimension = dimension of latent space
- `n_hidden`: number of hidden units to use in each hidden layer (if an `Int` is passed, this number is used in all hidden layers; alternatively, a vector of `Int`s can be passed, in which case the kth element corresponds to the number of units in the kth layer)
- `n_output`: output dimension of the decoder = number of genes/features
- `n_layers`: number of hidden layers in decoder
- `px_decoder`: `Flux.Chain` of fully connected layers realising the first part of the decoder (before the split into mean, dispersion and dropout decoder). For details, see the source code of `FC_layers` in `src/Utils`.
- `px_dropout_decoder`: if the generative distribution is zero-inflated negative binomial (`gene_likelihood = :zinb` in the `scVAE` model construction): `Flux.Dense` layer, else `nothing`
- `px_r_decoder`: decoder for the dispersion parameter. If the generative distribution is not some (zero-inflated) negative binomial, it is `nothing`. Else, it is a parameter vector or a `Flux.Dense` layer, depending on whether the dispersion is estimated per gene (`dispersion = :gene`) or per gene and cell (`dispersion = :gene_cell`).
- `px_scale_decoder`: decoder for the mean of the reconstruction: `Flux.Chain` of a `Dense` layer followed by a `softmax` activation
- `use_batch_norm`: whether or not to apply batch normalization in the decoder layers
- `use_layer_norm`: whether or not to apply layer normalization in the decoder layers
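The decoder split can be sketched without any package dependencies. The matrices below are illustrative stand-ins for the `Flux` layers, for the case `gene_likelihood = :zinb` and `dispersion = :gene`; scaling the softmax output by the library size to obtain the mean follows the scvi-tools design:

```julia
# Stand-ins for `px_decoder`, `px_scale_decoder`, `px_dropout_decoder`, `px_r_decoder`
n_latent, n_hidden, n_genes = 2, 8, 5
W_h     = randn(n_hidden, n_latent)
W_scale = randn(n_genes, n_hidden)
W_drop  = randn(n_genes, n_hidden)

z = randn(n_latent, 3)                        # latent minibatch, 3 cells
h = max.(W_h * z, 0.0)                        # `px_decoder`: Dense layer(s) + relu

logits = W_scale * h                          # `px_scale_decoder`: Dense + softmax,
e = exp.(logits .- maximum(logits; dims=1))   # so each cell's gene proportions sum to 1
px_scale = e ./ sum(e; dims=1)

px_dropout = W_drop * h                       # `px_dropout_decoder`: zero-inflation logits (:zinb only)
px_r = ones(n_genes)                          # `px_r_decoder` with dispersion = :gene: one parameter per gene

library = 1.0e4                               # the NB mean is px_scale rescaled by the library size
px_rate = library .* px_scale
```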
### `scVI.scDecoder` — Method

```julia
scDecoder(n_input, n_output;
    activation_fn::Function=relu,
    bias::Bool=true,
    dispersion::Symbol=:gene,
    dropout_rate::Float32=0.0f0,
    gene_likelihood::Symbol=:zinb,
    n_hidden::Union{Int,Vector{Int}}=128,
    n_layers::Int=1,
    use_activation::Bool=true,
    use_batch_norm::Bool=true,
    use_layer_norm::Bool=false
)
```

Constructor for an `scVAE` decoder. Initialises an `scDecoder` object according to the input parameters. Julia implementation of the scvi-tools decoder.

**Arguments**

- `n_input`: input dimension of the decoder = latent space dimension
- `n_output`: output dimension = number of genes/features in the data

**Keyword arguments**

- `activation_fn`: function to use as activation in all decoder neural network layers
- `bias`: whether or not to use bias parameters in the decoder neural network layers
- `dispersion`: whether to estimate the dispersion parameter for the (zero-inflated) negative binomial generative distribution per gene (`:gene`) or per gene and cell (`:gene_cell`)
- `dropout_rate`: dropout to use in all decoder layers. Setting the rate to 0.0 corresponds to no dropout.
- `n_hidden`: number of hidden units to use in each hidden layer (if an `Int` is passed, this number is used in all hidden layers; alternatively, a vector of `Int`s can be passed, in which case the kth element corresponds to the number of units in the kth layer)
- `n_layers`: number of hidden layers in decoder
- `use_activation`: whether or not to use an activation function in the decoder neural network layers; if `false`, overrides the choice in `activation_fn`
- `use_batch_norm`: whether or not to apply batch normalization in the decoder layers
- `use_layer_norm`: whether or not to apply layer normalization in the decoder layers

**Returns**

An `scDecoder` object.
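Assuming the scVI.jl package is installed and loaded, a decoder could be constructed like this (hypothetical dimensions):

```julia
using scVI # assumes the scVI.jl package is available

# decoder from a 10-dimensional latent space back to 1200 genes,
# negative binomial likelihood with a dispersion parameter per gene and cell
dec = scDecoder(10, 1200;
    gene_likelihood=:nb,
    dispersion=:gene_cell,
    n_hidden=128,
    n_layers=1
)
```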
## Generative distribution functions

### `scVI.log_zinb_positive` — Function

```julia
log_zinb_positive(x::AbstractMatrix{S}, mu::AbstractMatrix{S}, theta::AbstractVecOrMat{S}, zi::AbstractMatrix{S}, eps::S=S(1e-8)) where S <: Real
```

Log likelihood (scalar) of a minibatch according to a zero-inflated negative binomial generative model.

**Arguments**

- `x`: data
- `mu`: mean of the negative binomial (has to have positive support) (shape: minibatch x vars)
- `theta`: inverse dispersion parameter (has to have positive support) (shape: minibatch x vars)
- `zi`: logit of the dropout parameter (real support) (shape: minibatch x vars)
- `eps`: numerical stability constant

**Notes**

We parametrize the Bernoulli using the logits, hence the softplus functions appearing.
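The logit/softplus parameterization can be checked on the zero-count case, where the ZINB likelihood has a closed mixture form. The following sketch (plain Julia, `eps` omitted) shows that the direct mixture and the softplus form agree:

```julia
softplus(x) = log1p(exp(x))
sigmoid(x)  = 1 / (1 + exp(-x))

mu, theta, zi = 2.0, 1.5, 0.4   # zi is the *logit* of the dropout probability

# direct mixture form of log P(x = 0) under ZINB:
direct = log(sigmoid(zi) + (1 - sigmoid(zi)) * (theta / (theta + mu))^theta)

# equivalent logit/softplus form (works on logits, avoids explicit probabilities)
via_softplus = softplus(theta * (log(theta) - log(theta + mu)) - zi) - softplus(-zi)
```

Working on logits rather than probabilities avoids taking `log` of quantities that can underflow to zero.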
### `scVI.log_nb_positive` — Function

```julia
log_nb_positive(x::AbstractMatrix{S}, mu::AbstractMatrix{S}, theta::AbstractVecOrMat{S}, eps::S=S(1e-8)) where S <: Real
```

Log likelihood (scalar) of a minibatch according to a negative binomial generative model.

**Arguments**

- `x`: data
- `mu`: mean of the negative binomial (has to have positive support) (shape: minibatch x vars)
- `theta`: inverse dispersion parameter (has to have positive support) (shape: minibatch x vars)
- `eps`: numerical stability constant
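For integer counts, the negative binomial log pmf in this mean/inverse-dispersion parameterization can be written with Base functions only, since log Γ(x + θ) - log Γ(θ) reduces to a finite sum. A scalar sketch (`eps` omitted for clarity; not the package implementation, which works on matrices):

```julia
# NB log pmf for a single integer count x with mean mu and inverse dispersion theta
function log_nb(x::Int, mu::Real, theta::Real)
    sum((log(theta + i) for i in 0:x-1); init=0.0) -  # log Γ(x+θ) - log Γ(θ)
        sum(log.(1:x); init=0.0) +                    # - log(x!)
        theta * log(theta / (theta + mu)) +
        x * log(mu / (theta + mu))
end

# with theta = 1 the NB is geometric with success probability theta/(theta+mu) = 0.5,
# so P(0) = 0.5 and P(1) = 0.25
exp(log_nb(1, 1.0, 1.0))
```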
### `scVI.log_poisson` — Function

```julia
log_poisson(x::AbstractMatrix{S}, mu::AbstractMatrix{S}, eps::S=S(1e-8)) where S <: Real
```

Log likelihood (scalar) of a minibatch according to a Poisson generative model.

**Arguments**

- `x`: data
- `mu`: mean = variance of the Poisson distribution (has to have positive support) (shape: minibatch x vars)
- `eps`: numerical stability constant
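The underlying scalar formula is the Poisson log pmf. A minimal Base-only sketch (`eps` omitted; the package version operates on matrices):

```julia
# Poisson log pmf for a single integer count: x*log(mu) - mu - log(x!)
log_poisson_pmf(x::Int, mu::Real) = x * log(mu) - mu - sum(log.(1:x); init=0.0)

# P(0) under Poisson(1) is e^{-1}
exp(log_poisson_pmf(0, 1.0))
```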
### `scVI.log_normal` — Function

```julia
log_normal(x::AbstractMatrix{S}, μ::AbstractMatrix{S}, logσ::AbstractVecOrMat{S}) where S <: Real
```

Log likelihood (scalar) of a minibatch according to a Gaussian generative model.

**Arguments**

- `x`: data
- `μ`: mean of the Gaussian distribution (shape: minibatch x vars)
- `logσ`: log standard deviation parameter (real support) (shape: minibatch x vars)
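Parameterising by the log standard deviation keeps σ positive without constraints. The scalar formula is just the Gaussian log density; a Base-only sketch:

```julia
# Gaussian log density with mean mu and log standard deviation logsigma
log_normal_density(x, mu, logsigma) =
    -0.5 * log(2π) - logsigma - (x - mu)^2 / (2 * exp(2 * logsigma))

# at x = mu with sigma = 1 (logsigma = 0) the log density is -0.5*log(2π)
log_normal_density(0.0, 0.0, 0.0)
```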
### `scVI.log_binary` — Function

```julia
log_binary(x::AbstractMatrix{S}, dec_z::AbstractMatrix{S}) where S <: Real
```

Log likelihood (scalar) of a minibatch according to a Bernoulli generative model.

**Arguments**

- `x`: data
- `dec_z`: decoder output, transformed to the success probability of the Bernoulli distribution (shape: minibatch x vars)
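A scalar sketch of the Bernoulli log likelihood, with the raw decoder output squashed to a success probability (the transformation mentioned above; here a sigmoid is assumed for illustration):

```julia
sigmoid(z) = 1 / (1 + exp(-z))

# Bernoulli log likelihood of a binarized observation x given raw decoder output dec_z
function log_bernoulli(x::Real, dec_z::Real)
    p = sigmoid(dec_z)                     # decoder output -> success probability
    x * log(p) + (1 - x) * log(1 - p)
end

# dec_z = 0 gives p = 0.5, so both outcomes have log likelihood log(0.5)
log_bernoulli(1.0, 0.0)
```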
## VAE model

The implementation is a basic version of the scvi-tools VAE object.

### `scVI.scVAE` — Type

```julia
mutable struct scVAE
```

Julia implementation of the single-cell Variational Autoencoder model corresponding to the scvi-tools VAE object. Collects all information on the model parameters such as distribution choices and stores the model encoder and decoder. Can be constructed using keywords.

**Fields for construction**

- `n_input::Int`: input dimension = number of genes/features
- `n_batch::Int=0`: number of batches in the data
- `n_hidden::Int=128`: number of hidden units to use in each hidden layer
- `n_latent::Int=10`: dimension of latent space
- `n_layers::Int=1`: number of hidden layers in encoder and decoder
- `dispersion::Symbol=:gene`: can be either `:gene` or `:gene_cell`. The Python scvi-tools options `:gene_batch` and `:gene_label` are planned, but not supported yet.
- `is_trained::Bool=false`: indicates whether the model has been trained
- `dropout_rate`: dropout to use in the encoder and decoder layers. Setting the rate to 0.0 corresponds to no dropout.
- `gene_likelihood::Symbol=:zinb`: which generative distribution to parameterize in the decoder. Can be one of `:nb` (negative binomial), `:zinb` (zero-inflated negative binomial), or `:poisson` (Poisson).
- `latent_distribution::Symbol=:normal`: whether to use a normal (`:normal`) or lognormal (`:ln`) distribution for the latent z
- `library_log_means::Union{Nothing, Vector{Float32}}`: log-transformed means of library size; has to be provided when not using the observed library size, but encoding it
- `library_log_vars::Union{Nothing, Vector{Float32}}`: log-transformed variances of library size; has to be provided when not using the observed library size, but encoding it
- `log_variational`: whether or not to log-transform the input data in the encoder (for numerical stability)
- `loss_registry::Dict=Dict()`: dictionary in which to record the values of the different loss components (reconstruction error, KL divergence(s)) during training
- `use_observed_lib_size::Bool=true`: whether or not to use the observed library size (if `false`, the library size is calculated by a dedicated encoder)
- `z_encoder::scEncoder`: encoder struct of the VAE model for the latent representation; see `scEncoder`
- `l_encoder::Union{Nothing, scEncoder}`: encoder struct of the VAE model for the library size (if `use_observed_lib_size == false`); see `scEncoder`
- `decoder::AbstractDecoder`: decoder struct of the VAE model; see `scDecoder`
### `scVI.scVAE` — Method

```julia
scVAE(n_input::Int;
    activation_fn::Function=relu, # to be used in all FC_layers instances
    bias::Symbol=:both, # whether to use bias in all linear layers of all FC instances
    dispersion::Symbol=:gene,
    dropout_rate::Float32=0.1f0,
    gene_likelihood::Symbol=:zinb,
    latent_distribution::Symbol=:normal,
    library_log_means=nothing,
    library_log_vars=nothing,
    log_variational::Bool=true,
    n_batch::Int=1,
    n_hidden::Union{Int,Vector{Int}}=128,
    n_latent::Int=10,
    n_layers::Int=1,
    use_activation::Symbol=:both,
    use_batch_norm::Symbol=:both,
    use_layer_norm::Symbol=:none,
    use_observed_lib_size::Bool=true,
    var_activation=nothing,
    var_eps::Float32=Float32(1e-4),
    seed::Int=1234
)
```

Constructor for the `scVAE` model struct. Initialises an `scVAE` model with the parameters specified in the input arguments. Basic Julia implementation of the scvi-tools VAE object.
**Arguments**

- `n_input`: input dimension = number of genes/features

**Keyword arguments**

- `activation_fn`: function to use as activation in all neural network layers of encoder and decoder
- `bias`: whether or not to use bias parameters in the neural network layers of encoder and decoder
- `dispersion`: can be either `:gene` or `:gene_cell`. The Python scvi-tools options `:gene_batch` and `:gene_label` are planned, but not supported yet.
- `dropout_rate`: dropout to use in the encoder and decoder layers. Setting the rate to 0.0 corresponds to no dropout.
- `gene_likelihood`: which generative distribution to parameterize in the decoder. Can be one of `:nb` (negative binomial), `:zinb` (zero-inflated negative binomial), or `:poisson` (Poisson).
- `library_log_means`: log-transformed means of library size; has to be provided when not using the observed library size, but encoding it
- `library_log_vars`: log-transformed variances of library size; has to be provided when not using the observed library size, but encoding it
- `log_variational`: whether or not to log-transform the input data in the encoder (for numerical stability)
- `n_batch`: number of batches in the data
- `n_hidden`: number of hidden units to use in each hidden layer (if an `Int` is passed, this number is used in all hidden layers; alternatively, a vector of `Int`s can be passed, in which case the kth element corresponds to the number of units in the kth layer)
- `n_latent`: dimension of latent space
- `n_layers`: number of hidden layers in encoder and decoder
- `use_activation`: whether or not to use an activation function in the neural network layers of encoder and decoder; if `false`, overrides the choice in `activation_fn`
- `use_batch_norm`: whether to apply batch normalization in the encoder/decoder layers; can be one of `:encoder`, `:decoder`, `:both`, `:none`
- `use_layer_norm`: whether to apply layer normalization in the encoder/decoder layers; can be one of `:encoder`, `:decoder`, `:both`, `:none`
- `use_observed_lib_size`: whether or not to use the observed library size (if `false`, the library size is calculated by a dedicated encoder)
- `var_activation`: whether or not to use an activation function for the variance layer in the encoder
- `var_eps`: numerical stability constant to add to the variance in the reparameterisation of the latent representation
- `seed`: random seed to use for the initialization of model parameters, to ensure reproducibility

**Returns**

An `scVAE` object.
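Assuming the scVI.jl package is installed and loaded, a full model could be constructed like this (hypothetical gene count; the model's encoder and decoder are set up internally from these keyword arguments):

```julia
using scVI # assumes the scVI.jl package is available

# VAE for a dataset with 1200 genes: 10-dimensional latent space,
# negative binomial likelihood, per-gene dispersion, observed library size
m = scVAE(1200;
    n_latent=10,
    gene_likelihood=:nb,
    dispersion=:gene,
    use_observed_lib_size=true,
    seed=42
)
```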