The scVAE model
Encoder
The implementation is based on the Python implementation of the scvi-tools encoder.
scVI.scEncoder — Type

mutable struct scEncoder

Julia implementation of the encoder of a single-cell VAE model, corresponding to the scvi-tools encoder. Collects all information on the encoder parameters and stores the basic encoder as well as the mean and variance encoders. Can be constructed using keywords.
Fields for construction
- `encoder`: `Flux.Chain` of fully connected layers realising the first part of the encoder (before the split into mean and variance). For details, see the source code of `FC_layers` in `src/Utils`.
- `mean_encoder`: `Flux.Dense` fully connected layer realising the latent mean encoder
- `n_input`: input dimension = number of genes/features
- `n_hidden`: number of hidden units to use in each hidden layer
- `n_output`: output dimension of the encoder = dimension of latent space
- `n_layers`: number of hidden layers in encoder and decoder
- `var_activation`: whether or not to use an activation function for the variance layer in the encoder
- `var_encoder`: `Flux.Dense` fully connected layer realising the latent variance encoder
- `var_eps`: numerical stability constant to add to the variance in the reparameterisation of the latent representation
- `z_transformation`: whether to apply a `softmax` transformation to the latent z if assuming a lognormal instead of a normal distribution
scVI.scEncoder — Method

scEncoder(
    n_input::Int, 
    n_output::Int;
    activation_fn::Function=relu, # to use in FC_layers
    bias::Bool=true,
    n_hidden::Union{Int,Vector{Int}}=128,
    n_layers::Int=1,
    distribution::Symbol=:normal,
    dropout_rate::Float32=0.1f0,
    use_activation::Bool=true,
    use_batch_norm::Bool=true,
    use_layer_norm::Bool=false,
    var_activation=nothing,
    var_eps::Float32=Float32(1e-4)
)

Constructor for an scVAE encoder. Initialises an scEncoder object according to the input parameters. Julia implementation of the scvi-tools encoder.
Arguments
- `n_input`: input dimension = number of genes/features
- `n_output`: output dimension of the encoder = latent space dimension
Keyword arguments
- `activation_fn`: function to use as activation in all encoder neural network layers
- `bias`: whether or not to use bias parameters in the encoder neural network layers
- `n_hidden`: number of hidden units to use in each hidden layer (if an `Int` is passed, this number is used in all hidden layers; alternatively, a vector of `Int`s can be passed, in which case the kth element corresponds to the number of units in the kth layer)
- `n_layers`: number of hidden layers in encoder
- `distribution`: whether to use a normal (`:normal`) or lognormal (`:ln`) distribution for the latent z
- `dropout_rate`: dropout to use in all encoder layers. Setting the rate to 0.0 corresponds to no dropout.
- `use_activation`: whether or not to use an activation function in the encoder neural network layers; if `false`, overrides the choice in `activation_fn`
- `use_batch_norm`: whether or not to apply batch normalization in the encoder layers
- `use_layer_norm`: whether or not to apply layer normalization in the encoder layers
- `var_activation`: whether or not to use an activation function for the variance layer in the encoder
- `var_eps`: numerical stability constant to add to the variance in the reparameterisation of the latent representation
Returns
`scEncoder` object
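For illustration, a hypothetical construction of an encoder (a sketch assuming `scVI` is installed and loaded; the dimensions and keyword values are made up):

```julia
using scVI  # requires the scVI.jl package to be installed

# encoder mapping 2000 genes to a 10-dimensional latent space,
# with two hidden layers of 128 and 64 units and a lognormal latent
encoder = scEncoder(2000, 10;
    n_hidden=[128, 64],
    n_layers=2,
    distribution=:ln,     # softmax-transformed latent z
    dropout_rate=0.1f0
)
```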
Decoder
The implementation is based on the Python implementation of the scvi-tools decoder.
The decoder can parameterize several different generative distributions, which the user selects via the gene_likelihood argument. The following distributions are available:
- `:zinb`: zero-inflated negative binomial distribution
- `:nb`: negative binomial distribution
- `:poisson`: Poisson distribution
- `:gaussian`: Gaussian distribution (for log-transformed data)
- `:bernoulli`: Bernoulli distribution (for binarized data)
Further, there are different ways of estimating the dispersion parameter of the distribution, which can be set via the dispersion argument. The following options are available:
- `:gene`: the dispersion parameter is estimated separately for each gene, across all cells
- `:gene_cell`: the dispersion parameter is estimated for each gene in each cell
- `:gene_batch`: the dispersion parameter is estimated for each gene in each experimental batch
scVI.scDecoder — Type

mutable struct scDecoder <: AbstractDecoder

Julia implementation of the decoder for a single-cell VAE model, corresponding to the scvi-tools decoder. Collects all information on the decoder parameters and stores the decoder parts. Can be constructed using keywords.
Fields for construction
- `n_input`: input dimension = dimension of latent space
- `n_hidden`: number of hidden units to use in each hidden layer (if an `Int` is passed, this number is used in all hidden layers; alternatively, a vector of `Int`s can be passed, in which case the kth element corresponds to the number of units in the kth layer)
- `n_output`: output dimension of the decoder = number of genes/features
- `n_layers`: number of hidden layers in decoder
- `px_decoder`: `Flux.Chain` of fully connected layers realising the first part of the decoder (before the split into mean, dispersion and dropout decoders). For details, see the source code of `FC_layers` in `src/Utils`.
- `px_dropout_decoder`: if the generative distribution is zero-inflated negative binomial (`gene_likelihood = :zinb` in the `scVAE` model construction): a `Flux.Dense` layer, else `nothing`
- `px_r_decoder`: decoder for the dispersion parameter. If the generative distribution is not some (zero-inflated) negative binomial, it is `nothing`. Else, it is a parameter vector or a `Flux.Dense` layer, depending on whether the dispersion is estimated per gene (`dispersion = :gene`) or per gene and cell (`dispersion = :gene_cell`)
- `px_scale_decoder`: decoder for the mean of the reconstruction; `Flux.Chain` of a `Dense` layer followed by `softmax` activation
- `use_batch_norm`: whether or not to apply batch normalization in the decoder layers
- `use_layer_norm`: whether or not to apply layer normalization in the decoder layers
scVI.scDecoder — Method

scDecoder(n_input, n_output; 
    activation_fn::Function=relu,
    bias::Bool=true,
    dispersion::Symbol=:gene,
    dropout_rate::Float32=0.0f0,
    gene_likelihood::Symbol=:zinb,
    n_hidden::Union{Int,Vector{Int}}=128,
    n_layers::Int=1, 
    use_activation::Bool=true,
    use_batch_norm::Bool=true,
    use_layer_norm::Bool=false
)

Constructor for an scVAE decoder. Initialises an scDecoder object according to the input parameters. Julia implementation of the scvi-tools decoder.
Arguments
- `n_input`: input dimension of the decoder = latent space dimension
- `n_output`: output dimension = number of genes/features in the data
Keyword arguments
- `activation_fn`: function to use as activation in all decoder neural network layers
- `bias`: whether or not to use bias parameters in the decoder neural network layers
- `dispersion`: whether to estimate the dispersion parameter for the (zero-inflated) negative binomial generative distribution per gene (`:gene`) or per gene and cell (`:gene_cell`)
- `dropout_rate`: dropout to use in all decoder layers. Setting the rate to 0.0 corresponds to no dropout.
- `gene_likelihood`: generative distribution to parameterize; one of `:zinb`, `:nb`, `:poisson`, `:gaussian`, `:bernoulli` (see above)
- `n_hidden`: number of hidden units to use in each hidden layer (if an `Int` is passed, this number is used in all hidden layers; alternatively, a vector of `Int`s can be passed, in which case the kth element corresponds to the number of units in the kth layer)
- `n_layers`: number of hidden layers in decoder
- `use_activation`: whether or not to use an activation function in the decoder neural network layers; if `false`, overrides the choice in `activation_fn`
- `use_batch_norm`: whether or not to apply batch normalization in the decoder layers
- `use_layer_norm`: whether or not to apply layer normalization in the decoder layers
Returns
`scDecoder` object
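Analogously, a hypothetical decoder construction mapping the latent space back to gene space (a sketch assuming `scVI` is installed; dimensions are made up):

```julia
using scVI  # requires the scVI.jl package to be installed

# decoder from a 10-dimensional latent space back to 2000 genes,
# with gene-wise dispersion and a zero-inflated negative binomial likelihood
decoder = scDecoder(10, 2000;
    dispersion=:gene,
    gene_likelihood=:zinb,
    n_hidden=128,
    n_layers=1
)
```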
Generative distribution functions
scVI.log_zinb_positive — Function

log_zinb_positive(x::AbstractMatrix{S}, mu::AbstractMatrix{S}, theta::AbstractVecOrMat{S}, zi::AbstractMatrix{S}, eps::S=S(1e-8)) where S <: Real

Log likelihood (scalar) of a minibatch according to a zero-inflated negative binomial generative model.
Arguments
- `x`: data
- `mu`: mean of the negative binomial (must have positive support) (shape: minibatch x vars)
- `theta`: inverse dispersion parameter (must have positive support) (shape: minibatch x vars)
- `zi`: logit of the dropout parameter (real support) (shape: minibatch x vars)
- `eps`: numerical stability constant
Notes
We parametrize the Bernoulli dropout component using the logits, hence the softplus functions appearing in the implementation.
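The logit parameterisation can be made concrete in a small self-contained sketch: for a logit `zi`, the Bernoulli log-probabilities log σ(zi) and log(1 − σ(zi)) reduce to softplus terms, which is why softplus appears in the likelihood. The helper names below are illustrative, not part of the package:

```julia
# numerically stable softplus: log(1 + exp(x))
softplus(x) = max(x, zero(x)) + log1p(exp(-abs(x)))
sigmoid(x) = one(x) / (one(x) + exp(-x))

zi = 1.7  # logit of the dropout probability
# log P(dropout)    = log σ(zi)      = -softplus(-zi)
# log P(no dropout) = log(1 - σ(zi)) = -softplus(zi)
@assert isapprox(log(sigmoid(zi)), -softplus(-zi); atol=1e-12)
@assert isapprox(log(1 - sigmoid(zi)), -softplus(zi); atol=1e-12)
```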
scVI.log_nb_positive — Function

log_nb_positive(x::AbstractMatrix{S}, mu::AbstractMatrix{S}, theta::AbstractVecOrMat{S}, eps::S=S(1e-8)) where S <: Real

Log likelihood (scalar) of a minibatch according to a negative binomial generative model.
Arguments
- `x`: data
- `mu`: mean of the negative binomial (must have positive support) (shape: minibatch x vars)
- `theta`: inverse dispersion parameter (must have positive support) (shape: minibatch x vars)
- `eps`: numerical stability constant
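The negative binomial log-pmf in this mean/inverse-dispersion parameterisation can be written out for a single integer count. The scalar helper below is an illustrative sketch (not the package function); it uses the identity log Γ(x+θ) − log Γ(θ) = Σ_{i=0}^{x−1} log(θ+i) to stay dependency-free:

```julia
# NB log-pmf with mean mu and inverse dispersion theta:
# logΓ(x+θ) − logΓ(θ) − logΓ(x+1) + θ·log(θ/(θ+μ)) + x·log(μ/(θ+μ))
function log_nb_scalar(x::Int, mu::Real, theta::Real)
    rising  = x == 0 ? 0.0 : sum(log(theta + i) for i in 0:(x - 1))  # logΓ(x+θ) − logΓ(θ)
    logfact = x <= 1 ? 0.0 : sum(log, 2:x)                           # logΓ(x+1)
    rising - logfact + theta * log(theta / (theta + mu)) + x * log(mu / (theta + mu))
end

# sanity check: the probabilities sum to one over the support
@assert isapprox(sum(exp(log_nb_scalar(x, 3.0, 2.0)) for x in 0:500), 1.0; atol=1e-8)
```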
scVI.log_poisson — Function

log_poisson(x::AbstractMatrix{S}, mu::AbstractMatrix{S}, eps::S=S(1e-8)) where S <: Real

Log likelihood (scalar) of a minibatch according to a Poisson generative model.
Arguments
- `x`: data
- `mu`: mean (= variance) of the Poisson distribution (must have positive support) (shape: minibatch x vars)
- `eps`: numerical stability constant
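For a single integer count, the Poisson log-pmf is simply x·log(μ) − μ − log(x!); the scalar helper below is an illustrative sketch, not the package function:

```julia
# Poisson log-pmf: x·log(μ) − μ − log(x!)
log_poisson_scalar(x::Int, mu::Real) =
    x * log(mu) - mu - (x <= 1 ? 0.0 : sum(log, 2:x))

# sanity check: the probabilities sum to one over the support
@assert isapprox(sum(exp(log_poisson_scalar(x, 4.0)) for x in 0:200), 1.0; atol=1e-10)
```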
scVI.log_normal — Function

log_normal(x::AbstractMatrix{S}, μ::AbstractMatrix{S}, logσ::AbstractVecOrMat{S}) where S <: Real

Log likelihood (scalar) of a minibatch according to a Gaussian generative model.
Arguments
- `x`: data
- `μ`: mean of the Gaussian distribution (shape: minibatch x vars)
- `logσ`: log of the standard deviation (real support; the standard deviation itself is `exp(logσ)`) (shape: minibatch x vars)
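Written out for a scalar observation, the Gaussian log-density with the log-standard-deviation parameterisation looks as follows (an illustrative sketch, not the package function):

```julia
# Gaussian log-density: −½·log(2π) − logσ − (x − μ)² / (2·exp(logσ)²)
log_normal_scalar(x, mu, logsigma) =
    -0.5 * log(2π) - logsigma - (x - mu)^2 / (2 * exp(2 * logsigma))

# sanity check: a Riemann sum of the density over a wide grid is ≈ 1
xs = -15:0.01:15
@assert isapprox(sum(exp(log_normal_scalar(x, 1.0, 0.3)) for x in xs) * 0.01, 1.0; atol=1e-6)
```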
scVI.log_binary — Function

log_binary(x::AbstractMatrix{S}, dec_z::AbstractMatrix{S}) where S <: Real

Log likelihood (scalar) of a minibatch according to a Bernoulli generative model.
Arguments
- `x`: data
- `dec_z`: decoder output, transformed to the success probability of the Bernoulli distribution (shape: minibatch x vars)
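For a single binary observation with success probability p, the Bernoulli log-likelihood reduces to x·log(p) + (1 − x)·log(1 − p), usually with a small constant for numerical stability. The scalar helper below is an illustrative sketch (the `eps` default mirrors the constants used elsewhere in this section and is an assumption):

```julia
# Bernoulli log-likelihood of a binary x given success probability p
log_binary_scalar(x, p; eps=1e-8) = x * log(p + eps) + (1 - x) * log(1 - p + eps)

# sanity check: the two outcome probabilities sum to ≈ 1
p = 0.25
@assert isapprox(exp(log_binary_scalar(1, p)) + exp(log_binary_scalar(0, p)), 1.0; atol=1e-6)
```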
VAE model
The implementation is a basic version of the scvi-tools VAE object. 
scVI.scVAE — Type

mutable struct scVAE

Julia implementation of the single-cell variational autoencoder model corresponding to the scvi-tools VAE object. Collects all information on the model parameters, such as distribution choices, and stores the model encoder and decoder. Can be constructed using keywords.
Fields for construction
- `n_input::Int`: input dimension = number of genes/features
- `n_batch::Int=0`: number of batches in the data
- `n_hidden::Int=128`: number of hidden units to use in each hidden layer
- `n_latent::Int=10`: dimension of latent space
- `n_layers::Int=1`: number of hidden layers in encoder and decoder
- `dispersion::Symbol=:gene`: can be either `:gene` or `:gene-cell`. The Python scvi-tools options `:gene-batch` and `:gene-label` are planned, but not supported yet.
- `is_trained::Bool=false`: indicates whether the model has been trained
- `dropout_rate`: dropout to use in the encoder and decoder layers. Setting the rate to 0.0 corresponds to no dropout.
- `gene_likelihood::Symbol=:zinb`: which generative distribution to parameterize in the decoder. Can be one of `:nb` (negative binomial), `:zinb` (zero-inflated negative binomial), or `:poisson` (Poisson).
- `latent_distribution::Symbol=:normal`: whether to use a normal (`:normal`) or lognormal (`:ln`) distribution for the latent z
- `library_log_means::Union{Nothing, Vector{Float32}}`: log-transformed means of library size; has to be provided when not using the observed library size, but encoding it
- `library_log_vars::Union{Nothing, Vector{Float32}}`: log-transformed variances of library size; has to be provided when not using the observed library size, but encoding it
- `log_variational`: whether or not to log-transform the input data in the encoder (for numerical stability)
- `loss_registry::Dict=Dict()`: dictionary in which to record the values of the different loss components (reconstruction error, KL divergence(s)) during training
- `use_observed_lib_size::Bool=true`: whether or not to use the observed library size (if `false`, the library size is estimated by a dedicated encoder)
- `z_encoder::scEncoder`: encoder struct of the VAE model for the latent representation; see `scEncoder`
- `l_encoder::Union{Nothing, scEncoder}`: encoder struct of the VAE model for the library size (if `use_observed_lib_size == false`); see `scEncoder`
- `decoder::AbstractDecoder`: decoder struct of the VAE model; see `scDecoder`
scVI.scVAE — Method

scVAE(n_input::Int;
    activation_fn::Function=relu, # to be used in all FC_layers instances
    bias::Symbol=:both, # whether to use bias in all linear layers of all FC instances 
    dispersion::Symbol=:gene,
    dropout_rate::Float32=0.1f0,
    gene_likelihood::Symbol=:zinb,
    latent_distribution::Symbol=:normal,
    library_log_means=nothing,
    library_log_vars=nothing,
    log_variational::Bool=true,
    n_batch::Int=1,
    n_hidden::Union{Int,Vector{Int}}=128,
    n_latent::Int=10,
    n_layers::Int=1,
    use_activation::Symbol=:both, 
    use_batch_norm::Symbol=:both,
    use_layer_norm::Symbol=:none,
    use_observed_lib_size::Bool=true,
    var_activation=nothing,
    var_eps::Float32=Float32(1e-4),
    seed::Int=1234
)

Constructor for the scVAE model struct. Initialises an scVAE model with the parameters specified in the input arguments. Basic Julia implementation of the scvi-tools VAE object.
Arguments
- `n_input`: input dimension = number of genes/features
Keyword arguments
- `activation_fn`: function to use as activation in all neural network layers of encoder and decoder
- `bias`: whether or not to use bias parameters in the neural network layers of encoder and decoder
- `dispersion`: can be either `:gene` or `:gene-cell`. The Python scvi-tools options `:gene-batch` and `:gene-label` are planned, but not supported yet.
- `dropout_rate`: dropout to use in the encoder and decoder layers. Setting the rate to 0.0 corresponds to no dropout.
- `gene_likelihood`: which generative distribution to parameterize in the decoder. Can be one of `:nb` (negative binomial), `:zinb` (zero-inflated negative binomial), or `:poisson` (Poisson).
- `latent_distribution`: whether to use a normal (`:normal`) or lognormal (`:ln`) distribution for the latent z
- `library_log_means`: log-transformed means of library size; has to be provided when not using the observed library size, but encoding it
- `library_log_vars`: log-transformed variances of library size; has to be provided when not using the observed library size, but encoding it
- `log_variational`: whether or not to log-transform the input data in the encoder (for numerical stability)
- `n_batch`: number of batches in the data
- `n_hidden`: number of hidden units to use in each hidden layer (if an `Int` is passed, this number is used in all hidden layers; alternatively, a vector of `Int`s can be passed, in which case the kth element corresponds to the number of units in the kth layer)
- `n_latent`: dimension of latent space
- `n_layers`: number of hidden layers in encoder and decoder
- `use_activation`: whether or not to use an activation function in the neural network layers of encoder and decoder; if `false`, overrides the choice in `activation_fn`
- `use_batch_norm`: whether to apply batch normalization in the encoder/decoder layers; can be one of `:encoder`, `:decoder`, `:both`, `:none`
- `use_layer_norm`: whether to apply layer normalization in the encoder/decoder layers; can be one of `:encoder`, `:decoder`, `:both`, `:none`
- `use_observed_lib_size`: whether or not to use the observed library size (if `false`, the library size is estimated by a dedicated encoder)
- `var_activation`: whether or not to use an activation function for the variance layer in the encoder
- `var_eps`: numerical stability constant to add to the variance in the reparameterisation of the latent representation
- `seed`: random seed to use for the initialization of model parameters, to ensure reproducibility
Returns
`scVAE` object
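Putting the pieces together, a model for a dataset with 2000 genes might be constructed as follows (an illustrative sketch assuming `scVI` is installed; the data loading and training calls are not shown here, and all keyword values are made up):

```julia
using scVI  # requires the scVI.jl package to be installed

model = scVAE(2000;              # number of genes/features
    n_latent=10,
    n_layers=2,
    n_hidden=128,
    gene_likelihood=:zinb,       # zero-inflated negative binomial decoder
    dispersion=:gene,            # one dispersion parameter per gene
    use_observed_lib_size=true,  # no dedicated library-size encoder
    seed=1234
)
```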