The scVAE model

Encoder

The implementation is based on the Python implementation of the scvi-tools encoder.

scVI.scEncoderType
mutable struct scEncoder

Julia implementation of the encoder of a single-cell VAE model corresponding to the scvi-tools encoder. Collects all information on the encoder parameters and stores the shared encoder network together with the mean and variance encoders. Can be constructed using keywords.

Keyword arguments

  • encoder: Flux.Chain of fully connected layers realising the first part of the encoder (before the split in mean and variance). For details, see the source code of FC_layers in src/Utils.
  • mean_encoder: Flux.Dense fully connected layer realising the latent mean encoder
  • n_input: input dimension = number of genes/features
  • n_hidden: number of hidden units to use in each hidden layer
  • n_output: output dimension of the encoder = dimension of latent space
  • n_layers: number of hidden layers in encoder and decoder
  • var_activation: activation function to apply to the output of the variance encoder (nothing corresponds to no additional activation)
  • var_encoder: Flux.Dense fully connected layer realising the latent variance encoder
  • var_eps: numerical stability constant to add to the variance in the reparameterisation of the latent representation
  • z_transformation: whether to apply a softmax transformation to the latent z if assuming a log-normal instead of a normal distribution
source
scVI.scEncoderMethod
scEncoder(
    n_input::Int, 
    n_output::Int;
    activation_fn::Function=relu, # to use in FC_layers
    bias::Bool=true,
    n_hidden::Union{Int,Vector{Int}}=128,
    n_layers::Int=1,
    distribution::Symbol=:normal,
    dropout_rate::Float32=0.1f0,
    use_activation::Bool=true,
    use_batch_norm::Bool=true,
    use_layer_norm::Bool=false,
    var_activation=nothing,
    var_eps::Float32=Float32(1e-4)
)

Constructor for an scVAE encoder. Initialises an scEncoder object according to the input parameters. Julia implementation of the scvi-tools encoder.

Arguments:

  • n_input: input dimension = number of genes/features
  • n_output: output dimension of the encoder = latent space dimension

Keyword arguments:

  • activation_fn: function to use as activation in all encoder neural network layers
  • bias: whether or not to use bias parameters in the encoder neural network layers
  • n_hidden: number of hidden units to use in each hidden layer (if an Int is passed, this number is used in all hidden layers; alternatively, a vector of Ints can be passed, in which case the kth element corresponds to the number of units in the kth layer)
  • n_layers: number of hidden layers in encoder
  • distribution: whether to use a :normal or log-normal (:ln) distribution for the latent z
  • dropout_rate: dropout to use in all encoder layers. Setting the rate to 0.0 corresponds to no dropout.
  • use_activation: whether or not to use an activation function in the encoder neural network layers; if false, overrides the choice in activation_fn
  • use_batch_norm: whether or not to apply batch normalization in the encoder layers
  • use_layer_norm: whether or not to apply layer normalization in the encoder layers
  • var_activation: activation function to apply to the output of the variance encoder (nothing corresponds to no additional activation)
  • var_eps: numerical stability constant to add to the variance in the reparameterisation of the latent representation
source
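For example, the constructor above can be used as follows. This is a minimal sketch based on the documented signature; the dimensions and keyword values are illustrative, not prescribed.

```julia
using scVI

# Encoder mapping 1200 genes/features to a 10-dimensional latent space,
# with two hidden layers (128 and 64 units) and a log-normal latent
# distribution (:ln), so that z is softmax-transformed.
enc = scEncoder(1200, 10;
    n_hidden=[128, 64],
    n_layers=2,
    distribution=:ln,
    dropout_rate=0.1f0
)
```

Passing a vector for `n_hidden` together with `n_layers=2` sets the layer widths individually, as described for the `n_hidden` keyword above.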

Decoder

The implementation is based on the Python implementation of the scvi-tools decoder.

scVI.scDecoderType
mutable struct scDecoder <: AbstractDecoder

Julia implementation of the decoder for a single-cell VAE model corresponding to the scvi-tools decoder. Collects all information on the decoder parameters and stores the decoder parts. Can be constructed using keywords.

Keyword arguments

  • n_input: input dimension = dimension of latent space
  • n_hidden: number of hidden units to use in each hidden layer (if an Int is passed, this number is used in all hidden layers; alternatively, a vector of Ints can be passed, in which case the kth element corresponds to the number of units in the kth layer)
  • n_output: output dimension of the decoder = number of genes/features
  • n_layers: number of hidden layers in decoder
  • px_decoder: Flux.Chain of fully connected layers realising the first part of the decoder (before the split in mean, dispersion and dropout decoder). For details, see the source code of FC_layers in src/Utils.
  • px_dropout_decoder: if the generative distribution is zero-inflated negative binomial (gene_likelihood = :zinb in the scVAE model construction): Flux.Dense layer, else nothing.
  • px_r_decoder: decoder for the dispersion parameter. If the generative distribution is neither negative binomial nor zero-inflated negative binomial, it is nothing. Otherwise, it is a parameter vector or a Flux.Dense layer, depending on whether the dispersion is estimated per gene (dispersion = :gene) or per gene and cell (dispersion = :gene_cell)
  • px_scale_decoder: decoder for the mean of the reconstruction, Flux.Chain of a Dense layer followed by softmax activation
  • use_batch_norm: whether or not to apply batch normalization in the decoder layers
  • use_layer_norm: whether or not to apply layer normalization in the decoder layers
source
scVI.scDecoderMethod
scDecoder(n_input, n_output; 
    activation_fn::Function=relu,
    bias::Bool=true,
    dispersion::Symbol=:gene,
    dropout_rate::Float32=0.0f0,
    gene_likelihood::Symbol=:zinb,
    n_hidden::Union{Int,Vector{Int}}=128,
    n_layers::Int=1, 
    use_activation::Bool=true,
    use_batch_norm::Bool=true,
    use_layer_norm::Bool=false
)

Constructor for an scVAE decoder. Initialises an scDecoder object according to the input parameters. Julia implementation of the scvi-tools decoder.

Arguments:

  • n_input: input dimension of the decoder = latent space dimension
  • n_output: output dimension = number of genes/features in the data

Keyword arguments:

  • activation_fn: function to use as activation in all decoder neural network layers
  • bias: whether or not to use bias parameters in the decoder neural network layers
  • dispersion: whether to estimate the dispersion parameter for the (zero-inflated) negative binomial generative distribution per gene (:gene) or per gene and cell (:gene_cell)
  • dropout_rate: dropout to use in all decoder layers. Setting the rate to 0.0 corresponds to no dropout.
  • n_hidden: number of hidden units to use in each hidden layer (if an Int is passed, this number is used in all hidden layers; alternatively, a vector of Ints can be passed, in which case the kth element corresponds to the number of units in the kth layer)
  • n_layers: number of hidden layers in decoder
  • use_activation: whether or not to use an activation function in the decoder neural network layers; if false, overrides the choice in activation_fn
  • use_batch_norm: whether or not to apply batch normalization in the decoder layers
  • use_layer_norm: whether or not to apply layer normalization in the decoder layers
source
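A matching decoder for the encoder dimensions used above can be constructed like this (again a sketch with illustrative values, following the documented signature):

```julia
using scVI

# Decoder mapping a 10-dimensional latent representation back to 1200 genes,
# with per-gene dispersion and a zero-inflated negative binomial likelihood,
# so that a dropout decoder (px_dropout_decoder) is set up as well.
dec = scDecoder(10, 1200;
    dispersion=:gene,
    gene_likelihood=:zinb,
    n_hidden=128,
    n_layers=1
)
```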

VAE model

The implementation is a basic version of the scvi-tools VAE object.

scVI.scVAEType
mutable struct scVAE

Julia implementation of the single-cell Variational Autoencoder model corresponding to the scvi-tools VAE object. Collects all information on the model parameters such as distribution choices and stores the model encoder and decoder. Can be constructed using keywords.

Keyword arguments

  • n_input::Int: input dimension = number of genes/features
  • n_batch::Int=0: number of batches in the data
  • n_hidden::Int=128: number of hidden units to use in each hidden layer
  • n_latent::Int=10: dimension of latent space
  • n_layers::Int=1: number of hidden layers in encoder and decoder
  • dispersion::Symbol=:gene: can be either :gene or :gene_cell. The Python scvi-tools options gene-batch and gene-label are planned, but not supported yet.
  • is_trained::Bool=false: indicating whether the model has been trained or not
  • dropout_rate: Dropout to use in the encoder and decoder layers. Setting the rate to 0.0 corresponds to no dropout.
  • gene_likelihood::Symbol=:zinb: which generative distribution to parameterize in the decoder. Can be one of :nb (negative binomial), :zinb (zero-inflated negative binomial), or :poisson (Poisson).
  • latent_distribution::Symbol=:normal: whether to use a :normal or log-normal (:ln) distribution for the latent z
  • library_log_means::Union{Nothing, Vector{Float32}}: log-transformed means of library size; has to be provided when not using observed library size, but encoding it
  • library_log_vars::Union{Nothing, Vector{Float32}}: log-transformed variances of library size; has to be provided when not using observed library size, but encoding it
  • log_variational: whether or not to log-transform the input data in the encoder (for numerical stability)
  • loss_registry::Dict=Dict(): dictionary in which to record the values of the different loss components (reconstruction error, KL divergence(s)) during training
  • use_observed_lib_size::Bool=true: whether or not to use the observed library size (if false, library size is calculated by a dedicated encoder)
  • z_encoder::scEncoder: Encoder struct of the VAE model for latent representation; see scEncoder
  • l_encoder::Union{Nothing, scEncoder}: Encoder struct of the VAE model for the library size (if use_observed_lib_size==false), see scEncoder
  • decoder::AbstractDecoder: Decoder struct of the VAE model; see scDecoder
source
scVI.scVAEMethod
scVAE(n_input::Int;
    activation_fn::Function=relu, # to be used in all FC_layers instances
    bias::Symbol=:both, # whether to use bias in all linear layers of all FC instances 
    dispersion::Symbol=:gene,
    dropout_rate::Float32=0.1f0,
    gene_likelihood::Symbol=:zinb,
    latent_distribution::Symbol=:normal,
    library_log_means=nothing,
    library_log_vars=nothing,
    log_variational::Bool=true,
    n_batch::Int=1,
    n_hidden::Union{Int,Vector{Int}}=128,
    n_latent::Int=10,
    n_layers::Int=1,
    use_activation::Symbol=:both, 
    use_batch_norm::Symbol=:both,
    use_layer_norm::Symbol=:none,
    use_observed_lib_size::Bool=true,
    var_activation=nothing,
    var_eps::Float32=Float32(1e-4),
    seed::Int=1234
)

Constructor for the scVAE model struct. Initialises an scVAE model with the parameters specified in the input arguments. Basic Julia implementation of the scvi-tools VAE object.

Arguments:

  • n_input: input dimension = number of genes/features

Keyword arguments

  • activation_fn: function to use as activation in all neural network layers of encoder and decoder
  • bias: whether to use bias parameters in the neural network layers of encoder and decoder; can be one of :encoder, :decoder, :both, :none
  • dispersion: can be either :gene or :gene_cell. The Python scvi-tools options gene-batch and gene-label are planned, but not supported yet.
  • dropout_rate: Dropout to use in the encoder and decoder layers. Setting the rate to 0.0 corresponds to no dropout.
  • gene_likelihood: which generative distribution to parameterize in the decoder. Can be one of :nb (negative binomial), :zinb (zero-inflated negative binomial), or :poisson (Poisson).
  • library_log_means: log-transformed means of library size; has to be provided when not using observed library size, but encoding it
  • library_log_vars: log-transformed variances of library size; has to be provided when not using observed library size, but encoding it
  • log_variational: whether or not to log-transform the input data in the encoder (for numerical stability)
  • n_batch: number of batches in the data
  • n_hidden: number of hidden units to use in each hidden layer (if an Int is passed, this number is used in all hidden layers; alternatively, a vector of Ints can be passed, in which case the kth element corresponds to the number of units in the kth layer)
  • n_latent: dimension of latent space
  • n_layers: number of hidden layers in encoder and decoder
  • use_activation: whether to use an activation function in the neural network layers of encoder and decoder; can be one of :encoder, :decoder, :both, :none. Layers without an activation ignore the choice in activation_fn.
  • use_batch_norm: whether to apply batch normalization in the encoder/decoder layers; can be one of :encoder, :decoder, :both, :none
  • use_layer_norm: whether to apply layer normalization in the encoder/decoder layers; can be one of :encoder, :decoder, :both, :none
  • use_observed_lib_size: whether or not to use the observed library size (if false, library size is calculated by a dedicated encoder)
  • var_activation: activation function to apply to the output of the variance encoder (nothing corresponds to no additional activation)
  • var_eps: numerical stability constant to add to the variance in the reparameterisation of the latent representation
  • seed: random seed to use for the initialisation of model parameters, to ensure reproducibility
source
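Putting it together, a complete scVAE model (which constructs its encoder and decoder internally) can be initialised as follows; the values are illustrative and only `n_input` is required:

```julia
using scVI

# scVAE model for 1200 genes with a 10-dimensional latent space, a negative
# binomial likelihood and the observed library size (so no dedicated
# library-size encoder is created).
m = scVAE(1200;
    n_latent=10,
    n_hidden=128,
    n_layers=2,
    gene_likelihood=:nb,
    use_observed_lib_size=true,
    seed=42
)
```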