The original paper is *Auto-Encoding Variational Bayes* (Kingma & Welling, 2013).

Understanding VAE

The main purpose of VAE is to learn the distribution of the given data, $p(x)$. It assumes a latent-variable model, i.e., the joint distribution factorizes as $p_\theta(x, z) = p_\theta(x|z)p_\theta(z)$, so that $p_\theta(x) = \int p_\theta(x|z)p_\theta(z)\,dz$.

The distribution is parameterized by $\theta$, and our goal is to learn the optimal $\theta^*$ by maximizing the marginal log-likelihood (ML),

$$ \theta^* = \underset{\theta}{\operatorname{argmax}} \enspace \log \int p_\theta(x|z)p_\theta(z)\,dz $$
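To see why this objective is hard to evaluate, note that $p_\theta(x) = \mathbb{E}_{z \sim p_\theta(z)}[p_\theta(x|z)]$ can in principle be estimated by naive Monte Carlo sampling from the prior, but most prior samples explain $x$ poorly, so the estimator's variance explodes as the latent dimension grows. A minimal sketch with a toy Gaussian prior and Gaussian decoder (all names and dimensions here are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def log_p_x_given_z(x, z, W, b, sigma=0.5):
    """Toy Gaussian decoder: x ~ N(Wz + b, sigma^2 I), standing in for p_theta(x|z)."""
    mean = W @ z + b
    d = x.shape[0]
    return -0.5 * np.sum((x - mean) ** 2) / sigma**2 - 0.5 * d * np.log(2 * np.pi * sigma**2)

def naive_log_marginal(x, W, b, n_samples=10_000):
    """Monte Carlo estimate of log p(x) = log E_{z~N(0,I)}[p(x|z)].

    Fine for toy dimensions, but the variance blows up in high dimensions,
    which is the intractability discussed below.
    """
    zs = rng.standard_normal((n_samples, W.shape[1]))
    log_ps = np.array([log_p_x_given_z(x, z, W, b) for z in zs])
    m = log_ps.max()  # log-mean-exp for numerical stability
    return m + np.log(np.mean(np.exp(log_ps - m)))

# 2-d latent, 5-d observation
W = rng.standard_normal((5, 2))
x = W @ rng.standard_normal(2)
print("log p(x) ≈", naive_log_marginal(x, W, np.zeros(5)))
```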

An analytic solution usually does not exist because the integral is intractable. In some simple cases, e.g., the Gaussian mixture model, $z$ follows a categorical (multinomial) distribution, so the posterior $p_{\theta}(z|x)=p_\theta(x|z)p_{\theta}(z)/p_{\theta}(x)$ can be computed in closed form, and the EM algorithm can be used.
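To make the tractable case concrete, here is a minimal EM loop for a 1-D two-component Gaussian mixture; the data, initialization, and iteration count are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D data drawn from two Gaussians
x = np.concatenate([rng.normal(-2.0, 0.5, 300), rng.normal(3.0, 1.0, 700)])

# Initial parameters theta = (pi, mu, sigma) for K = 2 components
pi = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
sigma = np.array([1.0, 1.0])

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

for step in range(50):
    # E-step: posterior p(z=k|x) via Bayes' rule -- tractable because z is categorical
    joint = pi * gaussian_pdf(x[:, None], mu, sigma)   # shape (N, K)
    resp = joint / joint.sum(axis=1, keepdims=True)    # responsibilities

    # M-step: re-estimate theta from the responsibilities
    nk = resp.sum(axis=0)
    pi = nk / len(x)
    mu = (resp * x[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)

print("weights:", pi, "means:", mu, "stds:", sigma)
```

The E-step is exactly the closed-form posterior above; the M-step re-fits $\theta = (\pi, \mu, \sigma)$ from those responsibilities.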

However, in more general cases, where $z$ is not restricted to a categorical distribution, the posterior is usually intractable as well.
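This is exactly the problem the original paper addresses: it introduces an approximate posterior $q_\phi(z|x)$ (the encoder) and maximizes the evidence lower bound (ELBO) instead of the marginal log-likelihood directly. For any $q_\phi$,

$$ \log p_\theta(x) = \mathbb{E}_{q_\phi(z|x)}\big[\log p_\theta(x|z)\big] - D_{\mathrm{KL}}\big(q_\phi(z|x)\,\|\,p_\theta(z)\big) + D_{\mathrm{KL}}\big(q_\phi(z|x)\,\|\,p_\theta(z|x)\big), $$

where the first two terms form the ELBO and the last KL term is non-negative, so the ELBO lower-bounds $\log p_\theta(x)$. Maximizing it over $\phi$ pushes $q_\phi(z|x)$ toward the true posterior, while maximizing over $\theta$ improves the generative model.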

Tips

Denoising Criterion for Variational Autoencoding Framework

Some useful tricks for training variational autoencoders:

https://github.com/loliverhennigh/Variational-autoencoder-tricks-and-tips/blob/master/README.md

GitHub

AntixK/PyTorch-VAE

NVlabs/NVAE
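The repos above are full-featured implementations; for orientation, here is a minimal self-contained VAE sketch in PyTorch showing the reparameterization trick and the (negative) ELBO loss. The dimensions and the flattened 784-d input (MNIST-style) are assumptions for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, x_dim=784, h_dim=400, z_dim=20):
        super().__init__()
        self.enc = nn.Linear(x_dim, h_dim)
        self.mu = nn.Linear(h_dim, z_dim)       # mean of q_phi(z|x)
        self.logvar = nn.Linear(h_dim, z_dim)   # log-variance of q_phi(z|x)
        self.dec1 = nn.Linear(z_dim, h_dim)
        self.dec2 = nn.Linear(h_dim, x_dim)

    def encode(self, x):
        h = F.relu(self.enc(x))
        return self.mu(h), self.logvar(h)

    def reparameterize(self, mu, logvar):
        # z = mu + sigma * eps keeps sampling differentiable w.r.t. phi
        eps = torch.randn_like(mu)
        return mu + torch.exp(0.5 * logvar) * eps

    def decode(self, z):
        return torch.sigmoid(self.dec2(F.relu(self.dec1(z))))

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar

def negative_elbo(x_hat, x, mu, logvar):
    # Reconstruction term plus analytic KL(q_phi(z|x) || N(0, I))
    recon = F.binary_cross_entropy(x_hat, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

# Toy forward/backward pass on random "images"
model = VAE()
x = torch.rand(16, 784)
x_hat, mu, logvar = model(x)
negative_elbo(x_hat, x, mu, logvar).backward()
```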