Phylogenetics using Variational Inference


Mathieu Fourment

ithree institute
University of Technology Sydney

Approximating Distributions in Phylogenetics

  • Usually approximate posterior with MCMC and propose parameters one by one
  • Approximate distribution of one branch length using a parametric distribution (Aberer et al. 2016)
  • Approximate distribution of one branch length using specialised surrogate function (Claywell et al. 2017)
  • Adaptive MCMC to jointly propose parameters (Baele et al. 2017)

Approximating the distribution of one branch length

  • Fitting parametric distribution
  • Goal is to improve sampling the tree space using an independence sampler

Aberer, Stamatakis, Ronquist. Syst Biol 2016

Approximating the distribution of a branch length

  • Fitting a simpler distribution:
  • $$ f(c, m, r, b; t) = c \log \left ( \frac{1 + e^{(r(t + b))^{-1}}}{2} \right ) + m \log \left(\frac{1 - e^{(r(t + b))^{-1}}}{2} \right) $$
  • Parameters: c (# constant sites), m (# mutated sites), r (rate), b (truncation)
  • Nonlinear least-squares optimization
  • Sampling from surrogate function using rejection sampling
  • Improved the efficiency of SMC sampler (see poster)

Claywell, Dinh, Fourment, McCoy, Matsen. MBE 2017

Fourment, Claywell, Dinh, McCoy, Matsen, Darling. Syst Biol 2017

Variational inference

Minimize the Kullback Leibler divergence from variational distribution $q$ to posterior distribution $p$

$$ \boldsymbol{\phi}^* = \operatorname*{arg\,min}_{\boldsymbol{\phi} \in \boldsymbol{\Phi}} \operatorname{KL}(q(\boldsymbol{\theta}; \boldsymbol{\phi}) \parallel p(\boldsymbol{\theta} \mid \mathbf{x})) $$

Review: Blei, Kucukelbir, McAuliffe 2016

Evidence lower bound (ELBO)

\[ \begin{aligned} \operatorname{KL}(q(\boldsymbol{\theta}; \boldsymbol{\phi}) \parallel p(\boldsymbol{\theta} \mid \mathbf{x})) & = \mathop{\mathbb{E}}[\log q(\boldsymbol{\theta})] - \mathop{\mathbb{E}}[\log p(\boldsymbol{\theta} \mid \mathbf{x})] \\ & = \mathop{\mathbb{E}}[\log q(\boldsymbol{\theta})] - \mathop{\mathbb{E}}[\log p(\boldsymbol{\theta}, \mathbf{x})] + \log p(\mathbf{x}) \end{aligned}\]

$p(\mathbf{x})$ constant with respect to $q(\boldsymbol{\theta})$

Instead of minimizing KL divergence, maximize evidence lower bound:

$$ \textrm{ELBO}(q) = \mathop{\mathbb{E}}_{q(\boldsymbol{\theta}; \boldsymbol{\phi})}[\log p(\mathbf{x}, \boldsymbol{\theta}) - \log q(\boldsymbol{\theta}; \boldsymbol{\phi})]$$

ELBO(q) is the lower bound of evidence:

$$\log p(\mathbf{x}) \geq \textrm{ELBO}(q)$$

Variational distributions

Mean-field Gaussian:

$$ q(\boldsymbol{\theta}; \boldsymbol{\phi}) = \mathcal{N}(\boldsymbol{\theta}; \boldsymbol{\mu}, diag(\boldsymbol{\sigma}^2)) = \prod_{i=1}^n \mathcal{N}(\theta_i; \mu_i, \sigma_i^2) $$

Full-rank Gaussian:

$$ q(\boldsymbol{\theta}; \boldsymbol{\phi}) = \mathcal{N}(\boldsymbol{\theta}; \boldsymbol{\mu}, \boldsymbol{\Sigma}) $$

Algorithm

  • Stochastic gradient ascent to optimize the ELBO
  • Requires calculating gradient of probability models
  • Transformation of constrained variables (e.g. branch length lives in $\mathbb{R}^+$)

Variational inference software

Require calculating derivatives (automatic differentiation)
  • Stan (HMC, MCMC)
  • Edward, BayesPi, PyMC3, Pyro (Uber) ...

Simple Stan model

Simulation study

  • Random coalescent tree with 6 taxa
  • Simulate alignment with GTR
  • Every ranked labelled tree was enumerated (2700 topologies)
  • Analyse with BEAST and Stan

Estimates using true tree

Marginal likelihood vs. ELBO

  • Marginal likelihood calculated using path sampling
  • Stochastic gradient ascent trapped in local maxima

Marginal likelihood vs. ELBO

  • Rerun Stan several times and chose the highest ELBO
  • Slope: 0.99 Intercept: -11.9

Parameter correlation from MCMC output

  • Heatmap representing correlation matrix (Red=positive, blue: negative correlation )
  • Strong correlation between some parameters (e.g. adjacent branches)
  • Mean-field variational model ignores correlation between model parameters

Correlation in a larger tree

Parameter correlation with variational model

Problems and ideas

  • Short branches are difficult to approximate. Problem also encountered in surrogate and parametric approximations of a single branch
  • Reducing the number parameters
  • Setting covariances to 0: we should expect low or no correlation between branches that are far apart

Summary

  • Testing sparse covariance matrix for full-rank model
  • Not possible in Stan, working on a specialised phylogenetic variational program
  • Combining multiple trees
  • Investigate other divergence measures (lower and upper bound of marginal likelihood)

Acknowledgements


  • Aaron Darling, University of Technology Sydney

  • Erick Matsen, Fred Hutchinson Cancer Research

  • The following organizations: