## What is a Hamiltonian Monte-Carlo method?

(Editor's note: these notes are not my own text; they are adapted from MacKay and the astrophysics source below. They will be gradually edited over the coming months. I provide them because writing them was useful for me.)

Hamiltonian Monte Carlo is a Metropolis method, applicable to continuous state spaces, that makes use of gradient information to reduce random-walk behaviour.
These notes are based on a combination of http://python4mpia.github.io/index.html and David MacKay's book *Information Theory, Inference, and Learning Algorithms*.

For many systems whose probability $P(x)$ can be written in the form
$P(x) = \frac{\exp{[-E(x)]}}{Z},$
not only $-E(x)$ but also its gradient with respect to $x$
can be readily evaluated. It seems wasteful to use a simple random-walk
Metropolis method when this gradient is available – the gradient indicates
which direction one should go in to find states that have higher probability!
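As a concrete (illustrative, not from the source) example of an energy whose gradient is cheap: for a standard Gaussian, $E(x) = x^{\dagger}x/2$, so the gradient is simply $x$ itself.

```python
import numpy as np

# Illustrative example: a standard Gaussian has P(x) ∝ exp(-E(x))
# with E(x) = x.x / 2, so both E and its gradient are trivial to evaluate.
def energy(x):
    return 0.5 * np.dot(x, x)

def grad_energy(x):
    # dE/dx = x for this choice of E
    return x
```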

### Overview
In the Hamiltonian Monte Carlo method, the state space $\mathbf{x}$ is augmented by
momentum variables $\mathbf{p}$, and there is an alternation of two types of proposal.
The first proposal randomizes the momentum variables, leaving the state $\mathbf{x}$ unchanged. The second proposal changes both $\mathbf{x}$ and $\mathbf{p}$ using simulated Hamiltonian dynamics as defined by the Hamiltonian
$H(\mathbf{x}, \mathbf{p}) = E(\mathbf{x}) + K(\mathbf{p})$,
where $K(\mathbf{p})$ is a ‘kinetic energy’ such as $K(\mathbf{p}) = \mathbf{p}^{\dagger}\mathbf{p} / 2$. These two proposals are used to create
(asymptotically) samples from the joint density
$P_{H}(\mathbf{x}, \mathbf{p}) = \frac{1}{Z_{H}} \exp{[-H(\mathbf{x}, \mathbf{p})]} = \frac{1}{Z_{H}}\exp{[-E(\mathbf{x})]} \exp{[-K(\mathbf{p})]}$

This density is separable, so the marginal distribution of $\mathbf{x}$ is the
desired distribution $\exp{[-E(\mathbf{x})]}/Z$. So, simply discarding the momentum variables, we obtain a sequence of samples $\lbrace\mathbf{x}^{(t)}\rbrace$ that asymptotically come from $P(\mathbf{x})$.
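The Hamiltonian dynamics in the second proposal are usually simulated with the leapfrog integrator, which is time-reversible and volume-preserving. The following is a minimal sketch (the step size `eps`, step count `n_steps`, and the `grad_energy` callable are illustrative assumptions, not from the source):

```python
import numpy as np

def leapfrog(x, p, grad_energy, eps, n_steps):
    """Simulate a trajectory of n_steps leapfrog updates of (x, p)."""
    x, p = np.copy(x), np.copy(p)
    p -= 0.5 * eps * grad_energy(x)      # initial half step in momentum
    for _ in range(n_steps - 1):
        x += eps * p                     # full step in position
        p -= eps * grad_energy(x)        # full step in momentum
    x += eps * p                         # final full position step
    p -= 0.5 * eps * grad_energy(x)      # final half step in momentum
    return x, p
```

Because the integrator is only approximate, $H(\mathbf{x}, \mathbf{p})$ is conserved up to a small discretization error, which the Metropolis accept/reject step corrects.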

### An algorithm for Hamiltonian Monte-Carlo
First, we need to compute the gradient of our objective function.
Hamiltonian Monte-Carlo makes use of the fact that we can write our likelihood as
$\mathcal{L} = \exp{(\log \mathcal{L})} = \exp{(-E)},$
where $E = -\log \mathcal{L}$ is the “energy”. The algorithm then uses
Hamiltonian dynamics to modify the way candidates are proposed:

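The source's algorithm listing is not reproduced here; the following is a minimal, self-contained sketch of a generic HMC sampler with a Metropolis accept/reject step, shown for a 1-D standard Gaussian (all names and parameter values `eps`, `n_steps`, `seed` are illustrative assumptions, not the source's code):

```python
import numpy as np

def hmc_sample(energy, grad_energy, x0, n_samples, eps=0.1, n_steps=20, seed=0):
    rng = np.random.default_rng(seed)
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    samples = []
    for _ in range(n_samples):
        # Proposal 1: draw fresh Gaussian momenta, leaving x unchanged.
        p = rng.standard_normal(x.shape)
        h_old = energy(x) + 0.5 * np.dot(p, p)
        # Proposal 2: simulate Hamiltonian dynamics with leapfrog steps.
        x_new, p_new = np.copy(x), np.copy(p)
        p_new -= 0.5 * eps * grad_energy(x_new)
        for _ in range(n_steps - 1):
            x_new += eps * p_new
            p_new -= eps * grad_energy(x_new)
        x_new += eps * p_new
        p_new -= 0.5 * eps * grad_energy(x_new)
        # Metropolis accept/reject on the change in H.
        h_new = energy(x_new) + 0.5 * np.dot(p_new, p_new)
        if rng.random() < np.exp(h_old - h_new):
            x = x_new
        samples.append(x.copy())
    return np.array(samples)

# Example: sample a standard Gaussian, E(x) = x.x / 2, grad E(x) = x.
samples = hmc_sample(lambda x: 0.5 * (x @ x), lambda x: x,
                     x0=[0.0], n_samples=2000)
```

Discarding the momenta, as described above, the returned `samples` asymptotically follow $P(\mathbf{x})$.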
The true value we used to generate the data was $\alpha = 2.35$.
The Monte-Carlo estimate is $\hat{\alpha} = 2.3507$.

```
Mean: 2.35130302322
Sigma: 0.00147375480435
Acceptance rate = 0.645735426457
```