(Editor Note: These notes are not my own text; they are copied from MacKay and the astrophysics source below. They will be gradually edited over the next few months. I provide them because writing them was useful for me.)

Hamiltonian Monte Carlo is a Metropolis method, applicable to continuous state spaces, that makes use of gradient information to reduce random-walk behaviour.

These notes are based on a combination of http://python4mpia.github.io/index.html and David MacKay's book *Information Theory, Inference, and Learning Algorithms*.

For many systems whose probability can be written in the form

$$P(\mathbf{x}) = \frac{e^{-E(\mathbf{x})}}{Z},$$

not only $E(\mathbf{x})$ but also its gradient with respect to $\mathbf{x}$ can be readily evaluated. It seems wasteful to use a simple random-walk Metropolis method when this gradient is available – the gradient indicates which direction one should go in to find states that have higher probability!
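To make this concrete, here is an illustrative example (my own, not from the sources): for a strongly correlated bivariate Gaussian with precision matrix $A$, both the energy $E(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^\mathsf{T} A \mathbf{x}$ and its gradient $\nabla E(\mathbf{x}) = A\mathbf{x}$ are available in closed form and are equally cheap to evaluate:

```python
import numpy as np

# Illustrative precision matrix for a strongly correlated Gaussian
# (values chosen for illustration, not taken from the sources).
A = np.array([[250.25, -249.75],
              [-249.75, 250.25]])

def energy(x):
    """E(x) = 0.5 x^T A x: the negative log-probability, up to a constant."""
    return 0.5 * x @ A @ x

def grad_energy(x):
    """Gradient of E with respect to x, available in closed form: A x."""
    return A @ x
```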

**Overview**

In the Hamiltonian Monte Carlo method, the state space $\mathbf{x}$ is augmented by momentum variables $\mathbf{p}$, and there is an alternation of two types of proposal. The first proposal randomizes the momentum variable, leaving the state $\mathbf{x}$ unchanged. The second proposal changes both $\mathbf{x}$ and $\mathbf{p}$ using simulated Hamiltonian dynamics as defined by the Hamiltonian

$$H(\mathbf{x}, \mathbf{p}) = E(\mathbf{x}) + K(\mathbf{p}),$$

where $K(\mathbf{p})$ is a 'kinetic energy' such as $K(\mathbf{p}) = \mathbf{p}^\mathsf{T}\mathbf{p}/2$. These two proposals are used to create (asymptotically) samples from the joint density

$$P_H(\mathbf{x}, \mathbf{p}) = \frac{1}{Z_H} \exp[-H(\mathbf{x}, \mathbf{p})] = \frac{1}{Z_H} \exp[-E(\mathbf{x})] \exp[-K(\mathbf{p})].$$

This density is separable, so the marginal distribution of $\mathbf{x}$ is the desired distribution $\exp[-E(\mathbf{x})]/Z$. So, simply discarding the momentum variables, we obtain a sequence of samples $\{\mathbf{x}^{(t)}\}$ that asymptotically come from $P(\mathbf{x})$.
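The Hamiltonian dynamics in the second proposal are typically simulated with the leapfrog integrator, which is time-reversible and volume-preserving, so the Metropolis correction remains valid. Here is a minimal sketch, assuming $K(\mathbf{p}) = \mathbf{p}^\mathsf{T}\mathbf{p}/2$ and a user-supplied gradient function (the function and parameter names are my own choices, not from the sources):

```python
import numpy as np

def leapfrog(x, p, grad_energy, eps, n_steps):
    """Simulate Hamiltonian dynamics with the leapfrog integrator.

    Assumes K(p) = p.p/2, so dx/dt = p and dp/dt = -grad E(x).
    """
    x, p = x.copy(), p.copy()
    p -= 0.5 * eps * grad_energy(x)      # initial half step in momentum
    for _ in range(n_steps - 1):
        x += eps * p                     # full step in position
        p -= eps * grad_energy(x)        # full step in momentum
    x += eps * p                         # last full step in position
    p -= 0.5 * eps * grad_energy(x)      # final half step in momentum
    return x, p
```

Because the integrator is symplectic, $H$ is conserved up to $O(\epsilon^2)$ along the trajectory, which is what keeps the acceptance rate of the subsequent Metropolis step high.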

**An algorithm for Hamiltonian Monte-Carlo**

First, we need to compute the gradient of our objective function. Hamiltonian Monte-Carlo makes use of the fact that we can write our likelihood as

$$\mathcal{L} = e^{-E(\mathbf{x})},$$

where $E(\mathbf{x}) = -\log \mathcal{L}$ is the "energy". The algorithm then uses Hamiltonian dynamics to modify the way candidates are proposed:
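A self-contained sketch of the full algorithm in Python follows: randomize the momentum, simulate the dynamics with a leapfrog trajectory, and accept or reject the endpoint with a Metropolis rule on the change in $H$. The function and parameter names (`hmc_sample`, `eps`, `n_steps`) are my own, not from the sources:

```python
import numpy as np

def hmc_sample(energy, grad_energy, x0, n_samples, eps=0.1, n_steps=20, rng=None):
    """Hamiltonian Monte Carlo with kinetic energy K(p) = p.p/2."""
    rng = np.random.default_rng(0) if rng is None else rng
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    samples, accepted = [], 0
    for _ in range(n_samples):
        p = rng.standard_normal(x.shape)          # proposal 1: fresh momentum
        h_old = energy(x) + 0.5 * p @ p
        # proposal 2: leapfrog trajectory through (x, p) space
        x_new, p_new = x.copy(), p.copy()
        p_new -= 0.5 * eps * grad_energy(x_new)
        for _ in range(n_steps - 1):
            x_new += eps * p_new
            p_new -= eps * grad_energy(x_new)
        x_new += eps * p_new
        p_new -= 0.5 * eps * grad_energy(x_new)
        h_new = energy(x_new) + 0.5 * p_new @ p_new
        # Metropolis accept/reject on the change in the Hamiltonian
        if rng.random() < np.exp(h_old - h_new):
            x, accepted = x_new, accepted + 1
        samples.append(x.copy())
    return np.array(samples), accepted / n_samples
```

Because the leapfrog integrator nearly conserves $H$, most proposals are accepted even though they move far from the current state; acceptance rates noticeably below 1, like the one reported below, reflect a larger step size.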

The true value we used to generate the data was . The Monte-Carlo estimates from a sample run:

Mean: 2.35130302322
Sigma: 0.00147375480435
Acceptance rate = 0.645735426457