University of Edinburgh CS Student

traiko.dinev at gmail.com


Let's say we have constructed some model of a natural process. For instance, let's say we are looking at stock market prices. It might very well be that all that movement is described by a few variables. For instance, it might be that Sony's share price is mostly affected by how many people buy their headphones and at what price. If we were hardened Wall Street brokers, we might have come up with a model of how exactly the share price is affected by these variables. This model of ours would take in the number of customers and average price and produce a timeline of stock prices. Once we are convinced our model is correct, we could start applying it to the actual stock market. Looking at Sony's share prices (the observed data) we could ask the inverse question, so to speak - how many customers does Sony have?

This is a harder problem to solve. We do have a *model* that takes in the number
of people buying Sony headphones and produces a list of share prices over time, but going backwards
is not necessarily as easy. Given all the random variation in the data, it is
likely that many different outputs (share prices) could be generated by the same inputs (number of Sony customers and average price). In other words, our model has some sort of randomness built in. This
is inherent in the process anyway - as any investor will tell you, share prices fluctuate all the time
even without any apparent change in the company.
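To make that built-in randomness concrete, here is a minimal sketch of such a stochastic forward model in Python. The functional form, the coefficients, and the noise level are all invented for illustration - the point is only that the same inputs produce different outputs each run:

```python
import numpy as np

def simulate_prices(n_customers, avg_price, n_days=5, rng=None):
    """Toy forward model: map (number of customers, average headphone price)
    to a short series of share prices, with random day-to-day noise."""
    rng = np.random.default_rng() if rng is None else rng
    base = 0.001 * n_customers + 0.5 * avg_price  # invented deterministic part
    noise = rng.normal(0.0, 2.0, size=n_days)     # inherent market fluctuation
    return base + noise

# The same inputs give different outputs on every run - the model is stochastic:
a = simulate_prices(100_000, 50.0)
b = simulate_prices(100_000, 50.0)
print(a)
print(b)
```

This is exactly why going backwards is hard: observing one price series does not pin down the inputs, since many input values could plausibly have produced it.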

The above, in a nutshell, is what we are trying to solve. Normally we write down something called the likelihood, i.e. the probability of the data given the model. This represents the probability that the outputs (the real stock prices) were generated by the inputs (the number of customers and average price).

$$ P(\mathcal{D}\ |\ \mathcal{M}) = P(\text{outputs}\ |\ \text{inputs}) $$
Here $\mathcal{D}$ represents the data and $\mathcal{M}$ - our *model* of the market. Remember, our
model has some randomness, so many different *outputs* are possible for the same *input*
if we run the model multiple times. If we could maximize the above with respect to our *inputs*, we
would be more or less done: we could answer with confidence which input parameters are *most likely* to produce
the output. Approximate Bayesian Computation comes in when the above is **very hard to compute**.

Let's introduce some common notation. We will call our model inputs $\bf{\theta}$ and our model outputs $\bf{x}$. Letters in bold will represent vectors. So, for instance, the number of people who buy Sony headphones and the price of the headphones
would both be captured in theta: $\bf{\theta} = \{\theta_1, \theta_2\}$. The share prices we are currently
looking at and analyzing are the *observed data*, denoted by $\bf{x}^{(\text{obs})}$. We would like
to know which $\bf{\theta}$ could generate $\bf{x}^{(\text{obs})}$, and with what probability.

Our input values are not entirely
random - we know some rough bounds for the number of Sony customers, for instance. We can thus put a
**prior** on $\bf{\theta}$. If we think that between 10 and 1,000,000 people buy Sony headphones (a very *broad* prior), we can encode that information. We have defined what is called
a uniform prior - we think it is equally likely that Sony has any number of customers between 10 and 1,000,000. This is (obviously) a very wrong assumption. We specify it mathematically:

$$ \theta_1 \sim \text{Uniform}(10,\ 1{,}000{,}000) $$
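In code, drawing from such a uniform prior might look like this (numpy, with the bounds from above):

```python
import numpy as np

rng = np.random.default_rng(1)

# Uniform prior over the number of Sony headphone customers:
low, high = 10, 1_000_000
theta_1 = rng.uniform(low, high, size=1000)  # 1000 independent prior draws

print(theta_1.min(), theta_1.max())  # all draws lie inside [10, 1000000]
```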

We can now **draw samples** from our prior. That is, we can generate random values
for $\bf{\theta}$ by randomly selecting a value within the range we just set up. Mathematically, we
wish to compute what is known as the *posterior*, $P(\theta\ |\ \bf{x}^{(obs)})$. Computing this
quantity for various parameter values $\theta$ would give us the distribution over $\theta$ - for
each value, how likely it is to have generated $\bf{x}^{(obs)}$. As we noted earlier, this probability
is impossible to compute directly. It can, however, be inverted by using Bayes' theorem like so: