Let's say we have constructed some model of a natural process. For instance, let's say we are looking at stock market prices. It might very well be that all that movement is described by a few variables. For instance, it might be that Sony's share price is mostly affected by how many people buy their headphones and at what price. If we were hardened Wall Street brokers, we might have come up with a model of how exactly the share price is affected by these variables. This model of ours would take in the number of customers and average price and produce a timeline of stock prices. Once we are convinced our model is correct, we could start applying it to the actual stock market. Looking at Sony's share prices (the observed data) we could ask the inverse question, so to speak - how many customers does Sony have?
This is a harder problem to solve. We do have a model that takes in the number of people buying Sony headphones and produces a list of share prices over time. Going backwards is not necessarily as easy. Given the many random variations in the data, it is likely that many different outputs (share prices) could be generated by the same inputs (number of Sony customers and average price). In other words, our model has some randomness built in. This is inherent in the process anyway - as any investor will tell you, share prices fluctuate all the time even without any apparent change in the company.
The above, in a nutshell, is what we are trying to solve. Normally we create something called the likelihood, i.e. the probability of the data given the model. This represents the probability of observing the outputs (the real share prices) given particular inputs (the number of customers and average price).
$$ P(\mathcal{D}\ |\ \mathcal{M}) = P(\text{outputs}\ |\ \text{inputs}) $$

Here $\mathcal{D}$ represents the data and $\mathcal{M}$ - our model of the market. Remember, our model has some randomness, so many different outputs are possible for the same input if we run the model multiple times. If we could maximize the above with respect to our inputs, we would be more or less done: we could answer with confidence which input parameters are most likely to produce the output. Approximate Bayesian Computation comes in when the above is very hard to compute.
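To make the idea of a model with built-in randomness concrete, here is a minimal Python sketch of such a forward model. The functional form, parameter values, and noise level are invented purely for illustration (this is not a real pricing model); the only point is that the same inputs give a different price path on every run.

```python
import numpy as np

def simulate_share_prices(num_customers, avg_price, n_days=250, seed=None):
    """Toy forward model: map (number of customers, average headphone price)
    to a simulated time series of daily share prices.

    Everything here is made up for illustration; the key feature is the
    random noise term, which makes repeated runs with the same inputs
    produce different outputs.
    """
    rng = np.random.default_rng(seed)
    # A fictional "fair value" driven by revenue (customers * price).
    fair_value = 0.0001 * num_customers * avg_price
    # Daily multiplicative noise stands in for the fluctuations that happen
    # even when nothing about the company changes.
    noise = rng.normal(loc=0.0, scale=0.02, size=n_days)
    return fair_value * np.exp(np.cumsum(noise))

# Two runs with identical inputs give two different price paths.
prices_a = simulate_share_prices(num_customers=50_000, avg_price=120)
prices_b = simulate_share_prices(num_customers=50_000, avg_price=120)
```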
Let's introduce some common notation. We will call our model inputs $\bf{\theta}$ and our model outputs $\bf{x}$. Letters in bold will represent vectors. So, for instance, the number of people who buy Sony headphones and the price of the headphones would both be captured in $\bf{\theta} = \{\theta_1, \theta_2\}$. The share prices we are currently looking at and analyzing are the observed data, denoted by $\bf{x}^{(\text{obs})}$. We would like to know which values of $\bf{\theta}$ could generate $\bf{x}^{(\text{obs})}$, and with what probability.
Our input values are not entirely arbitrary - we know some rough bounds for the number of Sony customers, for instance. We can thus put a prior on $\bf{\theta}$. If we think that between 10 and 1,000,000 people buy Sony headphones (a very broad prior), we can encode that information. We have defined what is called a uniform prior - we think it is equally likely that Sony has any number of customers between 10 and 1,000,000. This is (obviously) not a realistic assumption. We specify it mathematically:
$$ \pi(\theta_1) = P(\theta_1) = \mathcal{U}(10;\ 1,000,000) $$

We can now draw samples from our prior. That is, we can generate random values for $\bf{\theta}$ by randomly selecting a value within the range we just set up (there is a short code sketch of this below). Mathematically, we wish to compute what is known as the posterior, $P(\theta\ |\ \bf{x}^{(\text{obs})})$. Computing this quantity for various parameter values $\theta$ would give us a distribution over $\theta$, telling us for each value how likely it is to have generated $\bf{x}^{(\text{obs})}$. As we noted earlier, this probability is very hard to compute directly. It can, however, be inverted by using Bayes' theorem like so:
$$ P(\theta\ |\ \bf{x}) = \frac{ P(\bf{x}\ |\ \theta)\, P(\theta) }{ P(\bf{x}) } $$

Notice that $P(\theta)$ is our prior, which can also be written as $\pi(\theta)$.
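To make these last two steps concrete - drawing $\bf{\theta}$ from the prior, and applying Bayes' theorem when the likelihood happens to be available - here is a minimal Python sketch. It assumes `numpy` and `scipy`, and every number in it (the grid, the Gaussian likelihood, the observed summary value) is invented purely for illustration; in the ABC setting the likelihood line is exactly the thing we cannot write down.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Drawing samples from the uniform prior pi(theta_1) = U(10, 1,000,000):
# each draw is a candidate value for "how many people buy Sony headphones".
prior_samples = rng.uniform(10, 1_000_000, size=5)

# To see Bayes' theorem in action, pretend (contrary to the ABC setting)
# that we can evaluate the likelihood P(x | theta). Here the observed data
# is summarised by a single made-up number, and the likelihood is a Gaussian
# around a made-up function of theta - both purely illustrative.
theta_grid = np.linspace(10, 1_000_000, 200)               # candidate theta_1 values
prior = np.full(theta_grid.shape, 1.0 / theta_grid.size)   # uniform prior, discretised
x_obs = 600.0                                              # "observed" summary of the share prices
likelihood = norm.pdf(x_obs, loc=0.0012 * theta_grid, scale=50.0)

# Bayes' theorem: posterior = likelihood * prior / P(x),
# with P(x) approximated by summing over the grid.
evidence = np.sum(likelihood * prior)
posterior = likelihood * prior / evidence
```

The resulting `posterior` array says, for each candidate number of customers on the grid, how plausible it is given the observation; ABC exists for the case where the likelihood line above is unavailable.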