*We know correlation does not imply causality. What does?*

I have seen the following situation countless times:

- a newspaper publishes an article with the title ‘humans with long noses are more prone to having babies’, or something equally ridiculous-sounding;
- *invariably* one commenter points out that the authors are idiots, because ‘correlation does not imply causation’ and, d'oh!, they only showed correlation.

Which *begs* the question: *What* implies causation? Somehow, the discussion rarely proceeds to this third step.

It turns out, nothing really implies causation. So, saying that ‘correlation doesn't imply causation’ is kinda meaningless.

We can still gather evidence that makes us more or less sure that a causal relation exists. What kind of evidence? For example, correlation. But, yes, (Pearson) correlation constitutes only weak evidence. There are three problems with it:

- It only detects linear dependences.
- It does not say which way the influence goes.
- It does not account for hidden variables. (Perhaps a third thing causes both of the observed things.)
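The first two problems are easy to see on a toy data set. The sketch below is only an illustration: the `pearson` helper and the made-up data are mine, not part of any particular study. It computes the correlation for a variable that is completely determined by $x$ but not linearly, and then for a linear relation in both directions.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / sqrt(var_a * var_b)

xs = list(range(-5, 6))
squares = [x * x for x in xs]       # fully determined by x, but not linearly
shifted = [2 * x + 1 for x in xs]   # linear in x

r_nonlinear = pearson(xs, squares)  # vanishes: the dependence is invisible
r_linear = pearson(xs, shifted)     # a perfect linear relation
r_reversed = pearson(shifted, xs)   # same number: no hint of direction

print(r_nonlinear, r_linear, r_reversed)
```

The coefficient is symmetric in its two arguments, so swapping them can never tell you which variable influences which.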

To alleviate the first problem, you can compute mutual information instead.
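Here is a minimal sketch of what that could look like, assuming the observations are discrete (or have already been binned). The `mutual_information` helper is a hypothetical name for the plain plug-in estimate computed from counts. With continuous data you would have to choose a binning first, and the estimate tends to be biased upward on small samples.

```python
from collections import Counter
from math import log2

def mutual_information(a, b):
    """Plug-in estimate of I(A;B) in bits for discrete (or pre-binned) samples."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    count_ab = Counter(zip(a, b))
    # Sum over observed pairs of p(a,b) * log2( p(a,b) / (p(a) * p(b)) ),
    # written in terms of raw counts.
    return sum(c / n * log2(c * n / (count_a[u] * count_b[v]))
               for (u, v), c in count_ab.items())

xs = list(range(-5, 6))
squares = [x * x for x in xs]    # nonlinear function of x
constant = [0 for _ in xs]       # carries no information about x

mi_dependent = mutual_information(xs, squares)
mi_unrelated = mutual_information(xs, constant)

print(mi_dependent, mi_unrelated)
```

Unlike the Pearson coefficient, the estimate is clearly positive for the quadratic relation. It is still symmetric, though, so it says nothing about direction.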

To alleviate the second problem, you can try to build a predictive model. Say you want to see if $x$ causes $y$. Then you come up with an algorithm that takes observations of $x$ as input and outputs predictions of $y$. If its predictions hold up, then you can be fairly confident that a causal connection exists. This is basically how physics works. Still, there is always the danger that you'll discover a setup in which your algorithm's predictions are bad.
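As a toy sketch of this idea, assume the ‘algorithm’ is nothing more than a lookup table that remembers the mean output seen for each input. The names `fit_lookup` and `mean_squared_error` and the data are made up for illustration. The model is fitted in both directions and scored on fresh observations.

```python
from collections import defaultdict

def fit_lookup(inputs, outputs):
    """'Train' a toy model: remember the mean output seen for each input."""
    seen = defaultdict(list)
    for i, o in zip(inputs, outputs):
        seen[i].append(o)
    table = {i: sum(vals) / len(vals) for i, vals in seen.items()}
    fallback = sum(outputs) / len(outputs)   # used for inputs never seen
    return lambda i: table.get(i, fallback)

def mean_squared_error(model, inputs, outputs):
    """Average squared gap between the model's predictions and the truth."""
    return sum((model(i) - o) ** 2 for i, o in zip(inputs, outputs)) / len(inputs)

# Made-up observations in which y is generated from x.
train_x = list(range(-5, 6))
train_y = [x * x for x in train_x]
test_x = [-4, -2, 0, 1, 3, 5]
test_y = [x * x for x in test_x]

error_y_from_x = mean_squared_error(fit_lookup(train_x, train_y), test_x, test_y)
error_x_from_y = mean_squared_error(fit_lookup(train_y, train_x), test_y, test_x)

print(error_y_from_x, error_x_from_y)
```

Predicting $y$ from $x$ works much better here than the reverse. That asymmetry is a hint about direction, no more than that: a different data set could make both directions equally easy.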

If $x$ and $y$ are time series and you bring back the assumption that the dependence is linear, then there are nice mathematical tools you can use: see Granger causality.
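The idea behind Granger causality is to ask whether the past of $x$ improves a linear prediction of $y$ beyond what the past of $y$ already gives. Below is a rough sketch with a single lag, using a hand-rolled least-squares solver and simulated data. The helper names `ols_rss` and `granger_f` are hypothetical, and for real work you would reach for a statistics library and a proper significance test.

```python
import random

def ols_rss(rows, targets):
    """Residual sum of squares of a least-squares fit (via normal equations)."""
    k = len(rows[0])
    # Augmented normal-equation matrix [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, targets))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                f = a[row][col] / a[col][col]
                a[row] = [u - f * v for u, v in zip(a[row], a[col])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, targets))

def granger_f(x, y):
    """F statistic for 'x at time t-1 helps predict y at time t' (one lag)."""
    targets = y[1:]
    restricted = [[1.0, y[t - 1]] for t in range(1, len(y))]      # past y only
    full = [[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))]  # plus past x
    rss_restricted = ols_rss(restricted, targets)
    rss_full = ols_rss(full, targets)
    dof = len(targets) - 3   # observations minus parameters of the full model
    return (rss_restricted - rss_full) / (rss_full / dof)

# Simulated series: x is white noise, and y follows x with a one-step delay.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(500)]
y = [0.0] + [0.8 * x[t - 1] + rng.gauss(0, 0.1) for t in range(1, 500)]

f_x_to_y = granger_f(x, y)
f_y_to_x = granger_f(y, x)

print(f_x_to_y, f_y_to_x)
```

A large statistic in one direction and a small one in the other is what you would hope to see when $x$ drives $y$ with a delay. It remains a statement about predictability, so a hidden variable that drives both series can still fool it.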

I don't know of any good way to account for hidden variables, apart from ‘test, test again’. A particularly good way to test is to make sure you are in control, and

- try to systematically cover all possible values of the independent variable $x$;
- try to systematically cover all possible values of all other possibly relevant variables you can think of.
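To see why being in control matters, here is a small simulation, a sketch under made-up assumptions: a hidden variable drives both $x$ and $y$, and $x$ has no effect on $y$ at all. The same `pearson` helper is computed once on passively observed data and once on data where we set $x$ ourselves by sweeping a grid of values.

```python
import random
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / sqrt(var_a * var_b)

rng = random.Random(0)
n = 2000
hidden = [rng.gauss(0, 1) for _ in range(n)]

# Passive observation: the hidden variable drives both x and y.
x_observed = [z + rng.gauss(0, 0.1) for z in hidden]
y_observed = [z + rng.gauss(0, 0.1) for z in hidden]

# Controlled experiment: we choose x ourselves, sweeping a grid of values,
# so the hidden variable can no longer influence it.
grid = [-2, -1, 0, 1, 2]
x_controlled = [grid[i % len(grid)] for i in range(n)]
y_controlled = [z + rng.gauss(0, 0.1) for z in hidden]

r_observed = pearson(x_observed, y_observed)
r_controlled = pearson(x_controlled, y_controlled)

print(r_observed, r_controlled)
```

The strong correlation in the observed data disappears once $x$ is set by hand, which is the evidence that $x$ was never doing the work.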

But, in the end, we'll never be absolutely sure that one thing causes another.