Covariance provides a measure of the strength of the correlation between two random variables.
Suppose X and Y are random variables with means E(X) and E(Y), and standard deviations sd(X) and sd(Y), respectively. The covariance of X and Y is defined by
What is the covariance of the data set X = (1, 2, 4, 6, 7) and Y = (2, 3, 4, 7, 9)?
| X | Y | X-E(X) | Y-E(Y) | [X-E(X)][Y-E(Y)] | |
| 1 | 2 | -3 | -3 | 9 | |
| 2 | 3 | -2 | -2 | 4 | |
| 4 | 4 | 0 | -1 | 0 | |
| 6 | 7 | 2 | 2 | 4 | |
| 7 | 9 | 3 | 4 | 12 | |
| sum | 20 | 25 | 29 | ||
| E | 4 | 5 | 5.8 |
or
| X | Y | XY | |
| 1 | 2 | 2 | |
| 2 | 3 | 6 | |
| 4 | 4 | 16 | |
| 6 | 7 | 42 | |
| 7 | 9 | 63 | |
| sum | 20 | 25 | 129 |
| E | 4 | 5 | 25.8 |
From the table above,
| cov(X, Y) | = E(XY) - E(X)E(Y) |
| = 25.8 - (4)(5) | |
| = 5.8 |