I’ve finished reading a short but very informative book about statistics called “How To Lie With Statistics” by Darrell Huff. Although the title may sound evil, the book actually exposes the statistical tricks that industry uses to lie to YOU! It also covers the common pitfalls that people working with data fall into. Here are my key takeaways from every chapter:
1- Your analysis is only as good as your data. Without a truly “representative sample”, your results will be biased. A randomized sample is not always enough; you also need to design the experiment carefully so that your data reflects the truth. To spot possible biases in your sample, look at the extreme cases.
2- The word “average” can mean different things depending on the distribution of the averaged variable. People often pick whichever “average” suits their argument best out of the mean, the median, and the mode. If the distribution is skewed rather than symmetric like a normal distribution, the three can have very different values. So be skeptical whenever the word “average” is used without saying which one it is.
3- Point estimates such as a mean or a proportion are useless without a confidence interval or a significance level. Sample size matters. You can toss a fair coin 10 times and get 80% heads, but toss it 10,000 times and you will get a number very close to 50%.
4- A difference is a difference only if it makes a difference. When comparing a point estimate between two or more groups, you have to take the standard error into account. If the error ranges overlap, the difference may be nothing more than chance.
5- Beware of any chart that does not include numbers on its axes. The visible slope of a line or the height of the bars can change drastically depending on the scale or range of the axis. If someone wants to exaggerate the results, they can simply narrow the axis range (for example, by not starting it at zero) so that any small change looks dramatic.
6- Beware of any chart that uses icons or pictures instead of bars. To show the difference between two values on a bar chart, you change only the height or only the width, depending on which axis is numerical. However, people who use icons or pictures scale both the height and the width, so an icon drawn twice as tall covers four times the area. This tricks our brain into magnifying and exaggerating the difference.
7- When authors can’t prove what they want to prove, they can demonstrate something completely different and pretend that it is the same thing. For example, you can prove that there are more accidents in clear weather than in foggy weather. But that does not mean that clear weather is more dangerous; it simply means that people drive more when the weather is clear.
8- Correlation does not mean causation. A high correlation between two variables can be explained by a third factor that is associated with both of them, so you need to scrutinize any relationship closely and think outside the box. Also, correlated variables are not guaranteed to stay correlated. Correlation is not immune to outliers: a few extreme data points can change both its sign and its magnitude.
9- If you are going to aggregate your data, make sure that the data points are precise. If you use rounded or approximated data, you can end up with completely false results. Furthermore, when aggregating percentages, you can’t add them up the way you add up ordinary numbers. Two consecutive 50% increases on 100 dollars do not add up to a 100% increase; they compound to a 125% increase (100 to 150 to 225). To calculate the average growth rate from percentage changes, use the geometric mean of the growth factors instead of the arithmetic mean of the percentages.
10- When looking at a statistic, always ask yourself: What does the author gain from the results? Do you have all the information, such as the base number, the kind of average, and the initial sample size? Is that the real statistic, or just what someone reported? And lastly, but most importantly: Does it make sense?
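To make takeaways 2 and 9 a bit more concrete, here is a small Python sketch using only the standard library. It is just an illustration I put together, not something from the book: the salary figures are made up, and the names `salaries`, `compound`, and `average_growth_pct` are my own hypothetical choices.

```python
from statistics import geometric_mean, mean, median, mode

# Takeaway 2: three different "averages" of the same skewed data.
# Hypothetical salaries (made-up numbers); one large value skews the mean.
salaries = [30_000, 30_000, 30_000, 40_000, 50_000, 60_000, 250_000]

averages = {
    "mean": mean(salaries),
    "median": median(salaries),
    "mode": mode(salaries),
}
print(averages)


# Takeaway 9: percentage changes compound, they do not add up.
def compound(start, pct_changes):
    """Apply successive percentage changes to a starting value."""
    value = start
    for pct in pct_changes:
        value *= 1 + pct / 100
    return value


def average_growth_pct(pct_changes):
    """Average growth per period: geometric mean of the growth factors."""
    factors = [1 + pct / 100 for pct in pct_changes]
    return (geometric_mean(factors) - 1) * 100


# Two consecutive +50% steps on 100 dollars: 100 -> 150 -> 225.
print(compound(100, [50, 50]))

# +100% followed by -50% brings you back to where you started, even though
# the arithmetic mean of the two percentages is +25%.
print(compound(100, [100, -50]))
print(average_growth_pct([100, -50]))
```

With these made-up salaries the mean is 70,000, the median is 40,000, and the mode is 30,000, so anyone quoting "the average salary" can pick the one that fits their story. The second half shows why adding percentages misleads: a gain of 100% followed by a loss of 50% leaves you exactly where you began, which the geometric mean captures as roughly 0% average growth.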