# What you need to know about statistics

Statistics is essential knowledge for every researcher.

Note: A PDF version is available: How to do ANOVA (PDF), plus a workshop with answers (.docx).

You need to understand the following:

1. Descriptive statistics - mean, median, SD, categories of data, normal distributions, histograms
2. Inferential statistics - parametric and non-parametric
3. Null and alternative hypotheses
4. Type-I and type-II errors
5. What a p-value means and what it does not (statistical significance vs. practical significance)
6. Assumptions of each test
7. Know what "main effect" and "interaction effect" mean
8. Know how to report in APA style, e.g., p-value, effect size
9. Know what a post hoc test is and when you should use it
10. Know how to plot data correctly with 95% CI error bars, and which visualization approaches to use
11. Be aware that correlation does not imply causation
12. If you apply machine learning, understand the math behind each algorithm and its limitations, instead of just using the algorithm as a black box.
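As a rough illustration of items 1 and 10, the sketch below computes the mean, median, and sample SD, then estimates a 95% CI for the mean with a percentile bootstrap. The `task_times` values and the function names are made up for this example, and the bootstrap is only one of several ways to get a CI (a t-based interval is the more common textbook choice).

```python
import random
import statistics


def describe(values):
    """Return the mean, median, and sample standard deviation."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample SD (n - 1 denominator)
    }


def bootstrap_ci_mean(values, n_resamples=5000, level=0.95, seed=0):
    """Percentile bootstrap CI for the mean (an assumption of this sketch)."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.mean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    alpha = 1.0 - level
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


# Hypothetical task completion times (seconds) for 10 participants.
task_times = [12.1, 9.8, 11.4, 10.3, 13.0, 9.5, 10.9, 12.6, 11.1, 10.2]

summary = describe(task_times)
ci_low, ci_high = bootstrap_ci_mean(task_times)
print(summary)
print(f"95% CI for the mean: [{ci_low:.2f}, {ci_high:.2f}]")
```

The two CI bounds are what you would draw as the error bar around the mean in a plot.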

Some tips:

1. A non-significant result does not mean your work is bad. It could still be interesting, so try to work out why it happened.
2. Statistics is just a tool. Focus on interpreting the results against your hypotheses.

Example of a real review you may get when you use the wrong statistical method:

"The data is ordinal, and cannot be analyzed with parametric methods. In some cases, where the sample size is large, t-test might be applied for ordinal data, but not here, where the sample size (the number of participants) is below 20.
To me, Wilcoxon signed rank (paired samples or matched pairs) test should be the correct method"
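To make the reviewer's suggestion concrete, here is a minimal sketch of a two-sided Wilcoxon signed-rank test for paired ordinal ratings. It assumes a small sample, drops zero differences, gives tied values their average rank, and gets an exact p-value by enumerating every possible sign assignment. The ratings and function names are hypothetical. For a real analysis, prefer an established implementation such as `wilcox.test` in R or `scipy.stats.wilcoxon`.

```python
from itertools import product


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_signed_rank(before, after):
    """Return (W+, exact two-sided p). Only suitable for small samples."""
    diffs = [a - b for b, a in zip(before, after) if a != b]  # drop zeros
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    expected = sum(ranks) / 2  # mean of W+ under the null hypothesis
    observed_dev = abs(w_plus - expected)
    extreme = 0
    total = 0
    # Under the null, each difference is equally likely to be + or -.
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - expected) >= observed_dev - 1e-12:
            extreme += 1
        total += 1
    return w_plus, extreme / total


# Hypothetical 5-point Likert ratings from 8 participants, two conditions.
before = [3, 4, 2, 5, 3, 4, 2, 3]
after = [4, 5, 4, 5, 2, 5, 4, 5]

w_plus, p_value = wilcoxon_signed_rank(before, after)
print(f"W+ = {w_plus}, p = {p_value:.3f}")
```

The enumeration grows as 2^n, so this approach only makes sense for the kind of small sample the review describes (fewer than 20 participants).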

----

Learn from Jakob Wobbrock - https://www.coursera.org/learn/designexperiments - Highly recommended!

Excellent self-study tool for statistics provided by Jakob Wobbrock - http://depts.washington.edu/aimgroup/proj/ps4hci/

Learn basics - https://www.coursera.org/specializations/statistics  - Highly recommended

Learn from Koji Yatani - http://yatani.jp/teaching/doku.php?id=hcistats:start - good for code snippets

Book written by experts - http://www.springer.com/in/book/9783319266312

Learn how to use R - http://www.amazon.co.jp/Discovering-Statistics-Using-R/dp/1446200469 - good reference for R. I highly recommend R because it is useful well beyond HCI research. If you prefer a GUI, you can try SPSS; the same author, Andy Field, wrote a similar book for SPSS.

How to report statistics in APA style (used in SIGCHI papers) - http://my.ilstu.edu/~jhkahn/apastats.html - Please read this before you report your data; beware that some CHI papers report data incorrectly. Also, please report an effect size (e.g., eta squared) for each statistical test, which is good practice.
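As a small illustration of reporting an effect size, the sketch below computes F and eta squared for a one-way between-subjects ANOVA and formats them without a leading zero, as APA style does for values that cannot exceed 1. The data and the helper names (`one_way_anova_effect`, `apa_p`, `no_leading_zero`) are invented for this example. The p-value itself is not computed here, since it needs the F distribution from a statistics package; `apa_p` only formats a p-value you already have.

```python
import statistics


def one_way_anova_effect(groups):
    """Return (F, df_between, df_within, eta squared) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    ss_between = sum(
        len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups
    )
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_sq = ss_between / (ss_between + ss_within)  # SS_between / SS_total
    return f_stat, df_between, df_within, eta_sq


def no_leading_zero(value, digits):
    """Format a number in [0, 1] without the leading zero, e.g. '.81'."""
    return f"{value:.{digits}f}".lstrip("0")


def apa_p(p):
    """Format a p-value: exact to three decimals, or 'p < .001'."""
    if p < 0.001:
        return "p < .001"
    return "p = " + no_leading_zero(p, 3)


# Hypothetical scores for three independent groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]

f_stat, df_b, df_w, eta_sq = one_way_anova_effect(groups)
report = f"F({df_b}, {df_w}) = {f_stat:.2f}, η² = {no_leading_zero(eta_sq, 2)}"
print(report)
print(apa_p(0.032), "|", apa_p(0.0004))  # formatting of made-up p-values
```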

Excellent work by Paul Cairns on the problems with statistics in HCI papers - http://dl.acm.org/citation.cfm?id=1531321

Important principles to follow as a statistician - http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004961