Recently I’ve talked about the different standards for existential and universal claims, how we can use representative samples to estimate universal claims, and how we know if our representative sample is big enough to be “statistically significant.” But I want to add a word of caution about these tests: you can’t get statistical significance without a representative sample.
If you work in social science you’ve probably seen p-values reported in studies that aren’t based on representative samples. They’re probably there because the authors took one required statistics class in grad school and learned that low p-values are good. It’s quite likely that these p-values were actually expected, if not explicitly requested, by the editors or reviewers of the article, who took a similar statistics class. And they’re completely useless.
A p-value tells you how likely it is that the luck of the draw alone would give you an observation (often a mean, but not always) at least as extreme as the one you got. If that probability is low enough (under 1%, or whatever threshold you choose), you are clear to generalize from your representative sample to the entire population. But if your sample is not representative, it doesn’t matter how low the p-value is!
Suppose you need 100% pure Austrian pumpkin seed oil, and you tell your friend to make sure he gets only the 100% pure kind. Your friend brings you 100% pure Australian tea tree oil. They’re both oils, and they’re both 100% pure, so your friend doesn’t understand why you’re so frustrated with him. But purity is irrelevant when you’ve got the wrong oil. P-values are the same way.
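If you'd like to see this in numbers, here is a rough simulation sketch. Everything in it is made up for illustration: the two groups, their scores, the sample size of 200, and the helper name `two_sided_p` are my assumptions, and the p-value uses a plain normal approximation rather than a proper t-test. It draws one sample at random from the whole population and one "convenience" sample from a single group, then tests each against the true population mean.

```python
import random
from statistics import NormalDist, fmean, stdev

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: two equally sized groups with different average scores.
group_a = [random.gauss(40, 10) for _ in range(10_000)]
group_b = [random.gauss(60, 10) for _ in range(10_000)]
population = group_a + group_b
true_mean = fmean(population)


def two_sided_p(sample, null_mean):
    """Two-sided p-value for H0: mean == null_mean (normal approximation)."""
    standard_error = stdev(sample) / len(sample) ** 0.5
    z = (fmean(sample) - null_mean) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


n = 200
representative = random.sample(population, n)  # every member equally likely
convenience = random.sample(group_b, n)        # only group B was reachable

for label, sample in [("representative", representative),
                      ("convenience", convenience)]:
    p = two_sided_p(sample, true_mean)
    print(f"{label:>14}: mean={fmean(sample):.1f}  p={p:.3g}")
print(f"{'population':>14}: mean={true_mean:.1f}")
```

The convenience sample should come back with a vanishingly small p-value while its mean sits roughly ten points above the population mean. That p-value is a perfectly accurate statement about group B, and it says nothing about the population you actually wanted to describe.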
So please, don’t report p-values if you don’t have a representative sample. If the editor or reviewer insists, go ahead and put them in, but please roll your eyes while you’re running your t-tests. But if you are the editor or reviewer, please stop asking people for p-values if they don’t have a representative sample! Oh, and you might want to think about asking them to collect a representative sample.