this post was submitted on 11 Oct 2023
The R Project for Statistical Computing
It really depends on what you mean by "population data". If you mean that you have data on every person (or object, or whatever your research is about) in the population you are interested in, then there is no need for p-values. The mean you calculate IS the actual population mean and there is no room for sampling error (assuming each measurement is correct). If you just mean "a big dataset from the population", then inferential statistics can still make sense.
One thing to consider is that, mathematically, a t- or z-test always assumes that the population is infinitely large (the width of the confidence interval only reaches zero as the sample size goes to infinity), while in reality, as described above, your confidence interval should already have zero width when your sample size is equal to the actual population size.
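To make that concrete, here is a minimal sketch in R. It assumes simple random sampling without replacement and uses the textbook finite population correction, sqrt(1 - n/N), applied to the usual standard error. The population values, the size N = 1000, and the helper name `fpc_ci` are all made up for illustration.

```r
# Hypothetical example: width of a 95% CI for a mean, with and without
# the finite population correction (FPC).
set.seed(42)
N <- 1000                                   # assumed population size
population <- rnorm(N, mean = 50, sd = 10)  # made-up population values

fpc_ci <- function(x, N, level = 0.95) {
  n <- length(x)
  se_inf <- sd(x) / sqrt(n)                 # usual SE, treats the population as infinite
  se_fin <- se_inf * sqrt(1 - n / N)        # SE with the finite population correction
  tcrit <- qt(1 - (1 - level) / 2, df = n - 1)
  c(n = n,
    width_infinite = 2 * tcrit * se_inf,
    width_finite   = 2 * tcrit * se_fin)
}

# Sample without replacement at three sizes; the last one is a full census.
results <- t(sapply(c(50, 500, 1000),
                    function(n) fpc_ci(sample(population, n), N)))
print(round(results, 3))
```

With the correction, the interval width drops to exactly zero when n equals N, while the uncorrected interval still has positive width for the full census.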
Hope that helps. ;)
@arandomthought
I read some similar comments online, and there were also contrary positions, but I think this makes sense.
And I didn't know about the infinite population thing, that is interesting.
If I may ask a follow-up: p-values aside, regression models and correlation tests can still be interesting to apply to census data to measure effect sizes and such, right?
Look up super-population theory. It is based on the idea that even a perfect census is only a point-in-time estimate of the theoretical "super-population" that the point-in-time population is derived from. In large, real-world populations, people are constantly coming and going. If we assume that this coming and going is random and the relevant super-population parameters do not change over time, it is easy to see a census population as a single sample from a larger super-population. While somewhat theoretical, this is a useful model when estimating relationships between variables in census data and leads to the use of standard frequentist confidence intervals and, yes, even p-values.
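A rough way to picture this in R is to treat each census as one draw from an assumed data-generating process. Everything in this sketch is hypothetical: the linear model, the slope `beta = 0.5`, the census size, and the helper `draw_census` are made up for illustration and are not taken from any real census.

```r
# Hypothetical super-population sketch: each "census" is one realization
# of an assumed linear model with slope beta.
set.seed(1)
N <- 2000      # assumed census size
beta <- 0.5    # assumed super-population slope

draw_census <- function(N) {
  x <- rnorm(N)
  y <- 2 + beta * x + rnorm(N)
  data.frame(x = x, y = y)
}

# The one census we actually observe. Its slope is exact for this
# point-in-time population, but only an estimate of beta.
census <- draw_census(N)
fit <- lm(y ~ x, data = census)
ci <- confint(fit)["x", ]

# Censuses we could have observed under the same super-population.
slopes <- replicate(200, coef(lm(y ~ x, data = draw_census(N)))[["x"]])

print(ci)
print(c(sd_across_censuses = sd(slopes),
        model_se = summary(fit)$coefficients["x", "Std. Error"]))
```

Under these assumptions, the standard error that `lm` reports for the single census should be close to the spread of slopes across the repeated censuses, which is the sense in which the usual confidence interval still describes something.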
@sailingbythelee I've seen this argument around, but not with reference to a formalized theory. Now I've got something to look up and see if it makes sense for my research, thank you so much!