Does my sample have to be normally distributed for a t-test?

While reading up on the t-test’s normality assumption, I came across a lot of conflicting information. Most online resources suggest that for the t-test to be valid, the samples have to be normally distributed.

However, this does not have to be the case. Given a sufficient sample size, the samples do not have to be normally distributed. In the t-test, our parameter of interest is the mean. According to the Central Limit Theorem, when we repeatedly sample from a population of any shape (with finite variance), the sample means will resemble a normal distribution if the sample size is sufficient:

In small samples most statistical methods do require distributional assumptions, and the case for distribution-free rank-based tests is relatively strong. However, in the large data sets typical in public health research, most statistical methods rely on the Central Limit Theorem, which states that the average of a large number of independent random variables is approximately Normally distributed around the true population mean. It is this Normal distribution of an average that underlies the validity of the t-test.

source: https://www.annualreviews.org/doi/pdf/10.1146/annurev.publhealth.23.100901.140546

In this article, I will visually demonstrate why that is the case and why the normality of the sample is not required.

The central limit theorem (CLT) establishes that, in many situations, when independent random variables are summed up, their properly normalized sum tends toward a normal distribution (informally a bell curve) even if the original variables themselves are not normally distributed (Wikipedia)[1]. In other words, if we have a highly skewed population and we repeatedly draw samples from it and calculate their means, the means, when plotted, will be approximately normally distributed (this is called the sampling distribution of the mean).
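Stated a little more formally, here is the classical version of the theorem for independent, identically distributed variables. The symbols are introduced here only for illustration: X_1, ..., X_n are the draws, and the population is assumed to have a finite mean \mu and a finite variance \sigma^2.

```latex
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
\sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma} \xrightarrow{d} \mathcal{N}(0, 1)
\quad \text{as } n \to \infty .
```

Put differently, for large n the sample mean is approximately normal with mean \mu and variance \sigma^2 / n, whatever the shape of the population.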

Imagine we have a large population and we need to draw a sample from it, like below.

The shape of this distribution is highly skewed, with many values concentrated around the origin. The Central Limit Theorem states that if we draw many samples from this population, the means of these samples will be approximately normally distributed. The “normalness” of the sampling distribution depends on the sample size.

Below are the results of a simulation that draws 1,000 samples for each of the sample sizes n = 5, 10, 30, and 50 and plots the resulting sample means.

[Figure: sampling distributions of the mean for n=5, n=10, n=30, and n=50]

As we can see, the larger the sample size, the more “normal” the sampling distribution looks, regardless of the population’s distribution.
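A simulation of this kind can be sketched in a few lines of Python. This is only an illustration, not the code behind the plots: the exponential population, the seed, and the helper names (`draw_sample`, `sample_means`, `skewness`) are all assumptions made here. Instead of plotting, it prints the skewness of the sample means, which should shrink toward 0 as n grows.

```python
import random
import statistics

random.seed(0)  # arbitrary seed, used only for reproducibility


def draw_sample(n):
    """Draw n values from an assumed skewed population (exponential, mean 1)."""
    return [random.expovariate(1.0) for _ in range(n)]


def sample_means(n, n_samples=1000):
    """Return the means of n_samples independent samples of size n."""
    return [statistics.fmean(draw_sample(n)) for _ in range(n_samples)]


def skewness(values):
    """Third standardized moment; close to 0 for a symmetric distribution."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - mu) / sd) ** 3 for v in values])


# One sampling distribution of the mean per sample size.
results = {n: sample_means(n) for n in (5, 10, 30, 50)}

for n, means in results.items():
    print(
        f"n={n:>2}  mean of means={statistics.fmean(means):.3f}  "
        f"sd of means={statistics.pstdev(means):.3f}  "
        f"skewness={skewness(means):.3f}"
    )
```

Passing each list in `results` to a histogram function of your choice gives plots comparable to the ones above.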

How can we be sure that the results produced by a t-test are valid, even when the population is skewed? We can perform a simple simulation: create a skewed population, repeatedly draw 2 samples from it, and perform a t-test on each pair. Since both samples come from the same population, the t-test should not be significant. In practice, however, we can expect it to be significant some of the time, and how often depends on the chosen significance level (alpha). At alpha=0.05, we can expect about 5% of the simulation results to be significant. Why is that? A Type I error is committed when a true null hypothesis (here, no difference between the population means) is erroneously rejected, and the Type I error rate is determined directly by the chosen alpha. With alpha set to 0.05, on average 5% of the t-statistics produced will be declared significant even though they arose by pure chance. Therefore, if everything works as expected, Type I errors should occur at the alpha rate.

Below I created a left-skewed distribution for our simulation:

mean and stdev are in gray

We will now run 10,000 simulations, each with a sample size of 10.

The resulting Type I error rate is 0.0497. When the simulation is repeated over and over, the rate tends to fluctuate around 0.05, which is exactly what alpha was set to. The results are surprisingly robust even with this relatively small sample size.
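For readers who want to try this themselves, here is a minimal sketch of such a simulation using only the Python standard library. Several things in it are assumptions rather than details from the experiment above: the population (a negated exponential, which is left-skewed), the seed, the use of the pooled-variance Student's t-test, and the helper names. The exact rate it prints will therefore differ from the figure reported above.

```python
import math
import random
import statistics

random.seed(42)  # arbitrary seed, used only for reproducibility

N_SIMULATIONS = 10_000
SAMPLE_SIZE = 10
# Two-sided critical value of Student's t for alpha = 0.05 and
# df = 2 * SAMPLE_SIZE - 2 = 18 (standard t-table value).
T_CRITICAL = 2.101


def draw_sample(n):
    """Assumed left-skewed population: a negated exponential variable."""
    return [-random.expovariate(1.0) for _ in range(n)]


def t_statistic(a, b):
    """Pooled-variance (Student's) two-sample t statistic."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    standard_error = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.fmean(a) - statistics.fmean(b)) / standard_error


def type_i_error_rate(sampler, n, n_sims):
    """Share of simulations in which two same-population samples 'differ'."""
    rejections = 0
    for _ in range(n_sims):
        # Both samples come from the same population, so the null is true
        # and every rejection is a Type I error.
        if abs(t_statistic(sampler(n), sampler(n))) > T_CRITICAL:
            rejections += 1
    return rejections / n_sims


rate = type_i_error_rate(draw_sample, SAMPLE_SIZE, N_SIMULATIONS)
print(f"Estimated Type I error rate: {rate:.4f}")
```

The t statistic is computed by hand here to keep the sketch dependency-free. In practice, a library routine such as `scipy.stats.ttest_ind` would be the usual choice, and comparing its p-value with alpha is equivalent to the critical-value check used above.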

For comparison purposes, I reran the experiment with a normal distribution:

The results were essentially the same.

Recent trends in research have shown that most papers use non-parametric tests even when they are not necessary. However, if the goal of the study is to compare the means of 2 samples, then the samples do not have to be normally distributed, given a sufficient sample size, and a parametric test such as the t-test can be applied.

For a more thorough discussion and experimentation with various distribution parameters, refer to this paper: https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/1471-2288-12-78

[1] https://en.wikipedia.org/wiki/Central_limit_theorem

CS PhD @ LSU. Passionate about statistics, ML, and NLP.