Often in statistics, we want to compare the means of two different groups. The key distinction is whether the samples are independent or paired.
Independent samples occur when:
- The two groups consist of different, unrelated subjects
- A measurement in one group tells us nothing about any measurement in the other

Paired samples occur when:
- The same subjects are measured twice (e.g., before and after a treatment)
- Subjects are matched in pairs on relevant characteristics
This distinction is critical because it changes how we analyze the data.
When comparing two independent samples, we want to test whether the population means are equal.
Null hypothesis: H₀: μ₁ = μ₂, or equivalently, μ₁ - μ₂ = 0
Alternative hypothesis: Hₐ: μ₁ ≠ μ₂ (two-tailed), or μ₁ > μ₂ (one-tailed), or μ₁ < μ₂ (one-tailed)
When we assume equal variances in the two populations, we pool the sample variances to get a better estimate:

sₚ = √[ ((n₁ - 1)s₁² + (n₂ - 1)s₂²) / (n₁ + n₂ - 2) ]
Key concept: The pooled SD is a weighted average of the two sample standard deviations, weighted by their respective degrees of freedom.
The numerator combines the squared deviations from both groups. The denominator is n₁ + n₂ - 2, which is the total degrees of freedom available from both samples.
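As a concrete sketch, the pooled standard deviation can be computed directly in R from the two sample SDs and sizes (the function name and example values here are illustrative, not from the original data):

```r
# Pooled standard deviation: weighted average of the two sample variances,
# weighted by degrees of freedom (n1 - 1) and (n2 - 1)
pooled_sd <- function(s1, s2, n1, n2) {
  sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))
}

# With equal sample sizes, the pooled variance is the plain average of the two
pooled_sd(2, 4, 10, 10)  # sqrt((9*4 + 9*16)/18) = sqrt(10) ≈ 3.162
```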
The standard error of the difference in means is:

SE = sₚ · √(1/n₁ + 1/n₂)
The test statistic follows a t-distribution with df = n₁ + n₂ - 2:

t = (x̄₁ - x̄₂) / (sₚ · √(1/n₁ + 1/n₂))
A confidence interval for the difference in means (μ₁ - μ₂) is:

(x̄₁ - x̄₂) ± t* · sₚ · √(1/n₁ + 1/n₂)
where t* is the critical value from the t-distribution with df = n₁ + n₂ - 2.
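Putting the pieces together, here is a sketch of the whole equal-variance calculation in R; the summary statistics are hypothetical, chosen only to illustrate the arithmetic:

```r
# Hypothetical summary statistics for two independent samples
xbar1 <- 11.2; s1 <- 0.9; n1 <- 12
xbar2 <- 10.5; s2 <- 1.1; n2 <- 14

df <- n1 + n2 - 2                                      # total degrees of freedom
sp <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / df)   # pooled SD
se <- sp * sqrt(1/n1 + 1/n2)                           # standard error of the difference

t_stat <- (xbar1 - xbar2) / se                         # test statistic
p_val  <- 2 * pt(-abs(t_stat), df)                     # two-tailed p-value

t_star <- qt(0.975, df)                                # 95% critical value
ci <- (xbar1 - xbar2) + c(-1, 1) * t_star * se         # 95% confidence interval
```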
Let's compare sleep times between fixed and intact male ragdoll cats.
R makes this easy with the t.test() function:
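A minimal sketch of the call, using made-up sleep times (hours per day); the vectors `fixed` and `intact` and their values are hypothetical, not the original dataset:

```r
# Hypothetical daily sleep times (hours) for fixed vs. intact male ragdoll cats
fixed  <- c(16.2, 15.8, 17.1, 16.5, 15.9, 16.8)
intact <- c(15.1, 15.9, 15.3, 16.0, 15.5, 14.8)

# Equal-variance two-sample t-test (df = n1 + n2 - 2 = 10 here)
t.test(fixed, intact, var.equal = TRUE)
```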
Pooling is valid when we assume the populations have equal variances. The key insight is that we're combining information from both samples to estimate a common population standard deviation.
Key concept: Pooling gives us more information (higher degrees of freedom) and thus more power to detect differences, IF the equal variance assumption is reasonable.
The weights in the pooled SD formula, (n₁ - 1) and (n₂ - 1), reflect how much information each sample contributes. Larger samples get more weight because their variances are more stable estimates.
However, if the variances truly are different, pooling can be misleading. This is where Welch's t-test comes in.
When sample standard deviations are substantially different, or when we're unsure about equality of variances, Welch's t-test is safer. The key differences:
1. Do NOT pool the standard deviations
2. Use SE = √(s₁²/n₁ + s₂²/n₂), keeping each sample's variance separate
3. Use the Welch-Satterthwaite degrees of freedom (more complex, typically reported by software)
The Welch-Satterthwaite formula for degrees of freedom is:

df = (s₁²/n₁ + s₂²/n₂)² / [ (s₁²/n₁)²/(n₁ - 1) + (s₂²/n₂)²/(n₂ - 1) ]
This looks complex, but the interpretation is straightforward: it reduces the degrees of freedom when variances are unequal, reflecting the loss of information from having to estimate two different population standard deviations.
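The formula is easy to sketch in R from summary statistics (the function name is illustrative). Note that when the variances and sample sizes are equal, it recovers the pooled df of n₁ + n₂ - 2:

```r
# Welch-Satterthwaite degrees of freedom from summary statistics
welch_df <- function(s1, s2, n1, n2) {
  v1 <- s1^2 / n1
  v2 <- s2^2 / n2
  (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
}

# Equal variances and equal sizes recover the pooled df, n1 + n2 - 2
welch_df(2, 2, 10, 10)  # 18
```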
In R, var.equal = FALSE (the default) uses Welch's method, so we can call t.test() directly:
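A minimal sketch with simulated data (the groups and their parameters are hypothetical); since Welch is the default, `var.equal = FALSE` can be omitted:

```r
set.seed(1)
g1 <- rnorm(15, mean = 10, sd = 1)  # hypothetical group 1
g2 <- rnorm(20, mean = 11, sd = 3)  # hypothetical group 2, larger spread

# Welch's t-test: no pooling, Welch-Satterthwaite df (reported as a decimal)
t.test(g1, g2)
```

The reported df will be smaller than the pooled n₁ + n₂ - 2 = 33, reflecting the unequal spreads.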
Guidance for choosing between equal-variance and Welch's t-tests:
Key concept: Welch's t-test is more conservative and loses very little power even when the variances really are equal. Most statisticians recommend Welch as the default choice unless you have good reason to assume equal variances.
R's default is var.equal=FALSE (Welch), which reflects modern statistical practice.
When data is paired (same subjects measured twice, or matched subjects), we have a different situation. The key is that measurements are not independent across groups.
The genius of paired testing is that we convert a two-sample problem into a one-sample problem:
1. Compute the differences: dᵢ = xᵢ₁ - xᵢ₂ for each pair
2. Treat the differences as a single sample
3. Test whether the mean difference is zero
This is a one-sample t-test on the differences, with df = n - 1 (where n is the number of pairs).
The test statistic is t = d̄ / (s_d/√n), where d̄ = mean of the differences, s_d = standard deviation of the differences, and n = number of pairs.
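The reduction to a one-sample problem can be sketched in R; the before/after values below are hypothetical, not the growth data analyzed later:

```r
# Hypothetical paired measurements: the same 5 subjects, before and after
before <- c(150.1, 148.3, 152.7, 149.5, 151.2)
after  <- c(151.8, 149.6, 154.1, 150.4, 152.9)

d <- after - before   # step 1: per-subject differences
# steps 2-3: one-sample t-test on the differences, df = n - 1 = 4
t.test(d, mu = 0)
```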
Consider height growth from age 13 to age 14 in 5 individuals:
Now compare to an INCORRECT analysis that ignores pairing:
Compare results:
Key concept: The paired test gives t = 5.331, p = 0.00793 (highly significant). The unpaired test gives t = 1.248, p = 0.2509 (not significant). This dramatic difference shows why pairing is crucial. When data is paired and we fail to pair in the analysis, we throw away important information and lose power to detect real effects.
The paired analysis is much more powerful because it controls for individual differences in height. By looking at changes within individuals, we reduce noise.
Both produce identical results. The key difference in the function call: paired = TRUE tells R to compute differences first.
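The equivalence is easy to verify directly (the paired data here are hypothetical):

```r
# Hypothetical paired data
x1 <- c(12.0, 11.5, 13.2, 12.8, 11.9)
x2 <- c(11.1, 11.0, 12.5, 12.0, 11.4)

paired_res <- t.test(x1, x2, paired = TRUE)  # paired two-sample call
diff_res   <- t.test(x1 - x2, mu = 0)        # one-sample test on the differences

# Identical test statistics and p-values
all.equal(paired_res$statistic, diff_res$statistic)
all.equal(paired_res$p.value, diff_res$p.value)
```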
A 95% confidence interval for the difference in population means (μ₁ - μ₂) tells us:
Key concept: If we repeated the sampling procedure many times and computed a confidence interval each time, approximately 95% of those intervals would contain the true population difference.
Practical interpretation:
For the equal variance test: 90% CI: [0.0495, 1.3505]
Interpretation: We're 90% confident the true difference in sleep times is between 0.05 and 1.35 hours, favoring the fixed cats.
For the Welch test with unequal variances: 95% CI: [-0.3296, 1.7296]
Interpretation: This wider interval reflects the greater uncertainty from unequal variances. It includes 0, so we don't have strong evidence of a difference.
For the height growth data, a 95% CI for mean growth:
We're 95% confident the true mean height growth from age 13 to 14 is between 0.53 and 2.67 cm.
There's a beautiful connection between confidence intervals and hypothesis tests.
For a two-tailed hypothesis test with significance level α:
Looking back at our paired growth test:
Key concept: The confidence interval and hypothesis test are two views of the same underlying question. The CI tells us not just whether a difference exists, but also the range of plausible values.
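This duality can be checked directly in R (simulated, hypothetical data): a 95% CI excludes 0 exactly when the two-sided test rejects H₀: μ = 0 at α = 0.05:

```r
set.seed(42)
x <- rnorm(10, mean = 1)                 # hypothetical sample
res <- t.test(x, mu = 0)                 # default conf.level = 0.95

ci_excludes_zero <- res$conf.int[1] > 0 || res$conf.int[2] < 0
reject_at_05     <- res$p.value < 0.05

ci_excludes_zero == reject_at_05         # TRUE whenever the levels match
```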
| Scenario | df | SE Formula | Assumption | R Code |
|---|---|---|---|---|
| Independent, equal var | n₁ + n₂ - 2 | sₚ·√(1/n₁ + 1/n₂) | σ₁ = σ₂ | var.equal=TRUE |
| Independent, unequal var | Welch-Satterthwaite | √(s₁²/n₁ + s₂²/n₂) | None (safer) | var.equal=FALSE (default) |
| Paired | n - 1 | s_d/√n | Differences normal | paired=TRUE |
1. Always distinguish between independent and paired data structures
2. When data is paired, compute differences and treat as one-sample problem
3. Welch's t-test is safer as a default for independent samples
4. Equal variance t-test assumes (and requires) similar population variances
5. Confidence intervals and hypothesis tests tell complementary stories
6. The df and SE change based on the test choice
7. Failing to recognize and properly analyze paired data can lead to missing real effects