Confidence interval for a population mean with a known and an unknown variance

When the population variance is known and the distribution is normal, the z-statistic is used as the reliability factor for all sample sizes.

When the population variance is unknown and the distribution is normal, the t-statistic is used for small sample sizes. For large sample sizes, the z-statistic can also be used because the t-distribution approaches the normal distribution as the sample size increases. However, the t-statistic remains the theoretically correct choice and gives a more conservative interval: for the same confidence level, sample size, and standard deviation, the confidence interval based on the t-statistic is always wider than the one based on the z-statistic.

When the population distribution is nonnormal and the sample size is small, no statistic is available for computing the reliability factor. However, when the sample size is greater than or equal to 30, the central limit theorem implies that the distribution of the sample mean is approximately normal even if the underlying population distribution is nonnormal. In that case, either the t-statistic or the z-statistic can be used when the variance is unknown, and the z-statistic is used when the variance is known.

Example 2: Calculating confidence interval for normal distribution

The annual returns of a mutual fund follow a normal distribution. What is the 95 percent confidence interval for the population mean of annual returns if the sample mean is 8.50 percent and the sample size is 25?
(a) The population standard deviation is known and equals 10.00 percent.
(b) The population standard deviation is unknown and the sample standard deviation is 10.00 percent.

Solution:

(a) When the population variance is known, the 95 percent confidence interval = Point estimate ± Reliability factor × Standard error = $\bar{X} \pm z_{0.025} \cdot \sigma/\sqrt{n}$ = 8.50 percent ± 1.96 × (10.00 percent/√25) = 8.50 percent ± 3.92 percent = 4.58 percent to 12.42 percent.

(b) When the population variance is unknown and the sample size is less than 30, we use the t-statistic. The 95 percent confidence interval = $\bar{X} \pm t_{0.025} \cdot s/\sqrt{n}$ = 8.50 percent ± 2.064 × (10.00 percent/√25) = 8.50 percent ± 4.13 percent = 4.37 percent to 12.63 percent.

We use 24 (= n - 1 = 25 - 1) degrees of freedom and a probability of 0.025 in each tail to look up the t-statistic value of 2.064.

Please note that the interval based on the t-statistic is wider than the one based on the z-statistic for the same mean, standard deviation, and sample size.
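The arithmetic in Example 2 can be checked with a short script. This is a minimal sketch using only the Python standard library; the helper name `confidence_interval` is an assumption made for illustration, and the t-value 2.064 is hard-coded from a t-table (24 degrees of freedom) because the standard library has no t-distribution. Note that the standard error is σ/√n = 10.00/√25 = 2.00 percent.

```python
from math import sqrt
from statistics import NormalDist

def confidence_interval(mean, std_dev, n, reliability_factor):
    """Return (lower, upper) = mean ± reliability_factor * std_dev / sqrt(n)."""
    margin = reliability_factor * std_dev / sqrt(n)
    return mean - margin, mean + margin

mean, std_dev, n = 8.50, 10.00, 25   # percent, percent, observations

# (a) Known population standard deviation: z-value with 2.5% in each tail.
z = NormalDist().inv_cdf(0.975)      # about 1.96
ci_z = confidence_interval(mean, std_dev, n, z)

# (b) Unknown population standard deviation: table t-value for 24 degrees
# of freedom (hard-coded assumption, see the note above).
T_0025_DF24 = 2.064
ci_t = confidence_interval(mean, std_dev, n, T_0025_DF24)

print(f"z interval: {ci_z[0]:.2f} to {ci_z[1]:.2f}")   # z interval: 4.58 to 12.42
print(f"t interval: {ci_t[0]:.2f} to {ci_t[1]:.2f}")   # t interval: 4.37 to 12.63
```

Both intervals are centered on the same sample mean; only the reliability factor differs, which is why the t-based interval comes out wider.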

Example 3: Calculating confidence interval for nonnormal distribution

Ross is considering an investment in a hedge fund. He wants to know the 99 percent confidence interval for the population mean of the annual returns of the hedge fund. The sample mean annual return of the hedge fund is 12 percent. The population standard deviation of the returns is 20 percent. The returns of the hedge fund do not follow a normal distribution. What is the 99 percent confidence interval given that the sample size is:

(a) 20
(b) 100

Solution:

(a) When the sample size is 20 (less than 30) and the distribution is nonnormal, no statistic is available for computing the reliability factor, so the confidence interval cannot be calculated.

(b) When the sample size is large and the distribution is nonnormal, the central limit theorem implies that the distribution of the sample mean will be approximately normal. Since the population variance is known, the z-statistic is used to calculate the confidence interval.

99 percent confidence interval = $\bar{X} \pm z_{0.005} \cdot \sigma/\sqrt{n}$ = 12.00 percent ± 2.58 × (20.00 percent/√100) = 12.00 percent ± 5.16 percent = 6.84 percent to 17.16 percent.
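Part (b) of Example 3 can be reproduced in the same way. The sketch below uses only the Python standard library and takes the z-value from `statistics.NormalDist` rather than from a table; the variable names are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

sample_mean = 12.00     # percent
population_sd = 20.00   # percent
n = 100
confidence = 0.99

# Two-tailed z-value: 0.5% in each tail for 99 percent confidence.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 2.576
standard_error = population_sd / sqrt(n)             # 2.00 percent

lower = sample_mean - z * standard_error
upper = sample_mean + z * standard_error
print(f"{lower:.2f} to {upper:.2f}")   # 6.85 to 17.15
```

The exact z-value is about 2.576, so the result differs in the second decimal from a hand calculation with the rounded table value 2.58, which gives 6.84 percent to 17.16 percent.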

Statistic to be used for computing reliability factors
Sample taken from	Small sample size	Large sample size
Normal distribution with known variance	z-statistic	z-statistic
Normal distribution with unknown variance	t-statistic	t-statistic or z-statistic
Nonnormal distribution with known variance	Not available	z-statistic
Nonnormal distribution with unknown variance	Not available	t-statistic or z-statistic
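
The table can also be read as a small decision rule. The function below is a hypothetical helper written only to illustrate that rule; the name `reliability_statistic` and the default cutoff of 30 for a large sample (taken from the text above) are assumptions, not part of any library.

```python
def reliability_statistic(is_normal, variance_known, n, large_sample=30):
    """Return the statistic named in the table, or None when none is available."""
    large = n >= large_sample
    if not is_normal and not large:
        return None            # nonnormal population, small sample
    if variance_known:
        return "z"             # known variance: z-statistic
    return "t or z" if large else "t"

print(reliability_statistic(is_normal=True, variance_known=False, n=25))    # t
print(reliability_statistic(is_normal=False, variance_known=True, n=20))    # None
print(reliability_statistic(is_normal=False, variance_known=True, n=100))   # z
```

The three calls correspond to Example 2(b), Example 3(a), and Example 3(b).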

The impact of sample size on the width of the confidence interval: The larger the sample size, the narrower the confidence interval. The width of the confidence interval depends on two factors: the reliability factor and the standard error. The standard error decreases as the sample size increases because it is equal to s/√n. When the t-statistic is used, the reliability factor also decreases as the sample size (n) increases, because the degrees of freedom (n - 1) increase and the t-value falls as the degrees of freedom rise. The z-statistic does not depend on the sample size, so in that case only the standard error shrinks.
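
A short numerical illustration of this effect, as a sketch under stated assumptions: the sample standard deviation of 10.00 percent is an illustrative value, and the two-tailed 95 percent t-values are hard-coded from a standard t-table because the Python standard library has no t-distribution.

```python
from math import sqrt

# Two-tailed 95 percent t-values (2.5% in each tail) from a standard t-table,
# keyed by degrees of freedom (n - 1).
T_0025 = {4: 2.776, 9: 2.262, 24: 2.064, 99: 1.984}

def interval_width(s, n):
    """Width of the 95 percent t-interval: 2 * t * s / sqrt(n)."""
    return 2 * T_0025[n - 1] * s / sqrt(n)

s = 10.00  # sample standard deviation in percent (illustrative assumption)
widths = {n: interval_width(s, n) for n in (5, 10, 25, 100)}

for n, width in widths.items():
    t_value = T_0025[n - 1]
    print(f"n={n:>3}  t={t_value:.3f}  s/sqrt(n)={s / sqrt(n):.3f}  width={width:.2f}")
```

Both columns, the t-value and the standard error, fall as n grows, so the width shrinks for two reasons at once.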
