**What is point estimation and what is its fundamental drawback?**

Using a single sample value such as X_bar (the sample mean) to estimate a population value is known as point estimation, because the single value X_bar represents one point on the real number line. For example, if we are trying to estimate the mean height of Indian males, we might select a random sample of 100 Indian males, calculate the sample mean height X_bar, and then use this sample mean as our *point estimate* of the population mean mu.

This technique has a serious drawback. Let’s take an example. Suppose we are given the following population of 10 values: **{0,1,3,3,5,7,7,7,8,10}**. From this population, we construct the sampling distribution of means for all possible samples of size N=2 (drawn with replacement, which gives 10 × 10 = 100 samples). If we list all the sample means that could occur from samples of this size taken from this population, the possible values X_bar could take are the following:

0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 10.0

The real mean of this population is mu=5.1, and none of the sample means listed is exactly equal to 5.1. In other words, it would have been impossible, using point estimation with samples of size 2, to obtain a perfectly accurate estimate of mu.
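The claim is easy to check by brute force. The sketch below assumes the samples are drawn with replacement and that order matters, which gives 100 possible samples of size 2 (consistent with the 100 intervals counted in the next section). The variable names are illustrative only.

```python
from itertools import product

population = [0, 1, 3, 3, 5, 7, 7, 7, 8, 10]
mu = sum(population) / len(population)  # population mean

# Assumption: sampling with replacement, order matters -> 10 * 10 = 100 samples.
sample_means = [(a + b) / 2 for a, b in product(population, repeat=2)]

distinct_means = sorted(set(sample_means))
print(mu)                    # 5.1
print(len(sample_means))     # 100
print(distinct_means)        # every value X_bar can take
print(mu in distinct_means)  # False: no sample mean equals mu
```

Every sample mean here is a multiple of 0.5, so a value of 5.1 can never appear, no matter which sample we happen to draw.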

**What is interval estimation?**

*Interval estimation* involves estimating a population parameter by means of a line segment (or interval) on the real number line within which the value of the parameter is estimated to fall. Extending the previous example, suppose that around each sample mean X_bar on the list we construct an interval of length 6 centered at the sample mean, that is, from X_bar − 3 to X_bar + 3, and check for each of the 100 possible samples whether its interval contains the population mean.

Looking at these intervals, we find that 79 of the 100 (or 79 percent) actually contain the population mean 5.1.

In general, we can place more confidence in interval estimates than in point estimates and, by extension, more confidence in longer intervals than in shorter ones. There is a trade-off between the precision of an estimate and our confidence that it captures the true value.
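Both the 79 percent figure and the trade-off can be reproduced with a short enumeration. This is a minimal sketch, again assuming all 100 ordered samples of size 2 drawn with replacement; `coverage` is a hypothetical helper name, and the interval lengths other than 6 are chosen only for illustration.

```python
from itertools import product

population = [0, 1, 3, 3, 5, 7, 7, 7, 8, 10]
mu = sum(population) / len(population)

# Assumption: all 100 ordered samples of size 2, drawn with replacement.
sample_means = [(a + b) / 2 for a, b in product(population, repeat=2)]

def coverage(length):
    """Count intervals [X_bar - length/2, X_bar + length/2] that contain mu."""
    half = length / 2
    return sum(1 for m in sample_means if m - half <= mu <= m + half)

# Longer intervals capture mu more often, but pin it down less precisely.
for length in (2, 4, 6, 8):
    print(length, coverage(length))  # length 6 gives 79
```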

**How to construct an interval estimate?**

Under the conditions of the central limit theorem, the sampling distribution of means calculated on samples of size N drawn at random from a population with mean mu and standard deviation sigma is approximately normally distributed, with mean mu and standard deviation

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}}$$
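The mean and spread of the sampling distribution can be checked on the small population used earlier. The sketch below assumes sampling with replacement and enumerates every ordered sample of size 2. With N this small the distribution is not yet close to normal, but its mean still equals mu and its standard deviation still equals sigma divided by the square root of N.

```python
from itertools import product
from math import sqrt
from statistics import fmean, pstdev

population = [0, 1, 3, 3, 5, 7, 7, 7, 8, 10]
n = 2  # sample size

mu = fmean(population)
sigma = pstdev(population)  # population standard deviation

# Assumption: every ordered sample of size n, drawn with replacement.
sample_means = [fmean(s) for s in product(population, repeat=n)]

print(fmean(sample_means), mu)                # the two means agree
print(pstdev(sample_means), sigma / sqrt(n))  # so do the two spreads
```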

Also, 95 percent of the area under the normal curve lies within 1.96 standard deviations of its mean.
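The 1.96 figure can be confirmed with the `NormalDist` class from Python's standard `statistics` module. This is only a quick numerical check, not part of the derivation.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Area under the curve within 1.96 standard deviations of the mean.
area = z.cdf(1.96) - z.cdf(-1.96)
print(round(area, 4))    # 0.95

# Going the other way: the cutoff that leaves 2.5 percent in each tail.
cutoff = z.inv_cdf(0.975)
print(round(cutoff, 2))  # 1.96
```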

For large samples (N greater than 30, by a common rule of thumb), the sampling distribution of X_bar is close to normal and centered at the population mean, so we can use these facts to construct a confidence interval for the unknown population mean.

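The 95 percent confidence interval for the unknown population mean mu is given as:

$$\bar{X} - 1.96\,\frac{\sigma}{\sqrt{N}} \;\le\; \mu \;\le\; \bar{X} + 1.96\,\frac{\sigma}{\sqrt{N}}$$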
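As a worked example, the interval X_bar ± 1.96 · sigma / sqrt(N) can be computed in a few lines. The function name `confidence_interval_95` and the input numbers are hypothetical, chosen only to illustrate the arithmetic; they are not real height data.

```python
from math import sqrt

def confidence_interval_95(sample_mean, sigma, n):
    """95 percent confidence interval for mu when sigma is known."""
    margin = 1.96 * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Hypothetical numbers, not taken from real data:
# 100 males, sample mean height 165 cm, known sigma of 7 cm.
low, high = confidence_interval_95(sample_mean=165.0, sigma=7.0, n=100)
print(round(low, 2), round(high, 2))  # 163.63 166.37
```

Here the margin is 1.96 × 7 / 10 = 1.372 cm, so the interval runs from about 163.63 cm to 166.37 cm.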

The $1.96$ value appears in the formula because 95 percent of the area under the normal curve lies within 1.96 standard deviations of its mean. Graphically, when X_bar is within

$$1.96\,\frac{\sigma}{\sqrt{N}}$$

of mu, the resulting interval around X_bar will capture mu.

When X_bar is farther than

$$1.96\,\frac{\sigma}{\sqrt{N}}$$

from mu, the resulting interval around X_bar will not capture mu.
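A small simulation makes the capture behavior concrete. The sketch below assumes a normally distributed population with made-up parameters (mu = 50, sigma = 10) and samples of size 36. It draws many samples and counts how often the interval around X_bar captures mu.

```python
import random
from math import sqrt

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population parameters, chosen only for illustration.
mu, sigma, n, trials = 50.0, 10.0, 36, 2000
margin = 1.96 * sigma / sqrt(n)

captured = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = sum(sample) / n
    # The interval captures mu exactly when X_bar is within the margin of mu.
    if abs(x_bar - mu) <= margin:
        captured += 1

print(captured / trials)  # should land close to 0.95
```

The printed proportion will vary a little with the seed and the number of trials, but it should stay near 95 percent.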

Another way to interpret the confidence interval is:

I am 95 percent confident that the true mean value of the population is somewhere between

$$\bar{X} - 1.96\,\frac{\sigma}{\sqrt{N}} \quad \text{and} \quad \bar{X} + 1.96\,\frac{\sigma}{\sqrt{N}}$$

The main drawback of this method is that we must know the population standard deviation in order to calculate the confidence interval. In most business situations, we don’t know sigma, the population standard deviation. We can instead construct a confidence interval estimate of mu that uses the sample standard deviation S and *Student’s t-distribution*, which we will cover in the next article.

**Summary**

A point estimate uses a single sample value to represent the population parameter. Often, it is very difficult to tell how close this single value is to the true population mean.

Therefore, the population parameter (mean) is instead estimated with an interval, constructed so that it is expected to capture the true population mean with a given probability.

The confidence interval calculation shown here requires knowledge of the population standard deviation. This is almost never known, so in practice we use *Student’s t-distribution* to build the interval from the sample standard deviation.