
Confidence Interval for Proportions: Bayesian or Frequentist?




 

Suppose you want to construct a 95% confidence interval for a proportion. Should you use the approach of the Bayesian school or that of the Frequentist school?

 

To illustrate, suppose a store restocks its shelf of Brand X cereal each day so that there are 100 boxes in total. On day 1, it sells 62 boxes. What is the 95% confidence interval for the proportion of boxes sold per day?

 

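Using the Frequentist approach, with sample proportion p̂ = 62/100 = 0.62 and z = 1.96, the interval would be:

$$\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.62 \pm 1.96\sqrt{\frac{0.62 \times 0.38}{100}} = 0.62 \pm 0.0951 = (0.5249,\ 0.7151)$$

The width of the interval is 0.7151 – 0.5249 = 0.1902. Nice and simple.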
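The same arithmetic can be reproduced outside Excel. Below is a minimal Python sketch of the normal-approximation (Wald) interval used above, assuming z = 1.96 for 95% confidence. The function name `wald_interval` is illustrative and does not come from the original.

```python
import math

def wald_interval(x: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation interval p_hat ± z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = x / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Day 1: 62 of 100 boxes sold.
lo, hi = wald_interval(62, 100)
print(f"({lo:.4f}, {hi:.4f})")  # (0.5249, 0.7151)
```

The same function also reproduces the pooled intervals later in the article, for example `wald_interval(205, 500)` and `wald_interval(431, 1000)`.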

 

Now, to use the Bayesian approach, we need a prior distribution for θ, the true proportion of boxes sold per day. Since we have no prior information about θ, we can use the uniform distribution, in which f(θ) = 1 for 0 ≤ θ ≤ 1.

 

The uniform distribution is a special case of the Beta distribution in which α = 1 and β = 1. As is well documented in the literature, the Beta distribution is the conjugate prior of the binomial distribution: combining a Beta prior with binomial data yields a Beta posterior.

 

The formula for the Beta distribution is:

$$f(\theta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,\theta^{\alpha-1}(1-\theta)^{\beta-1}, \qquad 0 \le \theta \le 1$$

In the formula, Γ(α) represents the gamma function of α. The formula for the gamma function is:

$$\Gamma(\alpha) = \int_0^\infty t^{\alpha-1} e^{-t}\,dt$$

If α is a positive whole number, then Γ(α) is equal to (α – 1)!. For example, if α = 3, then Γ(3) = 2! = 2 × 1 = 2. In Excel, the factorial function is called FACT, so to compute 2! you would type =FACT(2) in a cell to get the result 2.
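Readers working outside Excel can use Python's standard library, which provides both functions. This short sketch is an illustration, not part of the original workflow. It checks Γ(α) = (α – 1)! for the first few whole numbers, with `math.factorial` playing the role of Excel's FACT.

```python
import math

# For positive whole numbers, the gamma function reduces to a factorial:
# Gamma(alpha) = (alpha - 1)!
for alpha in range(1, 8):
    assert math.isclose(math.gamma(alpha), math.factorial(alpha - 1))

print(math.gamma(3), math.factorial(2))  # Gamma(3) and 2!, both equal to 2
```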

 

In our example with α = 1 and β = 1, we have:

$$f(\theta) = \frac{\Gamma(2)}{\Gamma(1)\,\Gamma(1)}\,\theta^{0}(1-\theta)^{0} = \frac{1!}{0!\,0!} = 1$$

This is because 0! = 1 by definition.
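The Beta density translates directly into Python, which makes it easy to confirm that α = 1, β = 1 gives a flat density equal to 1 everywhere inside [0, 1]. This is a sketch, and `beta_pdf` is a hypothetical helper name. It works with `math.lgamma` (the log of the gamma function) so that the larger α and β values used later in the article do not overflow.

```python
import math

def beta_pdf(theta: float, alpha: float, beta: float) -> float:
    """Beta density Gamma(a+b)/(Gamma(a)Gamma(b)) * theta^(a-1) * (1-theta)^(b-1)."""
    # Work on the log scale: Gamma(a + b) overflows a float once a + b exceeds about 171.
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return math.exp(log_norm + (alpha - 1) * math.log(theta) + (beta - 1) * math.log(1 - theta))

# The uniform prior: every theta strictly inside (0, 1) has density 1.
print([beta_pdf(t, 1, 1) for t in (0.1, 0.5, 0.9)])
```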

 

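To eventually construct the posterior distribution of θ, we can say:

$$f(\theta \mid x) = \frac{f(x \mid \theta)\, f(\theta)}{f(x)} \propto f(x \mid \theta)\, f(\theta)$$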

The formula for the binomial distribution is:

$$P(X = x) = {}_{n}C_{x}\, p^{x}(1-p)^{n-x}$$

In the formula, nCx is the number of ways to choose x items from n. Its formula is:

$${}_{n}C_{x} = \frac{n!}{x!\,(n-x)!}$$

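In order to construct the posterior distribution of θ, we write the binomial probability of x successes as a function of θ:

$$f(x \mid \theta) = {}_{n}C_{x}\, \theta^{x}(1-\theta)^{n-x}$$

In stats speak, f(x | θ) is called the likelihood.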
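The binomial formula and nCx map directly onto Python's `math.comb`, available since Python 3.8. Here is a minimal sketch, with `binom_pmf` as an illustrative name. Evaluating it at a fixed x for different values of p traces out the likelihood described above.

```python
import math

def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) = nCx * p^x * (1 - p)^(n - x)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

print(math.comb(5, 2))  # 10 ways to choose 2 items from 5
# Over all possible x, the probabilities sum to 1 (up to floating-point rounding).
print(sum(binom_pmf(x, 100, 0.62) for x in range(101)))
```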

 

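To construct the posterior distribution of θ, we combine the prior and likelihood. The constants nCx and Γ(α + β)/(Γ(α)Γ(β)) do not depend on θ, so they can be dropped:

$$f(\theta \mid x) \propto \theta^{x}(1-\theta)^{n-x} \cdot \theta^{\alpha-1}(1-\theta)^{\beta-1} = \theta^{x+\alpha-1}(1-\theta)^{n+\beta-x-1}$$

This has the form of a Beta density, so the posterior distribution of θ follows a Beta distribution whose parameters are updated to x + α and n + β – x.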
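The conjugate update can also be checked numerically. The sketch below multiplies the uniform prior by the binomial likelihood for x = 62, n = 100 at several values of θ and divides by the Beta(63, 39) density. If the derivation is right, the ratio is the same constant at every θ: the marginal probability of the data, which for a uniform prior equals 1/(n + 1). The helper names are illustrative, and working on the log scale is an implementation choice to avoid overflow.

```python
import math

def update(alpha: float, beta: float, x: int, n: int) -> tuple[float, float]:
    """Conjugate update: Beta(alpha, beta) prior plus x successes in n trials."""
    return alpha + x, beta + n - x

def log_beta_pdf(theta: float, a: float, b: float) -> float:
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta))

def log_binom_likelihood(theta: float, x: int, n: int) -> float:
    return math.log(math.comb(n, x)) + x * math.log(theta) + (n - x) * math.log(1 - theta)

alpha_post, beta_post = update(1, 1, x=62, n=100)
print(alpha_post, beta_post)  # 63 39

# prior * likelihood / posterior should be the same constant for every theta.
ratios = []
for theta in (0.3, 0.5, 0.62, 0.8):
    log_ratio = (log_beta_pdf(theta, 1, 1) + log_binom_likelihood(theta, 62, 100)
                 - log_beta_pdf(theta, alpha_post, beta_post))
    ratios.append(math.exp(log_ratio))
print(ratios)  # each entry is approximately 1 / 101
```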

 

In our example, the posterior parameters are α = 62 + 1 = 63 and β = 100 + 1 – 62 = 39. Note that α + β = 102, which is the sum of n = 100, the prior α = 1 and the prior β = 1.

 

To construct the 95% interval from the Beta distribution (strictly speaking, a Bayesian credible interval), we need the 2.5th and 97.5th percentiles of the Beta distribution with α = 63 and β = 39. In Excel, typing =BETA.INV(0.025,63,39) in a cell gives 0.5218 for the 2.5th percentile, and =BETA.INV(0.975,63,39) gives 0.7091 for the 97.5th percentile. Thus, the interval is:

$$(0.5218,\ 0.7091)$$

The width of this interval is 0.7091 – 0.5218 = 0.1873, which is a tad narrower than the 0.1902 of the Frequentist interval.
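Excel's BETA.INV has a direct equivalent in SciPy (`scipy.stats.beta.ppf`). To keep the example dependency-free, the sketch below instead computes the regularized incomplete beta function (the Beta CDF) with the continued-fraction method described in Numerical Recipes, then inverts it by bisection. Treat it as an illustrative stand-in for BETA.INV, assuming 0 < p < 1, rather than a production-grade implementation. The function names are hypothetical.

```python
import math

def _beta_cf(a: float, b: float, x: float, max_iter: int = 500, eps: float = 1e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        # Even step of the continued fraction.
        num = m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m))
        d = 1.0 + num * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + num / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step.
        num = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))
        d = 1.0 + num * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + num / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def betainc_reg(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta I_x(a, b), i.e. the Beta(a, b) CDF at x."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # The continued fraction converges fastest for x < (a + 1) / (a + b + 2);
    # otherwise use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a).
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b

def beta_inv(p: float, a: float, b: float) -> float:
    """Inverse Beta CDF by bisection, playing the role of Excel's BETA.INV(p, a, b)."""
    lo, hi = 0.0, 1.0
    for _ in range(100):  # 100 halvings shrink the bracket below double-precision resolution
        mid = (lo + hi) / 2.0
        if betainc_reg(a, b, mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

lower, upper = beta_inv(0.025, 63, 39), beta_inv(0.975, 63, 39)
print(f"({lower:.4f}, {upper:.4f})")
```

Bisection is slow compared with Newton's method, but it cannot diverge because the CDF is monotone on [0, 1].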

 

Suppose that over the next 4 days, as on the first day, there are 100 boxes on the shelf and these are the numbers of boxes sold:

             Day 2   Day 3   Day 4   Day 5
Boxes sold      28       8      42      65

 

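If we sum the number of sales over the 5 days (62 + 28 + 8 + 42 + 65 = 205), p̂ = 205/500 = 0.41. Then the 95% confidence interval is:

$$0.41 \pm 1.96\sqrt{\frac{0.41 \times 0.59}{500}} = 0.41 \pm 0.0431 = (0.3669,\ 0.4531)$$

The width of this interval is 0.4531 – 0.3669 = 0.0862.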

 

Using the Beta distribution, we find the new values of α and β. Each day adds the number of boxes sold to α and the number left unsold (100 minus sales) to β:

     Prior   Day 2   Day 3   Day 4   Day 5   Total
α       63      28       8      42      65     206
β       39      72      92      58      35     296

Note that the day 1 posterior, α = 63 and β = 39, serves as the new prior. Again using Excel, the 2.5th percentile is 0.3677 and the 97.5th percentile is 0.4537. The width of this interval is 0.4537 – 0.3677 = 0.0860. This is still slightly less than the Frequentist width of 0.0862, but the difference is narrowing.
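Each day's posterior becomes the next day's prior, so updating one day at a time should end in the same place as a single update on the pooled counts. A small sketch checks this with the sales figures above. The `update` helper is a hypothetical name, and it assumes 100 boxes are stocked per day, as in the example.

```python
def update(alpha: int, beta: int, sold: int, stocked: int = 100) -> tuple[int, int]:
    """Add boxes sold to alpha and unsold boxes to beta (conjugate Beta-binomial update)."""
    return alpha + sold, beta + stocked - sold

daily_sales = [62, 28, 8, 42, 65]  # days 1-5

# Sequential: start from the uniform prior Beta(1, 1) and update day by day.
alpha, beta = 1, 1
for sold in daily_sales:
    alpha, beta = update(alpha, beta, sold)

# Batch: a single update with the pooled 5-day totals.
batch = update(1, 1, sum(daily_sales), stocked=100 * len(daily_sales))

print((alpha, beta), batch)  # (206, 296) (206, 296)

# The next 5 days pool to 226 of 500 sold (the text gives only the total).
print(update(alpha, beta, 226, stocked=500))  # (432, 570)
```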

 

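Suppose another 5 days are added, during which 226 of 500 boxes were sold. When we add this to the previous 205/500, we get 431/1000. The 95% confidence interval is:

$$0.431 \pm 1.96\sqrt{\frac{0.431 \times 0.569}{1000}} = 0.431 \pm 0.0307 = (0.4003,\ 0.4617)$$

The width of this interval is 0.4617 – 0.4003 = 0.0614.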

 

Using the Beta distribution, α = 206 + 226 = 432 and β = 296 + 274 = 570, where 274 = 500 – 226 is the number of unsold boxes. This time, the 2.5th percentile is 0.4006 and the 97.5th percentile is 0.4619. The width of this interval is 0.4619 – 0.4006 = 0.0613.

 

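However, the Frequentist school has a tool at its disposal: Wilson’s estimate. With this tool, the sample proportion used in the interval is:

$$\tilde{p} = \frac{x + 2}{n + 4}$$

The interval is then computed as before, with p̃ in place of p̂ and n + 4 in place of n.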

Modifying the value of p in the last confidence interval:

$$\tilde{p} = \frac{431 + 2}{1000 + 4} = \frac{433}{1004} \approx 0.4313$$

The confidence interval becomes:

$$0.4313 \pm 1.96\sqrt{\frac{0.4313 \times 0.5687}{1004}} \approx (0.4006,\ 0.4619)$$

The two intervals are identical to four decimal places. This suggests that the Bayesian approach has no advantage when the number of successes is large, but it does when the number is small.
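As used in this article, Wilson’s estimate adds two successes and two failures before applying the usual formula, with n + 4 in the standard error (this form is often called the plus-four interval). Here is a minimal sketch, assuming z = 1.96. The name `plus_four_interval` is illustrative. Clipping the limits to [0, 1] is an added convention rather than part of the formula; it matters in the zero-sales case discussed next.

```python
import math

def plus_four_interval(x: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Interval built around Wilson's estimate p_tilde = (x + 2) / (n + 4)."""
    p_tilde = (x + 2) / (n + 4)
    half_width = z * math.sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    # A proportion cannot leave [0, 1], so clip the limits.
    return max(0.0, p_tilde - half_width), min(1.0, p_tilde + half_width)

for x, n in [(431, 1000), (8, 100), (0, 100)]:
    lo, hi = plus_four_interval(x, n)
    print(f"x={x}, n={n}: ({lo:.4f}, {hi:.4f})")
```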

 

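For example, consider day 3, in which there were 8 sales. Using Wilson’s estimate:

$$\tilde{p} = \frac{8 + 2}{100 + 4} = \frac{10}{104} \approx 0.0962$$

The 95% confidence interval is:

$$0.0962 \pm 1.96\sqrt{\frac{0.0962 \times 0.9038}{104}} \approx (0.0395,\ 0.1528)$$

The width of this interval is 0.1528 – 0.0395 = 0.1133.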

 

Using the Beta distribution with no prior information (the uniform prior), we have α = 8 + 1 = 9 and β = 100 + 1 – 8 = 93. The 2.5th percentile is 0.0416 and the 97.5th percentile is 0.1501. The width of this interval is 0.1501 – 0.0416 = 0.1085, narrower than the interval based on Wilson’s estimate.

 

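In the extreme case of no sales, using Wilson’s estimate:

$$\tilde{p} = \frac{0 + 2}{100 + 4} = \frac{2}{104} \approx 0.0192$$

The 95% confidence interval is:

$$0.0192 \pm 1.96\sqrt{\frac{0.0192 \times 0.9808}{104}} \approx (-0.0072,\ 0.0456)$$

Since a proportion cannot be negative, the lower limit is truncated to 0. This can be interpreted as the percentage of sales per day ranging from 0 to 4.56%.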

 

Using the Beta distribution with no prior information, we have α = 0 + 1 = 1 and β = 100 + 1 – 0 = 101. The 2.5th percentile is 0.00025 and the 97.5th percentile is 0.0359. The width of this interval is 0.0359 – 0.00025 = 0.03565, or 3.565%, compared with about 4.56% for the interval based on Wilson’s estimate.
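With zero successes and a uniform prior, the posterior is Beta(1, n + 1), whose density (n + 1)(1 – θ)^n integrates to the closed-form CDF 1 – (1 – θ)^(n+1). Inverting that CDF gives the percentiles directly, which offers a quick check on the Excel results without BETA.INV. Here is a sketch; the function name is illustrative.

```python
def zero_success_quantile(p: float, n: int) -> float:
    """p-th quantile of Beta(1, n + 1), the posterior after 0 successes in n trials
    under a uniform prior: solve 1 - (1 - theta)^(n + 1) = p for theta."""
    return 1.0 - (1.0 - p) ** (1.0 / (n + 1))

lower = zero_success_quantile(0.025, 100)
upper = zero_success_quantile(0.975, 100)
print(f"{lower:.5f} {upper:.4f}")  # 0.00025 0.0359
```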

 

This raises the question of why the Bayesian approach gives narrower intervals when there are few successes.

 

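Let’s start with the mean and variance of the Beta distribution:

$$E[\theta] = \frac{\alpha}{\alpha + \beta}$$

$$\operatorname{Var}(\theta) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$$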

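For the posterior distribution of θ, whose parameters are x + α and n + β – x, the mean and variance are:

$$E[\theta \mid x] = \frac{x + \alpha}{n + \alpha + \beta}$$

$$\operatorname{Var}(\theta \mid x) = \frac{(x+\alpha)(n+\beta-x)}{(n+\alpha+\beta)^2(n+\alpha+\beta+1)}$$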

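Let’s examine the case in which the prior is the uniform distribution, with α = 1 and β = 1. The mean and variance become:

$$E[\theta \mid x] = \frac{x+1}{n+2}$$

$$\operatorname{Var}(\theta \mid x) = \frac{(x+1)(n+1-x)}{(n+2)^2(n+3)}$$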

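If we take x equal to zero, the mean and variance become:

$$E[\theta \mid x=0] = \frac{1}{n+2}$$

$$\operatorname{Var}(\theta \mid x=0) = \frac{n+1}{(n+2)^2(n+3)}$$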

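If we examine the variance of θ using Wilson’s estimate, we have:

$$\operatorname{Var}(\tilde{p}) = \frac{\tilde{p}(1-\tilde{p})}{n+4} = \frac{(x+2)(n+2-x)}{(n+4)^3}$$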

If we take x equal to zero, we have:

$$\operatorname{Var}(\tilde{p}) = \frac{2(n+2)}{(n+4)^3}$$

As is well documented, the variance (or standard deviation) affects the width of a confidence interval: a larger variance results in a wider interval.
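The variance reductions for x = 0 can be checked with exact rational arithmetic. This sketch uses `fractions.Fraction` and illustrative helper names. It confirms that, for n from 1 to 50, the Beta posterior under a uniform prior has mean 1/(n + 2) and variance (n + 1)/((n + 2)²(n + 3)), and that the variance from Wilson’s estimate reduces to 2(n + 2)/(n + 4)³.

```python
from fractions import Fraction

def beta_mean_var(a: int, b: int) -> tuple[Fraction, Fraction]:
    """Exact mean a/(a+b) and variance ab/((a+b)^2 (a+b+1)) of Beta(a, b)."""
    s = a + b
    return Fraction(a, s), Fraction(a * b, s * s * (s + 1))

def wilson_var(x: int, n: int) -> Fraction:
    """p_tilde (1 - p_tilde) / (n + 4) with p_tilde = (x + 2) / (n + 4)."""
    p_tilde = Fraction(x + 2, n + 4)
    return p_tilde * (1 - p_tilde) / (n + 4)

for n in range(1, 51):
    # Uniform prior, x = 0: the posterior is Beta(0 + 1, n + 1 - 0).
    mean, var = beta_mean_var(0 + 1, n + 1 - 0)
    assert mean == Fraction(1, n + 2)
    assert var == Fraction(n + 1, (n + 2) ** 2 * (n + 3))
    # Wilson's estimate, x = 0, in factored form.
    assert wilson_var(0, n) == Fraction(2 * (n + 2), (n + 4) ** 3)

print("all reductions agree for n = 1..50")
```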

 

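This raises the question: for which values of n is the variance of the Beta distribution less than that using Wilson’s estimate? With x = 0, the condition is:

$$\frac{n+1}{(n+2)^2(n+3)} < \frac{2(n+2)}{(n+4)^3}$$

The above inequality holds once n > 2. If x is increased to 1, the condition becomes:

$$\frac{2n}{(n+2)^2(n+3)} < \frac{3(n+1)}{(n+4)^3}$$

and the variance of the Beta distribution is less once n > 5. For example, at n = 5 the two sides are 10/392 ≈ 0.0255 and 18/729 ≈ 0.0247, while at n = 6 they are 12/576 ≈ 0.0208 and 21/1000 = 0.021.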
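A brute-force scan makes the crossover points concrete. The sketch below compares the two variance formulas exactly for x = 0 and x = 1. It reports the smallest n from which the Beta variance stays below the variance based on Wilson’s estimate, checking up to an arbitrary cutoff of n = 1000. The helper names and the cutoff are illustrative assumptions.

```python
from fractions import Fraction
from typing import Optional

def beta_var(x: int, n: int) -> Fraction:
    """Posterior variance (x+1)(n+1-x) / ((n+2)^2 (n+3)) under a uniform prior."""
    return Fraction((x + 1) * (n + 1 - x), (n + 2) ** 2 * (n + 3))

def wilson_var(x: int, n: int) -> Fraction:
    """Variance (x+2)(n+2-x) / (n+4)^3 based on Wilson's estimate."""
    return Fraction((x + 2) * (n + 2 - x), (n + 4) ** 3)

def crossover(x: int, n_max: int = 1000) -> Optional[int]:
    """Smallest n such that beta_var < wilson_var for every n' from n to n_max."""
    first = None
    for n in range(max(x, 1), n_max + 1):
        if beta_var(x, n) < wilson_var(x, n):
            if first is None:
                first = n
        else:
            first = None  # reset if the inequality fails again
    return first

print(crossover(0), crossover(1))  # 3 6
```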

 

On the other hand, as n approaches infinity, both variances approach zero. This eliminates the advantage of the Bayesian approach over the Frequentist approach, as illustrated in the earlier examples.