Suppose you want to construct a 95% confidence
interval for a proportion. Should you go with the Bayesian school approach or
that of the Frequentist school?
To illustrate, suppose a store stocks boxes of
Brand X cereal each day so that there are 100 boxes in total. On day 1, they
sell 62 boxes. What is the 95% confidence interval of the percentage of boxes
sold per day?
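Using the Frequentist school approach, with p = 62/100 = 0.62, the
interval would be:

$$p \pm 1.96\sqrt{\frac{p(1-p)}{n}} = 0.62 \pm 1.96\sqrt{\frac{0.62 \times 0.38}{100}} = (0.5249,\ 0.7151)$$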
The width of the interval is 0.7151 – 0.5249 =
0.1902. Nice and simple.
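The interval above is the usual normal-approximation formula, p ± 1.96·sqrt(p(1 − p)/n). A minimal Python sketch of that calculation follows; the function name `wald_interval` is illustrative and not part of the original article.

```python
import math

def wald_interval(successes, trials, z=1.96):
    """Normal-approximation interval for a proportion: p +/- z * sqrt(p(1-p)/n)."""
    p = successes / trials
    margin = z * math.sqrt(p * (1 - p) / trials)
    return p - margin, p + margin

# Day 1: 62 of 100 boxes sold.
low, high = wald_interval(62, 100)
print(f"({low:.4f}, {high:.4f})")
```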
Now, to use the Bayesian approach, we need a
prior distribution for θ. Since we don’t have any prior information, we
can use the uniform distribution in which P(θ) = 1.
The uniform distribution is a special
case of the Beta distribution in which α = 1 and β = 1. As is well
documented in the literature, the Beta distribution is the conjugate prior of
the binomial distribution.
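The formula for the Beta distribution is:

$$f(\theta;\alpha,\beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,\theta^{\alpha-1}(1-\theta)^{\beta-1}, \qquad 0 \le \theta \le 1$$

In the formula, Γ(α) represents the gamma function of α. The
formula for the gamma function is:

$$\Gamma(\alpha) = \int_0^\infty t^{\alpha-1} e^{-t}\,dt$$

If α is a positive whole number, then Γ(α) is
equal to (α – 1)!. For example, if α = 3,
then Γ(3) = 2! = 2 x 1 = 2. It should be noted that in Excel, the factorial
function is called FACT. So, in an Excel spreadsheet, to compute 2!, you
would type =fact(2) in a cell to get the result of 2.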
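In our example with α = 1 and β = 1,
we have:

$$f(\theta;1,1) = \frac{\Gamma(2)}{\Gamma(1)\,\Gamma(1)}\,\theta^{0}(1-\theta)^{0} = \frac{1!}{0!\,0!} = 1$$

This is due to the fact that
0! = 1 by definition.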
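For readers who prefer code to a spreadsheet, here is a small Python sketch of the same two facts: the gamma function reduces to a factorial for whole numbers, and the Beta density with α = β = 1 is flat at 1. The helper `beta_pdf` is an illustrative name; it uses only the standard `math` module.

```python
import math

def beta_pdf(theta, alpha, beta):
    """Beta density: Gamma(a+b) / (Gamma(a) Gamma(b)) * theta^(a-1) * (1-theta)^(b-1)."""
    norm = math.gamma(alpha + beta) / (math.gamma(alpha) * math.gamma(beta))
    return norm * theta ** (alpha - 1) * (1 - theta) ** (beta - 1)

# Gamma(3) equals 2! for a whole-number argument.
print(math.gamma(3), math.factorial(2))

# With alpha = beta = 1 the density is the same at every theta in (0, 1).
uniform_values = [beta_pdf(t, 1, 1) for t in (0.1, 0.5, 0.9)]
print(uniform_values)
```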
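To eventually construct the posterior
distribution of θ, we can drop the constant and say:

$$f(\theta) \propto \theta^{\alpha-1}(1-\theta)^{\beta-1}$$

The formula for the binomial distribution is:

$$f(x \mid \theta) = {}_nC_x\,\theta^{x}(1-\theta)^{n-x}$$

In the formula, nCx
is the number of ways to choose x items from n. Its formula is:

$${}_nC_x = \frac{n!}{x!\,(n-x)!}$$

In order to construct the posterior distribution of
θ, we can again drop the constant and say:

$$f(x \mid \theta) \propto \theta^{x}(1-\theta)^{n-x}$$

In stats speak, f(x | θ) is called the likelihood.
To construct the posterior distribution of
θ, we combine the prior and likelihood:

$$f(\theta \mid x) \propto \theta^{\alpha-1}(1-\theta)^{\beta-1} \times \theta^{x}(1-\theta)^{n-x} = \theta^{x+\alpha-1}(1-\theta)^{n+\beta-x-1}$$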
Thus, the posterior distribution of θ
follows a Beta distribution in which the prior α is replaced by x + α and
the prior β by n + β – x.
In our example, the posterior α = 62 + 1 = 63 and
β = 100 + 1 – 62 = 39. Note that the sum of α and β is 102, which
is the sum of n = 100, the prior α = 1 and the prior β = 1.
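The update rule amounts to adding the successes to α and the failures to β. A minimal Python sketch, with `update_beta` as an illustrative function name:

```python
def update_beta(alpha, beta, successes, trials):
    """Conjugate update: a Beta(alpha, beta) prior plus binomial data gives a Beta posterior."""
    return alpha + successes, beta + trials - successes

# Uniform prior Beta(1, 1), then 62 sales out of 100 boxes.
posterior = update_beta(1, 1, successes=62, trials=100)
print(posterior)  # (63, 39)
```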
To construct the 95% interval using
the Beta distribution (in Bayesian terms, a credible interval), we need the
2.5th and 97.5th
percentiles of the Beta distribution with α = 63 and β = 39. Using
Excel, for the 2.5th percentile, I type in a cell =beta.inv(0.025,63,39)
to get 0.5218 and for the 97.5th percentile, I type =beta.inv(0.975,63,39) to get 0.7091. Thus, the interval is:

$$(0.5218,\ 0.7091)$$

The width of this interval is 0.7091 – 0.5218
= 0.1873, which is a tad tighter than the 0.1902 from the previous confidence
interval.
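The same percentiles can be approximated outside Excel. The sketch below uses only the Python standard library and assumes α and β are whole numbers, in which case the Beta CDF equals a binomial tail probability and can be inverted by bisection. The function names are illustrative. If SciPy is available, `scipy.stats.beta.ppf(0.025, 63, 39)` is a more direct route.

```python
import math

def beta_cdf_int(x, a, b):
    """CDF of Beta(a, b) for whole-number a and b, using the identity
    I_x(a, b) = P(Binomial(a + b - 1, x) >= a)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    n = a + b - 1
    total = 0.0
    for j in range(a, n + 1):
        # Work in log space so large binomial coefficients do not overflow.
        log_term = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                    + j * math.log(x) + (n - j) * math.log1p(-x))
        total += math.exp(log_term)
    return min(total, 1.0)

def beta_quantile_int(q, a, b, tol=1e-10):
    """Invert the CDF by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf_int(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

lower = beta_quantile_int(0.025, 63, 39)
upper = beta_quantile_int(0.975, 63, 39)
print(f"({lower:.4f}, {upper:.4f})")
```

Bisection is slow compared with a dedicated routine, but it keeps the sketch short and free of dependencies.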
Suppose that over the next 4 days, as on the
first day, there are 100 boxes on the shelf and these are the number of boxes
sold:

| Day 2 | Day 3 | Day 4 | Day 5 |
|-------|-------|-------|-------|
| 28    | 8     | 42    | 65    |
If we sum the number of sales over the 5 days,
p = 205/500 = 0.41. Then the 95% confidence interval is:

$$0.41 \pm 1.96\sqrt{\frac{0.41 \times 0.59}{500}} = (0.3669,\ 0.4531)$$

The width of this interval is 0.4531 – 0.3669
= 0.0862.
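Using the Beta distribution, we find the new
values of α and β:

|   | Prior | Day 2 | Day 3 | Day 4 | Day 5 | Total |
|---|-------|-------|-------|-------|-------|-------|
| α | 63    | 28    | 8     | 42    | 65    | 206   |
| β | 39    | 72    | 92    | 58    | 35    | 296   |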
Note that the posterior from day 1, α = 63 and β = 39,
is used as the new prior. Again, using Excel, the 2.5th percentile is
0.3677 and the 97.5th percentile is 0.4537. The width of this
interval is 0.4537 – 0.3677 = 0.0860. The width is still slightly less than the
0.0862 of the Frequentist interval, but the difference is narrowing.
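The day-by-day bookkeeping can be sketched as a short loop in Python, in which each day's posterior becomes the next day's prior. Variable names are illustrative. The sketch also checks that updating one day at a time matches pooling all five days at once.

```python
daily_sales = [62, 28, 8, 42, 65]   # boxes sold on days 1 to 5
boxes_per_day = 100

alpha, beta = 1, 1                  # uniform prior before day 1
for sold in daily_sales:
    # Successes (boxes sold) go to alpha, failures (boxes unsold) to beta.
    alpha += sold
    beta += boxes_per_day - sold

print(alpha, beta)  # 206 296

# Pooling all five days in a single update gives the same parameters.
total_sold = sum(daily_sales)
total_boxes = boxes_per_day * len(daily_sales)
batch = (1 + total_sold, 1 + total_boxes - total_sold)
print(batch == (alpha, beta))  # True
```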
Suppose another 5 days are added and 226 of the 500
boxes were sold. When we add this to the previous 205/500, we get 431/1000 = 0.431. The
95% confidence interval is:

$$0.431 \pm 1.96\sqrt{\frac{0.431 \times 0.569}{1000}} = (0.4003,\ 0.4617)$$

The width of this interval is 0.4617 – 0.4003
= 0.0614.
Using the Beta distribution, α = 206 +
226 = 432 and β = 296 + 274 = 570. This time, the 2.5th
percentile is 0.4006 and the 97.5th percentile is 0.4619. The width
of this interval is 0.4619 – 0.4006 = 0.0613.
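However, the Frequentist school has a tool at
its disposal: Wilson’s estimate. In using this tool, the sample proportion used
in the interval is:

$$\tilde{p} = \frac{x+2}{n+4}$$

Modifying the value of p in the last
confidence interval:

$$\tilde{p} = \frac{431+2}{1000+4} = \frac{433}{1004} = 0.4313$$

The confidence interval becomes:

$$\tilde{p} \pm 1.96\sqrt{\frac{\tilde{p}(1-\tilde{p})}{n+4}} = 0.4313 \pm 1.96\sqrt{\frac{0.4313 \times 0.5687}{1004}} = (0.4006,\ 0.4619)$$

This interval and the Beta interval are identical to four decimal places. We can
conclude that the Bayesian school approach has no advantage in the case of a large number of successes but does in the case of a small
number.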
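A Python sketch of the interval based on Wilson’s estimate follows. It assumes the "plus four" form, which adds two successes and two failures before applying the normal-approximation formula; that assumption reproduces the limits 0.4006 and 0.4619 quoted for the Beta interval. The function name is illustrative.

```python
import math

def wilson_estimate_interval(successes, trials, z=1.96):
    """Interval around p~ = (x + 2) / (n + 4), with n + 4 in the standard error."""
    p_tilde = (successes + 2) / (trials + 4)
    margin = z * math.sqrt(p_tilde * (1 - p_tilde) / (trials + 4))
    return p_tilde - margin, p_tilde + margin

# Ten days pooled: 431 of 1000 boxes sold.
low, high = wilson_estimate_interval(431, 1000)
print(f"({low:.4f}, {high:.4f})")
```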
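For example, look at day 3, in which
there were 8 sales. Using Wilson’s estimate:

$$\tilde{p} = \frac{8+2}{100+4} = \frac{10}{104} = 0.0962$$

The 95% confidence interval is:

$$0.0962 \pm 1.96\sqrt{\frac{0.0962 \times 0.9038}{104}} = (0.0395,\ 0.1529)$$

The width of this interval is 0.1529 – 0.0395
= 0.1134.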
Using the Beta distribution, if we assume no
prior information, we have α = 9 and β = 93. The 2.5th
percentile is 0.0416 and the 97.5th percentile is 0.1501. The width
of this interval is 0.1501 – 0.0416 = 0.1085.
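In the extreme case of no sales, using
Wilson’s estimate:

$$\tilde{p} = \frac{0+2}{100+4} = \frac{2}{104} = 0.0192$$

The 95% confidence interval is:

$$0.0192 \pm 1.96\sqrt{\frac{0.0192 \times 0.9808}{104}} = (-0.0072,\ 0.0456)$$

Since a proportion cannot be negative, the lower limit is set to 0. This can be interpreted as the percentage of
sales per day ranging from 0 to 4.56%.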
Using the Beta distribution in which we assume
no prior information, we have α = 1 and β = 101. The 2.5th
percentile is 0.00025 and the 97.5th percentile is 0.0359. The width
of this interval is 0.0359 – 0.00025 = 0.03565 or 3.565%.
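When α = 1 the Beta percentiles have a closed form, because the CDF of Beta(1, β) is 1 − (1 − θ)^β. The Python sketch below uses that fact to check the zero-sales figures without a spreadsheet; the function name is illustrative.

```python
def beta_1_b_quantile(q, b):
    """Quantile of Beta(1, b): solve 1 - (1 - theta)**b = q for theta."""
    return 1 - (1 - q) ** (1 / b)

# Zero sales out of 100 with a uniform prior gives Beta(1, 101).
lower = beta_1_b_quantile(0.025, 101)
upper = beta_1_b_quantile(0.975, 101)
print(f"({lower:.5f}, {upper:.4f})")
```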
This raises the question as to why Bayesian
analysis is better when there are a small number of successes.
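Let’s start with the mean and variance of the
Beta distribution.

$$E(\theta) = \frac{\alpha}{\alpha+\beta}, \qquad Var(\theta) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$$

In the case of the posterior distribution of
θ, in which α is replaced by x + α and β by n + β – x, the mean and
variance are:

$$E(\theta \mid x) = \frac{x+\alpha}{n+\alpha+\beta}, \qquad Var(\theta \mid x) = \frac{(x+\alpha)(n+\beta-x)}{(n+\alpha+\beta)^2(n+\alpha+\beta+1)}$$

Let’s examine the case in which the prior is
the uniform distribution in which α = 1 and β = 1. The mean and
variance become:

$$E(\theta \mid x) = \frac{x+1}{n+2}, \qquad Var(\theta \mid x) = \frac{(x+1)(n-x+1)}{(n+2)^2(n+3)}$$

If we take x equal to zero, the mean and
variance become:

$$E(\theta \mid x=0) = \frac{1}{n+2}, \qquad Var(\theta \mid x=0) = \frac{n+1}{(n+2)^2(n+3)}$$

If we examine the variance of θ using
Wilson’s estimate, we have:

$$Var(\tilde{p}) = \frac{\tilde{p}(1-\tilde{p})}{n+4} = \frac{(x+2)(n-x+2)}{(n+4)^3}$$

If we take x equal to zero, we have:

$$Var(\tilde{p}) = \frac{2(n+2)}{(n+4)^3}$$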
As is well documented, the variance (or standard
deviation) has an effect on the width of a confidence interval: A larger
variance results in a wider confidence interval.
This raises the question: For which values of
n is the variance of the Beta distribution less than that using Wilson’s estimate? With x = 0, the comparison is:

$$\frac{n+1}{(n+2)^2(n+3)} < \frac{2(n+2)}{(n+4)^3}$$

The above inequality holds once n > 2. If x
is increased to 1, the corresponding comparison, 2n/((n+2)^2(n+3)) < 3(n+1)/(n+4)^3,
holds once n > 5.
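The comparison can also be checked numerically. The Python sketch below assumes a uniform prior for the Beta variance and the "plus four" form of Wilson’s estimate, with n + 4 in the denominator, for the Frequentist variance. It searches for the smallest n at which the Beta variance is strictly smaller. Function names are illustrative.

```python
from fractions import Fraction

def beta_posterior_variance(x, n):
    """Variance of Beta(x + 1, n - x + 1), the posterior under a uniform prior."""
    a, b = x + 1, n - x + 1
    return Fraction(a * b, (a + b) ** 2 * (a + b + 1))

def wilson_variance(x, n):
    """p~(1 - p~) / (n + 4) with p~ = (x + 2) / (n + 4)."""
    return Fraction((x + 2) * (n - x + 2), (n + 4) ** 3)

def smallest_n_favoring_beta(x, max_n=1000):
    """Smallest n >= max(x, 1) at which the Beta variance is strictly smaller."""
    for n in range(max(x, 1), max_n + 1):
        if beta_posterior_variance(x, n) < wilson_variance(x, n):
            return n
    return None

for x in (0, 1):
    print(x, smallest_n_favoring_beta(x))
```

Exact fractions are used so that near-ties close to the crossover are not decided by floating-point rounding.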
On the other hand, if we take the limit of either
variance as n approaches infinity, both variances approach zero, thus
eliminating the advantage of the Bayesian approach over the Frequentist
approach, as illustrated in earlier examples.