
# Distribution of Sample Proportion

.

In the last section we introduced the idea of a sample proportion.
.

Recall that the data is binomial, meaning each data point is "success" or "fail"
.

The sample proportion is the fraction of the sample which scores a success on the question being studied.
.

• We use $\hat{p}$ to represent the sample proportion
• $\hat{p}$ is a sample statistic so it varies from sample to sample
• $\hat{p} = \dfrac{\text{number of successes in sample}}{\text{sample size (n)}}$

.

The usual process in statistics is to select one sample from the population and draw a conclusion about the population from the sample.
.

In this section, we will collect a large number of samples from the same population (returning each sample to the population before the next one is taken).
.

The number of successes in each sample then behaves like a binomial variable, and the sample proportion is that number divided by n.
.

### Caution:

• When we studied binomial distributions, we used n = the number of trials
• In this topic, we use n = sample size
• The two meanings are related: in one sample we are effectively performing n trials of a binomial variable

.

### Example 1

It is known that 12% of students in a school of 1500 students are left-handed:

• the population proportion $p = 0.12$
• the population size $N = 1500$

.

We will use a sample size of $n = 20$ students
.

Let X be the number of left-handed students in each sample.
.

… … $\hat{p} = \dfrac{X}{n}$
.

We took 50 samples with $n = 20$ and produced the following frequency table (modelled using random numbers)

In other words:

• there were 5 samples with 0 left-handed students,
• 10 samples with 1 left-handed student, etc.

There were no samples with $X > 6$ (more than 6 left-handed students).
.

We can find the mean and standard deviation of this set of data:
.

… … Mean

… … … … $\mu = \dfrac{\Sigma \hat{p} \times f}{\Sigma f}$

… … … … … $= \Sigma \big( \hat{p} \times RF \big)$ … … {RF is Relative Frequency}

… … … … … $= 0.12$
.

… … Variance

… … … … $\sigma^2 = \text{E} \big( \hat{p}^2 \big) - \mu^2$

… … … … … $= \Sigma \big( \hat{p}^2 \times RF \big) - \mu^2$

… … … … … $= 0.0055$
.

… … Standard Deviation

… … … … $\sigma = \sqrt{0.0055}$

… … … … … $= 0.0742$

.

Despite having modelled this with random numbers, the mean sample proportion worked out to be exactly 0.12, which is the same as the population proportion.
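
The experiment above can be repeated with a short simulation. This is only a sketch: the function name, the seed and the use of Python's `random` module are assumptions, and the frequencies it produces will differ from the table used in this example because the random numbers differ.

```python
import random
from collections import Counter
from math import sqrt

def simulate_sample_proportions(N=1500, p=0.12, n=20, num_samples=50, seed=1):
    """Take num_samples samples of size n and return the sample proportions.
    (Hypothetical helper; the seed is arbitrary.)"""
    rng = random.Random(seed)
    successes = round(N * p)                      # 180 left-handed students
    population = [1] * successes + [0] * (N - successes)
    # Every sample is drawn from the full population, so each sample is
    # effectively returned before the next one is taken.
    return [sum(rng.sample(population, n)) / n for _ in range(num_samples)]

n = 20
p_hats = simulate_sample_proportions(n=n)
freq = Counter(p_hats)                            # frequency f of each p-hat value
total = sum(freq.values())

# Mean and standard deviation from the frequency table, as in the text
mean = sum(ph * f for ph, f in freq.items()) / total
mean_sq = sum(ph ** 2 * f for ph, f in freq.items()) / total   # E(p-hat^2)
sd = sqrt(max(mean_sq - mean ** 2, 0.0))

for ph in sorted(freq):
    print(f"X = {round(ph * n)}   p-hat = {ph:.2f}   f = {freq[ph]}")
print(f"mean = {mean:.4f}   sd = {sd:.4f}")
```

Changing the seed gives a different set of 50 samples, and the mean will usually be close to 0.12 rather than exactly equal to it.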

.

## Expected Value and Standard Deviation of Sample Proportion

Larger samples give better estimates of the population proportion, p.
.

If the sample is sufficiently large, then

• X, the number of successes, can be treated as a binomial variable
• $\hat{p}$ can therefore be treated as a binomial variable divided by n

.

• We know that the sample proportion: $\hat{p} = \dfrac{x}{n}$

.

• For a large sample, the random variable: $\hat{P} = \dfrac{X}{n}$

.

Therefore:

… … $\text{E} \big( \hat{P} \big) = \text{E} \Big( \dfrac{X}{n} \Big)$
.

… … … $= \dfrac{1}{n} \text{E} \big( X \big)$
.

… … … $= \dfrac{1}{n} \times np$
.

… … … $= p$
.

$\text{E} \big( \hat{P} \big) = p$ means that the average of $\hat{p}$ over a large number of samples is expected to equal the population proportion, p.

.

Also

… … $\text{Var} \big( \hat{P} \big) = \text{Var} \Big( \dfrac{X}{n} \Big)$
.

… … … … $= \Big( \dfrac{1}{n} \Big)^2 \text{Var} \big( X \big)$
.

… … … … $= \dfrac{1}{n^2} \times np(1-p)$
.

… … … … $= \dfrac{p(1-p)}{n}$
.

hence

… … $\text{SD} \big( \hat{P} \big) = \sqrt{ \dfrac{ p(1-p) }{n} }$
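
The two results can be checked numerically by summing over the exact binomial probabilities. The sketch below assumes $X$ is binomial with $n = 20$ and $p = 0.12$ (the values from Example 1); the function name is illustrative only.

```python
from math import comb, sqrt

def sample_proportion_moments(n, p):
    """Return (mean, sd) of P-hat = X/n where X ~ Binomial(n, p),
    found by summing over every possible value of X."""
    pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]
    mean = sum((x / n) * pr for x, pr in enumerate(pmf))
    mean_sq = sum((x / n) ** 2 * pr for x, pr in enumerate(pmf))
    return mean, sqrt(mean_sq - mean**2)

n, p = 20, 0.12
mean, sd = sample_proportion_moments(n, p)
formula_sd = sqrt(p * (1 - p) / n)

print(f"E(P-hat) from the distribution  = {mean:.4f}   formula: {p}")
print(f"SD(P-hat) from the distribution = {sd:.4f}   formula: {formula_sd:.4f}")
```

Both pairs of values agree to within rounding, which is what the derivation predicts.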

.

### Example 1b

In the example above of left-handed students, where $p = 0.12, \; n = 20$ we get theoretical results of:
.

… … $\text{E} \big( \hat{P} \big) = 0.12$
.

… … $\text{SD} \big( \hat{P} \big) = \sqrt{ \dfrac{ 0.12(1 - 0.12) }{20} } = 0.0727$

.

Compare these values with the experimental results obtained from the 50 samples:

… … $\mu = 0.12 \qquad \qquad \sigma = 0.0742$

.

## Large Samples

.

The above theory works best when sufficiently large samples are taken.
.

One common set of conditions is that the sample fits the following 3 rules:

… … $np \geqslant 10$

… … $n(1 - p) \geqslant 10$

… … $10n \leqslant N$
.

The first two rules make sure the sample is large enough to expect at least 10 successes and at least 10 fails. The third rule keeps the sample to no more than 10% of the population, so that selecting without replacement still behaves like independent trials.
.

### Example 1c

Consider the example above where we took samples of 20 students from a school of 1500 to test for left-handedness $\big(p = 0.12\big)$.

Is this sample sufficiently large? If not, how large should the sample be?
.

Solution:

… … Compare n, p, N to the three rules listed above.
.

… … … $np = 20 \times 0.12 = 2.4$ … … which is NOT $\geqslant 10$
.

… … … $n(1- p) = 20(1 - 0.12) = 17.6$ … … which is $\geqslant 10$
.

… … … $10n = 10 \times 20 = 200$ … … which is $\leqslant 1500$
.

… … Hence $n = 20$ was not sufficiently large, because it fails the first rule.

.

… … To find how large to make the sample, we need n such that $np \geqslant 10$
.

… … … $n \times 0.12 \geqslant 10$
.

… … … $n \geqslant 83.3$
.

… … … Rounding $83.3$ up to the next integer gives $n = 84$
.

… … … Check $n = 84$ against all 3 rules:
.

… … … … $np = 84 \times 0.12 = 10.08$ … … $10.08 \geqslant 10$
.

… … … … $n(1 - p) = 84(1 - 0.12) = 73.92$ … … $73.92 \geqslant 10$
.

… … … … $10n = 10 \times 84 = 840$ … … $840 \leqslant 1500$
.

… … … The third rule also sets an upper limit: $10n \leqslant 1500$ gives $n \leqslant 150$
.

… … $n = 84$ meets all 3 rules, hence $n = 84$ is the smallest sufficiently large sample. Any sample size from 84 to 150 meets all 3 rules.
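
The search for the smallest n that satisfies the two rules $np \geqslant 10$ and $n(1 - p) \geqslant 10$ can be sketched in Python. The function name and the `threshold` parameter are illustrative and not part of the original text.

```python
def smallest_n_for_success_failure(p, threshold=10):
    """Smallest sample size n with n*p >= threshold and n*(1-p) >= threshold."""
    n = 1
    while n * p < threshold or n * (1 - p) < threshold:
        n += 1
    return n

p = 0.12
n_min = smallest_n_for_success_failure(p)
print(f"smallest n = {n_min}, np = {n_min * p:.2f}, n(1-p) = {n_min * (1 - p):.2f}")
```

For $p = 0.12$ this gives $n = 84$, which agrees with rounding $83.3$ up in the solution.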

.

## Theoretical Distribution of Sample Proportion

.

When we know the population size (N) and the population proportion (p) we can perform the following calculations.
.

The total number of ways a sample of n can be selected from a population of N is given by $^NC_n$.
.

If the population proportion is p, then the number of successes in the population is $N \times p$
.

and the number of fails in the population is $N(1 - p)$
.

If x is the number of successes in a sample of size n, there will be $(n - x)$ fails.
.

Therefore, the total number of ways we can get:

… … x successes out of Np possible successes

and … $(n - x)$ fails out of $N(1 - p)$ possible fails
.

is given by: … $^{Np}C_x \times {}^{N(1-p)}C_{n-x}$

.

### Example 2

A large tub contains 20 pieces of fruit of which 6 are apples.

If we consider selecting an apple as a success then $p = \dfrac{6}{20} = 0.3$

We take a number of random samples of size $n = 5$.

Let X = number of apples in one sample.

a) .. construct a table of the possible number of samples for each value of $X = x$, together with the relative frequencies.

b) .. construct a table of the sampling distribution

c) .. calculate the Theoretical Expected Value and Standard Deviation for the sample proportion, $\hat{p}$.

.

Solution

a) .. construct a table of the possible number of samples for each value of $X = x$, together with the relative frequencies.

… … The total number of possible samples is $^{20}C_5 = 15504$

… … $n = 5$, so in each sample, the number of apples we could get is $\big\{0,\; 1,\; 2,\; 3,\; 4,\; 5 \big\}$.

.

b) .. construct a table of the sampling distribution

… … The sampling distribution is the probability distribution for the sample proportion.

… … Notice that the relative frequency from the above table becomes the Probability
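
The counts and relative frequencies for parts a) and b) follow directly from the formula $^{Np}C_x \times {}^{N(1-p)}C_{n-x}$ with $Np = 6$ apples and $N(1-p) = 14$ other pieces of fruit. A minimal sketch, with variable names chosen only for illustration:

```python
from math import comb

N, n = 20, 5            # pieces of fruit in the tub, sample size
apples = 6              # successes in the population, N * p
others = N - apples     # fails in the population, N * (1 - p)

total_samples = comb(N, n)

rows = []
for x in range(n + 1):
    ways = comb(apples, x) * comb(others, n - x)   # samples containing x apples
    rows.append((x, x / n, ways, ways / total_samples))

print(f"total number of samples: {total_samples}")
print("x   p-hat   samples   relative frequency")
for x, p_hat, ways, rel_freq in rows:
    print(f"{x}   {p_hat:.1f}     {ways:>5}     {rel_freq:.4f}")
```

The relative frequency column is the probability of each value of $\hat{p}$, so the same output serves as the sampling distribution.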

.

c) .. calculate the Theoretical Expected Value and Standard Deviation for the sample proportion, $\hat{p}$.

… … $\text{E} \big( \hat{P} \big) = p = 0.3$
.

… … $\text{SD} \big( \hat{P} \big) = \sqrt{ \dfrac{ p(1-p) }{n} } = \sqrt{ \dfrac{ 0.3(1 - 0.3) }{5} } = 0.2049$
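
As a rough check, the two values can be computed in Python, both from the formulas and directly from the exact distribution of $\hat{p}$. This sketch assumes the fruit is selected without replacement, as the counting argument with $^{20}C_5$ implies; the variable names are illustrative.

```python
from math import comb, sqrt

N, n, apples = 20, 5, 6
p = apples / N

# Values from the formulas E(P-hat) = p and SD(P-hat) = sqrt(p(1-p)/n)
sd_formula = sqrt(p * (1 - p) / n)

# Values from the exact distribution of p-hat (selection without replacement)
total = comb(N, n)
probs = {x / n: comb(apples, x) * comb(N - apples, n - x) / total
         for x in range(n + 1)}
mean_exact = sum(ph * pr for ph, pr in probs.items())
sd_exact = sqrt(sum(ph**2 * pr for ph, pr in probs.items()) - mean_exact**2)

print(f"E(P-hat):  formula {p:.4f}   exact {mean_exact:.4f}")
print(f"SD(P-hat): formula {sd_formula:.4f}   exact {sd_exact:.4f}")
```

The expected values agree. The exact standard deviation comes out somewhat smaller than 0.2049 because the formula treats the 5 selections as independent, while a sample of 5 from only 20 pieces of fruit is a quarter of the population.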

.

## Approximation to Normal Distribution

If we take many sufficiently large samples from a population, the distribution of the sample proportions will approximate a normal distribution.

For example, the histogram below was produced using 1000 samples of size $n = 100$ using a random number generator.
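
A similar experiment can be sketched in Python. The population proportion used for the histogram is not stated, so $p = 0.3$ below is an assumption, as are the seed and the function name; each trial is simulated as an independent success with probability p.

```python
import random
from collections import Counter
from math import sqrt

def simulate_p_hats(p, n, num_samples, seed=2):
    """Return num_samples sample proportions, each from n independent trials."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n
            for _ in range(num_samples)]

p, n = 0.3, 100                      # p is an assumed value
p_hats = simulate_p_hats(p, n, num_samples=1000)

mean = sum(p_hats) / len(p_hats)
sd_theory = sqrt(p * (1 - p) / n)
# A normal distribution has roughly 68% of its values within 1 SD of the mean
within_1sd = sum(abs(ph - p) <= sd_theory for ph in p_hats) / len(p_hats)

# Crude text histogram: one '#' for every 5 samples
counts = Counter(round(ph, 2) for ph in p_hats)
for value in sorted(counts):
    print(f"{value:.2f} {'#' * (counts[value] // 5)}")
print(f"mean = {mean:.4f}, proportion within 1 SD = {within_1sd:.3f}")
```

The printed bars should show the bell shape centred near p, and the proportion of samples within one standard deviation should be in the region of two thirds.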

.

In the next section, Confidence Intervals, we will treat the sample proportion as normally distributed.

.