Distribution of Sample Proportion
.
In the last section we introduced the idea of a sample proportion.
.
Recall that the data is binomial, meaning each data point is either a "success" or a "fail".
.
The sample proportion is the fraction of the sample which scores a success on the question being studied.
.
- We use $\hat{p}$ to represent the sample proportion
- $\hat{p}$ is a sample statistic so it varies from sample to sample
- $\hat{p} = \dfrac{\text{number of successes in sample}}{\text{sample size (n)}}$
.
The usual process in statistics is to select one sample from the population and draw a conclusion about the population from the sample.
.
In this section, we will collect a large number of samples from the same population (returning each sample to the population before the next one is taken).
.
In each sample, the number of successes then behaves like a binomial variable, and the sample proportion is that number divided by the sample size.
.
Caution:
- When we studied binomial distributions, we used n = the number of trials
- In this topic, we use n = sample size
- The two meanings are related: in one sample we are effectively performing n trials of a binomial variable
.
Example 1
It is known that 12% of the students in a school of 1500 students are left-handed:
- the population proportion $p = 0.12$
- the population size $N = 1500$
.
We will use a sample size of $n = 20$ students
.
Let X be the random variable for the number of left-handed students in a sample.
.
… … $\hat{p} = \dfrac{X}{n}$
.
We took 50 samples with $n = 20$ and produced the following frequency table (modelled using random numbers).
In other words:
- there were 5 samples with 0 left-handed students,
- 10 samples with 1 left-handed student, etc.
- there were no samples with $X > 6$ (more than 6 left-handed students).
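.
The sampling experiment described above can be sketched in Python. This is a minimal illustration, not the method used to build the table: the seed and the variable names are assumptions, and a different seed gives a different frequency table.

```python
import random
from collections import Counter

N = 1500          # population size (from the example)
p = 0.12          # population proportion of left-handed students
n = 20            # sample size
NUM_SAMPLES = 50  # number of samples drawn

rng = random.Random(1)  # fixed seed (an arbitrary choice) so the run is repeatable

# Build the population: 1 = left-handed, 0 = not left-handed.
num_left = round(N * p)  # 180 students
population = [1] * num_left + [0] * (N - num_left)

# Each sample is drawn without replacement. rng.sample never modifies the
# population, so every sample is effectively returned before the next draw.
counts = [sum(rng.sample(population, n)) for _ in range(NUM_SAMPLES)]

# Frequency table: X (left-handed students in a sample) -> number of samples.
frequency = Counter(counts)
for x in sorted(frequency):
    print(f"X = {x}: p-hat = {x / n:.2f}, frequency = {frequency[x]}")

mean_p_hat = sum(counts) / (n * NUM_SAMPLES)
print(f"mean of p-hat over {NUM_SAMPLES} samples: {mean_p_hat:.4f}")
```

Changing the seed shows how much the table, and the mean of $\hat{p}$, varies from one set of 50 samples to the next.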
.
We can find the mean and standard deviation of this set of data:
.
… … Mean
… … … … $\mu = \dfrac{\Sigma \hat{p} \times f}{\Sigma f}$
… … … … … $= \Sigma \big( \hat{p} \times RF \big)$ … … {RF is Relative Frequency}
… … … … … $= 0.12$
.
… … Variance
… … … … $\sigma^2 = E \big( \hat{p}^2 \big) - \mu^2$
… … … … … $= 0.0055$
.
… … Standard Deviation
… … … … $\sigma = \sqrt{0.0055}$
… … … … … $= 0.0742$
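.
The three calculations above can be reproduced from a frequency table with a few lines of Python. The frequencies below are an assumption: they are illustrative values chosen to agree with the figures quoted in this example (50 samples, 5 with $X = 0$, 10 with $X = 1$, none with $X > 6$), and they are not a copy of the original table.

```python
import math

n = 20  # sample size

# ASSUMPTION: illustrative frequencies (X -> number of samples), chosen to be
# consistent with the figures quoted in the text; not the original table.
frequency = {0: 5, 1: 10, 2: 12, 3: 12, 4: 6, 5: 4, 6: 1}

total = sum(frequency.values())  # number of samples

# Relative frequency (RF) of each sample proportion p-hat = x / n.
rel_freq = {x / n: f / total for x, f in frequency.items()}

# Mean: sum of p-hat * RF.
mean = sum(p_hat * rf for p_hat, rf in rel_freq.items())

# Variance: E(p-hat^2) - mean^2.
mean_of_squares = sum(p_hat ** 2 * rf for p_hat, rf in rel_freq.items())
variance = mean_of_squares - mean ** 2

std_dev = math.sqrt(variance)

print(f"mean = {mean:.4f}, variance = {variance:.4f}, sd = {std_dev:.4f}")
```

With these assumed frequencies the script gives a mean of 0.12, a variance of 0.0055 and a standard deviation of 0.0742, matching the values above.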
.
Although this was modelled with random numbers, the mean sample proportion worked out to be exactly 0.12, the same as the population proportion.
.
Expected Value and Standard Deviation of Sample Proportion
Larger samples give better estimates of the population proportion, p.
.
If the sample is sufficiently large, then
- X, the number of successes, can be treated as a binomial variable
- $\hat{p}$ can therefore be treated as a binomial variable scaled by $\dfrac{1}{n}$
.
- We know that the sample proportion: $\hat{p} = \dfrac{x}{n}$
.
- For a large sample, the random variable: $\hat{P} = \dfrac{X}{n}$
.
Therefore:
… … $\text{E} \big( \hat{P} \big) = \text{E} \Big( \dfrac{X}{n} \Big)$
.
… … … $= \dfrac{1}{n} \text{E} \big( X \big)$
.
… … … $= \dfrac{1}{n} \times np$
.
… … … $= p$
.
$\text{E} \big( \hat{P} \big) = p$ means that, averaged over many samples, $\hat{p}$ is expected to equal the population proportion, p.
.
Also
… … $\text{Var} \big( \hat{P} \big) = \text{Var} \Big( \dfrac{X}{n} \Big)$
.
… … … … $= \Big( \dfrac{1}{n} \Big)^2 \text{Var} \big( X \big)$
.
… … … … $= \dfrac{1}{n^2} \times np(1-p)$
.
… … … … $= \dfrac{p(1-p)}{n}$
.
hence
… … $\text{SD} \big( \hat{P} \big) = \sqrt{ \dfrac{ p(1-p) }{n} }$
.
Example 1b
In the example above of left-handed students, where $p = 0.12, \; n = 20$ we get theoretical results of:
.
… … $\text{E} \big( \hat{P} \big) = 0.12$
.
… … $\text{SD} \big( \hat{P} \big) = \sqrt{ \dfrac{ 0.12(1 - 0.12) }{20} } = 0.0727$
.
Compare these values with the experimental results obtained from the 50 samples:
… … $\mu = 0.12 \qquad \qquad \sigma = 0.0742$
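.
As a quick check of the theoretical values, the two formulas can be wrapped in small Python functions. The function names are hypothetical, chosen only for this sketch.

```python
import math


def expected_value(p: float) -> float:
    """E(P-hat) = p: the sample proportion is centred on the population proportion."""
    return p


def standard_deviation(p: float, n: int) -> float:
    """SD(P-hat) = sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


p, n = 0.12, 20
sd_20 = standard_deviation(p, n)
print(f"E = {expected_value(p):.2f}, SD = {sd_20:.4f}")  # E = 0.12, SD = 0.0727

# Quadrupling the sample size halves the standard deviation.
sd_80 = standard_deviation(p, 4 * n)
print(f"SD for n = 80: {sd_80:.4f}")
```

Because n sits under the square root, the sample size has to be multiplied by 4 to halve the spread of $\hat{p}$.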
.
Large Samples
.
The above theory works best when sufficiently large samples are taken.
.
One definition of a large sample is that it fits the following 3 rules:
… … $np \geqslant 10$
… … $n(1 - p) \geqslant 10$
… … $10n \leqslant N$
.
The first two rules ensure that enough successes and fails are expected in each sample. The third rule keeps the sample to at most 10% of the population, so that the selections are close to independent.
.
Example 1c
Consider the example above where we took samples of 20 students from a school of 1500 to test for left-handedness $\big(p = 0.12\big)$.
Is this sample sufficiently large? If not, how large should the sample be?
.
Solution:
… … Compare n, p, N to the three rules listed above.
.
… … … $np = 20 \times 0.12 = 2.4$ … … which is NOT $\geqslant 10$
.
… … … $n(1- p) = 20(1 - 0.12) = 17.6$ … … which is $\geqslant 10$
.
… … … $10n = 10 \times 20 = 200$ … … which is $\leqslant 1500$
.
… … Hence $n = 20$ was not sufficiently large according to this set of rules, because it fails the first rule.
.
… … To find how large to make the sample, we need n such that $np \geqslant 10$
.
… … … $n \times 0.12 \geqslant 10$
.
… … … $n \geqslant 83.3$
.
… … … Rounding $83.3$ up to the next integer gives $n = 84$
.
… … … Check $n = 84$ against all 3 rules:
.
… … … … $np = 84 \times 0.12 = 10.08$ … … $10.08 \geqslant 10$
.
… … … … $n(1 - p) = 84(1 - 0.12) = 73.92$ … … $73.92 \geqslant 10$
.
… … … … $10n = 10 \times 84 = 840$ … … $840 \leqslant 1500$
.
… … $n = 84$ meets all 3 rules, hence $n = 84$ is a sufficiently large sample. Any sample size from 84 up to 150 (where $10n = N$) also meets all 3 rules.
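.
The step that solves $np \geqslant 10$ for n can be automated. The sketch below is an assumption-laden helper (the name `min_sample_size` is hypothetical) that finds the smallest n meeting the first two rules only; the rule involving the population size N still has to be checked separately.

```python
def min_sample_size(p: float, threshold: float = 10.0) -> int:
    """Smallest n with n*p >= threshold and n*(1 - p) >= threshold."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    n = 1
    # Step n upward until both expected counts reach the threshold.
    while n * p < threshold or n * (1 - p) < threshold:
        n += 1
    return n


n_min = min_sample_size(0.12)
print(n_min)  # 84
```

For $p = 0.12$ the binding rule is $np \geqslant 10$; for a proportion above 0.5 it would be $n(1 - p) \geqslant 10$ instead.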
.
Theoretical Distribution of Sample Proportion
.
When we know the population size (N) and the population proportion (p) we can perform the following calculations.
.
The total number of ways a sample of n can be selected from a population of N is given by $^NC_n$.
.
If the population proportion is p, then the number of successes in the population is $N \times p$
.
and the number of fails in the population is $N(1 - p)$
.
If x is the number of successes in a sample of size n, there will be $(n - x)$ fails.
.
Therefore, the total number of ways we can get:
… … x successes out of Np possible successes
and … $(n - x)$ fails out of $N(1 - p)$ possible fails
.
is given by: … $^{Np}C_x \times ^{N(1-p)}C_{n-x}$
.
Example 2
A large tub contains 20 pieces of fruit, of which 6 are apples.
If we consider selecting an apple as a success, then $p = \dfrac{6}{20} = 0.3$.
Suppose we take a number of random samples of size $n = 5$.
Let X = number of apples in one sample.
… a) .. construct a table of the number of possible samples for each value of $X = x$, together with the relative frequencies.
… b) .. construct a table of the sampling distribution
… c) .. calculate the Theoretical Expected Value and Standard Deviation for the sample proportion, $\hat{p}$.
.
Solution
… a) .. construct a table of the number of possible samples for each value of $X = x$, together with the relative frequencies.
… … The total number of possible samples is $^{20}C_5 = 15504$
… … $n = 5$, so in each sample, the number of apples we could get is $\big\{0,\; 1,\; 2,\; 3,\; 4,\; 5 \big\}$.
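.
The counts for each value of x follow from the formula $^{Np}C_x \times ^{N(1-p)}C_{n-x}$ with $Np = 6$ apples and $N(1-p) = 14$ other pieces of fruit. A minimal Python sketch of that calculation (variable names are assumptions) is:

```python
from math import comb

N = 20       # pieces of fruit in the tub
apples = 6   # successes in the population (N * p)
n = 5        # sample size

total_samples = comb(N, n)  # 20C5 = 15504

# x -> (number of possible samples with x apples, relative frequency)
table = {}
for x in range(n + 1):
    ways = comb(apples, x) * comb(N - apples, n - x)
    table[x] = (ways, ways / total_samples)

for x, (ways, rel_freq) in table.items():
    print(f"x = {x}: samples = {ways:5d}, relative frequency = {rel_freq:.4f}")
```

The six counts are 2002, 6006, 5460, 1820, 210 and 6, which add up to 15504, the total number of possible samples.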
.
… b) .. construct a table of the sampling distribution
… … The sampling distribution is the probability distribution for the sample proportion.
… … Notice that the relative frequency from the above table becomes the probability.
.
… c) .. calculate the Theoretical Expected Value and Standard Deviation for the sample proportion, $\hat{p}$.
… … $\text{E} \big( \hat{P} \big) = p = 0.3$
.
… … $\text{SD} \big( \hat{P} \big) = \sqrt{ \dfrac{ p(1-p) }{n} } = \sqrt{ \dfrac{ 0.3(1 - 0.3) }{5} } = 0.2049$
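.
For comparison, the expected value and standard deviation can also be computed directly from the sampling distribution in part b). The sketch below rebuilds that distribution from the counting formula (variable names are assumptions) and sets the result beside the formula value.

```python
import math
from math import comb

N, apples, n = 20, 6, 5
p = apples / N
total_samples = comb(N, n)

# Sampling distribution: value of p-hat -> probability.
distribution = {
    x / n: comb(apples, x) * comb(N - apples, n - x) / total_samples
    for x in range(n + 1)
}

# Mean and standard deviation computed straight from the distribution.
mean = sum(p_hat * prob for p_hat, prob in distribution.items())
variance = sum(p_hat ** 2 * prob for p_hat, prob in distribution.items()) - mean ** 2
sd_from_table = math.sqrt(variance)

# Value given by the formula sqrt(p(1 - p) / n).
sd_from_formula = math.sqrt(p * (1 - p) / n)

print(f"mean = {mean:.4f}")
print(f"SD from distribution = {sd_from_table:.4f}")
print(f"SD from formula      = {sd_from_formula:.4f}")
```

The mean comes out as 0.3 either way. The standard deviation from the distribution is about 0.1821, somewhat smaller than the formula value of 0.2049, because the formula treats X as binomial while the 5 pieces of fruit are drawn without replacement from a tub of only 20.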
.
Approximation to Normal Distribution
If we take enough large samples from a population, the distribution of the sample proportion will approximate a normal distribution.
For example, the histogram below was produced from 1000 samples of size $n = 100$, generated with a random number generator.
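.
A simulation along these lines can be sketched in Python. The population proportion used for the histogram is not stated, so $p = 0.3$ below is an assumption, as are the seed and the text-based histogram; the sketch only illustrates the bell shape and is not a reconstruction of the original figure.

```python
import math
import random
from collections import Counter

p = 0.3             # ASSUMED population proportion (not given in the text)
n = 100             # sample size
NUM_SAMPLES = 1000  # number of samples

rng = random.Random(2)  # fixed seed (arbitrary) so the run is repeatable


def sample_proportion() -> float:
    """Simulate one sample of n independent trials and return p-hat."""
    successes = sum(1 for _ in range(n) if rng.random() < p)
    return successes / n


p_hats = [sample_proportion() for _ in range(NUM_SAMPLES)]

mean = sum(p_hats) / NUM_SAMPLES
sd_theory = math.sqrt(p * (1 - p) / n)

# For a normal distribution, roughly 68% of values lie within 1 SD of the
# mean and roughly 95% within 2 SD.
within_1_sd = sum(abs(ph - p) <= sd_theory for ph in p_hats) / NUM_SAMPLES
within_2_sd = sum(abs(ph - p) <= 2 * sd_theory for ph in p_hats) / NUM_SAMPLES

print(f"mean of p-hat = {mean:.4f}, theoretical SD = {sd_theory:.4f}")
print(f"within 1 SD: {within_1_sd:.1%}, within 2 SD: {within_2_sd:.1%}")

# Rough text histogram: one '#' per 5 samples.
histogram = Counter(round(ph * n) for ph in p_hats)
for x in sorted(histogram):
    print(f"{x / n:.2f} {'#' * (histogram[x] // 5)}")
```

The proportions falling within 1 and 2 standard deviations should land close to the normal-distribution figures, and the printed bars trace out a roughly symmetric bell shape centred on p.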
.
In the next section, Confidence Intervals, we will treat the sample proportion as normally distributed.
.