Distribution of Sample Proportion

In the last section we introduced the idea of a sample proportion.

Recall that the data is binomial, meaning each data point is either a "success" or a "fail".

The sample proportion is the fraction of the sample that scores a success on the question being studied.

We use $\hat{p}$ to represent the sample proportion.

$\hat{p}$ is a sample statistic, so it varies from sample to sample.

$\hat{p} = \dfrac{\text{num of successes in sample}}{\text{sample size (n)}}$

The usual process in statistics is to select one sample from the population and draw a conclusion about the population from that sample.

In this section, we will collect a large number of samples from the same population (returning each sample to the population before taking the next).

The number of successes in each sample then behaves like a binomial variable, and the sample proportion is that number divided by the sample size.

Caution:

When we studied binomial distributions, we used n = the number of trials

In this topic, we use n = sample size

The two meanings are related: in one sample we are effectively performing n trials of a binomial variable

Example 1

It is known that 12% of students in a school of 1500 students are left-handed.

the population proportion $p = 0.12$

the population size $N = 1500$

We will use a sample size of $n = 20$ students.

Let X be the variable which is the number of left-handed students in each sample.

… … $\hat{p} = \dfrac{X}{n}$

We took 50 samples with $n = 20$ and produced the following frequency table (modelled using random numbers).

In other words:

there were 5 samples with 0 left-handed students,

10 samples with 1 left-handed student, etc.

there were no samples with $X > 6$ (more than 6 left-handed students)

We can find the mean and standard deviation of this set of data:

… … Mean

… … … … $\mu = \dfrac{\Sigma \hat{p} \times f}{\Sigma f}$

… … … … … $= \Sigma \big( \hat{p} \times RF \big)$ … … {RF is Relative Frequency}

… … … … … $$= 0.12$$

… … Variance

… … … … $\sigma^2 = E \big( \hat{p}^2 \big) - \mu^2$

… … … … … $$= 0.0055$$

… … Standard Deviation

… … … … $\sigma = \sqrt{0.0055}$

… … … … … $$= 0.0742$$

Despite having modelled this with random numbers, the mean sample proportion worked out to be exactly 0.12 which is the same as the population proportion.
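
The experiment in Example 1 can be imitated with a short simulation. The sketch below is only an illustration, not the method used to produce the results above: the seed, the variable names and the use of Python's `random` module are all assumptions, so the numbers it prints will generally differ a little from the 0.12 and 0.0742 quoted above.

```python
import random
import statistics

# Setup taken from Example 1: a school of N = 1500 students,
# 12% of whom (180 students) are left-handed.
N = 1500
p = 0.12
n = 20             # sample size
num_samples = 50   # number of samples drawn

left_handed = round(N * p)
# 1 represents a left-handed student (success), 0 a right-handed student (fail).
population = [1] * left_handed + [0] * (N - left_handed)

rng = random.Random(1)  # arbitrary fixed seed so the run is repeatable

p_hats = []
for _ in range(num_samples):
    # Each sample is drawn from the full population, so the previous
    # sample has effectively been returned before the next one is taken.
    sample = rng.sample(population, n)
    p_hats.append(sum(sample) / n)   # p-hat = successes / sample size

mean_p_hat = statistics.mean(p_hats)
sd_p_hat = statistics.pstdev(p_hats)  # SD of the 50 values of p-hat

print(f"mean of p-hat = {mean_p_hat:.4f}")
print(f"SD of p-hat   = {sd_p_hat:.4f}")
```

Changing the seed gives a different set of 50 samples, which shows how much the mean and standard deviation move from one experiment to the next.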

Expected Value and Standard Deviation of Sample Proportion

Larger samples give better estimates of the population proportion, p.

If the sample is sufficiently large, then

the distribution of X, the number of successes, can be treated as a binomial variable

the distribution of $\hat{p}$ can therefore be treated as that binomial variable divided by n

We know that the sample proportion for one particular sample is: $\hat{p} = \dfrac{x}{n}$

For a large sample, the corresponding random variable is: $\hat{P} = \dfrac{X}{n}$

Therefore:

… … $\text{E} \big( \hat{P} \big) = \text{E} \Big( \dfrac{X}{n} \Big)$

… … … $= \dfrac{1}{n} \text{E} \big( X \big)$

… … … $= \dfrac{1}{n} \times np$

… … … $= p$

$\text{E} \big( \hat{P} \big) = p$ means that the average of $\hat{p}$ over a large number of samples is expected to be the population proportion, p.

Also

… … $\text{Var} \big( \hat{P} \big) = \text{Var} \Big( \dfrac{X}{n} \Big)$

… … … … $= \Big( \dfrac{1}{n} \Big)^2 \text{Var} \big( X \big)$

… … … … $= \dfrac{1}{n^2} \times np(1-p)$

… … … … $= \dfrac{p(1-p)}{n}$

hence

… … $\text{SD} \big( \hat{P} \big) = \sqrt{ \dfrac{ p(1-p) }{n} }$
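
As a numerical check of the two results just derived, the sketch below builds the distribution of $\hat{P}$ directly from the binomial probabilities and compares its mean and standard deviation with $p$ and $\sqrt{p(1-p)/n}$. The function names are illustrative only, and the values $n = 20$, $p = 0.12$ are borrowed from Example 1.

```python
from math import comb, sqrt

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n binomial trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def p_hat_mean_sd(n, p):
    """Mean and SD of P-hat = X/n, summed directly over the binomial pmf."""
    values = [x / n for x in range(n + 1)]             # possible values of p-hat
    probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
    mean = sum(v * pr for v, pr in zip(values, probs))
    var = sum(v**2 * pr for v, pr in zip(values, probs)) - mean**2
    return mean, sqrt(var)

n, p = 20, 0.12
mean, sd = p_hat_mean_sd(n, p)
formula_sd = sqrt(p * (1 - p) / n)

print(f"mean from pmf = {mean:.4f}, p = {p}")
print(f"SD from pmf   = {sd:.4f}, formula = {formula_sd:.4f}")
```

Both routes should agree up to rounding error, since the formulas are just the binomial mean and variance rescaled by n.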

Example 1b

In the example above of left-handed students, where $p = 0.12, \; n = 20$, we get theoretical results of:

… … $\text{E} \big( \hat{P} \big) = 0.12$

… … $\text{SD} \big( \hat{P} \big) = \sqrt{ \dfrac{ 0.12(1 - 0.12) }{20} } = 0.0727$

Compare these values with the experimental results obtained from 50 samples:

… … $\mu = 0.12 \qquad \qquad \sigma = 0.0742$

Large Samples

The above theory works best when sufficiently large samples are taken.

One definition of a large sample is that it fits the following 3 rules:

… … $np \geqslant 10$

… … $n(1 - p) \geqslant 10$

… … $10n \leqslant N$ … … {the sample is no more than one tenth of the population}

Example 1c

Consider the example above where we took samples of 20 students from a school of 1500 to test for left-handedness $\big(p = 0.12\big)$.

Is this sample sufficiently large? If not, how large should the sample be?

Solution:

… … Compare n, p, N to the three rules listed above.

… … … $np = 20 \times 0.12 = 2.4$ … … which is NOT $\geqslant 10$

… … … $n(1- p) = 20(1 - 0.12) = 17.6$ … … which is $\geqslant 10$

… … … $10n = 10 \times 20 = 200$ … … which is $\leqslant 1500$

… … Hence $n = 20$ was not sufficiently large according to this set of rules, because it fails the first rule.

… … To find how large to make the sample, we need n such that $np \geqslant 10$

… … … $n \times 0.12 \geqslant 10$

… … … $n \geqslant 83.3$

… … … Rounding $83.3$ up to the next integer gives $n = 84$

… … … Check $n = 84$ against all 3 rules:

… … … … $np = 84 \times 0.12 = 10.08$ … … $10.08 \geqslant 10$

… … … … $n(1 - p) = 84(1 - 0.12) = 73.92$ … … $73.92 \geqslant 10$

… … … … $10n = 10 \times 84 = 840$ … … $840 \leqslant 1500$

… … $n = 84$ meets all 3 rules, hence $n = 84$ is a sufficiently large sample.

… … The third rule sets the upper limit: $10n \leqslant 1500$ gives $n \leqslant 150$, so any sample size from 84 to 150 meets all 3 rules.
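
The rule check can also be automated. The sketch below is a minimal illustration, assuming the third rule is read in its usual textbook form, that the sample is at most one tenth of the population ($10n \leqslant N$). The function names `is_large_sample` and `smallest_large_sample` are made up for this example.

```python
def is_large_sample(n, p, N):
    """Check the three large-sample rules.

    Assumption: the third rule is taken as 10n <= N, i.e. the sample is
    no more than 10% of the population.
    """
    return n * p >= 10 and n * (1 - p) >= 10 and 10 * n <= N

def smallest_large_sample(p, N):
    """Smallest n meeting all three rules, or None if no n does."""
    for n in range(1, N + 1):
        if is_large_sample(n, p, N):
            return n
    return None

# Values from Example 1: p = 0.12, N = 1500
print(is_large_sample(20, 0.12, 1500))
print(smallest_large_sample(0.12, 1500))
```

A simple search is used rather than solving the inequalities, because it works unchanged for any p and N and makes it obvious when no sample size satisfies all three rules.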

Theoretical Distribution of Sample Proportion

When we know the population size (N) and the population proportion (p) we can perform the following calculations.

The total number of ways a sample of n can be selected from a population of N is given by $^{N}C_n$.

If the population proportion is p, then the number of successes in the population is $N \times p$

and the number of fails in the population is $N(1 - p)$.

If x is the number of successes in a sample of size n, there will be $(n - x)$ fails.

Therefore, the total number of ways we can get:

… … x successes out of Np possible successes

and … $(n - x)$ fails out of $N(1 - p)$ possible fails

is given by: … $^{Np}C_x \times {}^{N(1-p)}C_{n-x}$
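
The counting argument above can be sketched in a few lines. This is an illustration only: the function name `count_samples` and the small population used to try it out (10 items, 4 of them successes, samples of 3) are hypothetical and do not come from these notes. The number of successes in the population is passed in as a whole number rather than computed as $N \times p$, to avoid rounding problems.

```python
from math import comb

def count_samples(N, successes_in_population, n, x):
    """Number of samples of size n containing exactly x successes.

    Chooses x of the successes and (n - x) of the fails.
    """
    fails_in_population = N - successes_in_population
    return comb(successes_in_population, x) * comb(fails_in_population, n - x)

# Hypothetical population: N = 10 items, 4 successes, samples of size n = 3
N, K, n = 10, 4, 3
total = comb(N, n)   # all possible samples of size n

for x in range(n + 1):
    ways = count_samples(N, K, n, x)
    print(f"x = {x}: {ways} samples, relative frequency {ways / total:.4f}")
```

Dividing each count by the total number of samples gives the relative frequency of that value of x, and the counts over all possible x add up to the total.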

Example 2

A large tub contains 20 pieces of fruit of which 6 are apples.

If we consider selecting an apple as a success then $p = \dfrac{6}{20} = 0.3$

We take a number of random samples of size $n = 5$.

Let X = number of apples in one sample.

… a) .. construct a table of the possible number of samples for each value of $$X = x$$ , together with the relative frequencies.

… b) .. construct a table of the sampling distribution

… c) .. calculate the Theoretical Expected Value and Standard Deviation for the sample proportion, $\hat{p}$ .

Solution

… a) .. construct a table of the possible number of samples for each value of $$X = x$$ , together with the relative frequencies.

… … The total number of possible samples is $^{20}C_5 = 15504$

… … $$n = 5$$ , so in each sample, the number of apples we could get is $\big\{0,\; 1,\; 2,\; 3,\; 4,\; 5 \big\}$ .
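
The counts for part a) can be generated from the counting formula of the previous section. The sketch below is one possible way to do it (the variable names are illustrative): for each x it chooses x of the 6 apples and the remaining $5 - x$ pieces from the 14 other fruit.

```python
from math import comb

N = 20        # pieces of fruit in the tub
apples = 6    # successes in the population
n = 5         # sample size

total = comb(N, n)   # all possible samples of 5 pieces

table = []
print("x   samples   relative frequency")
for x in range(n + 1):
    # choose x apples and (n - x) pieces that are not apples
    count = comb(apples, x) * comb(N - apples, n - x)
    table.append((x, count, count / total))
    print(f"{x}   {count:7d}   {count / total:.4f}")

print("total samples:", sum(count for _, count, _ in table))
```

The counts should add up to the total of 15504 possible samples, which is a quick way to check the table.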

… b) .. construct a table of the sampling distribution

… … The sampling distribution is the probability distribution for the sample proportion.

… … Notice that the relative frequency from the above table becomes the Probability

… c) .. calculate the Theoretical Expected Value and Standard Deviation for the sample proportion, $\hat{p}$ .

… … $\text{E} \big( \hat{P} \big) = p = 0.3$

… … $\text{SD} \big( \hat{P} \big) = \sqrt{ \dfrac{ p(1-p) }{n} } = \sqrt{ \dfrac{ 0.3(1-0.3) }{5} } = 0.2049$
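
As a cross-check that is not part of the original solution, the mean and standard deviation can also be computed directly from the sampling distribution of part b). The formula $\sqrt{p(1-p)/n}$ treats the 5 selections as independent binomial trials, whereas here 5 pieces are taken from only 20 without replacement, so the value obtained from the table comes out somewhat smaller. The sketch below is an illustration with made-up variable names.

```python
from math import comb, sqrt

N, apples, n = 20, 6, 5
p = apples / N
total = comb(N, n)

# Sampling distribution of p-hat: value x/n with probability (count / total)
dist = {
    x / n: comb(apples, x) * comb(N - apples, n - x) / total
    for x in range(n + 1)
}

mean = sum(v * pr for v, pr in dist.items())
sd = sqrt(sum(v**2 * pr for v, pr in dist.items()) - mean**2)
formula_sd = sqrt(p * (1 - p) / n)   # binomial formula used in part c)

print(f"E(p-hat) from table  = {mean:.4f}")
print(f"SD(p-hat) from table = {sd:.4f}")
print(f"SD(p-hat) by formula = {formula_sd:.4f}")
```

The expected value agrees with $p = 0.3$ either way. The gap between the two standard deviations shrinks as the population becomes large compared with the sample.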

Approximation to Normal Distribution

If we take enough samples from a population, and each sample is sufficiently large, the distribution of the sample proportion will approximate a normal distribution.

For example, the histogram below was produced using 1000 samples of size $$n = 100$$ using a random number generator.

In the next section, Confidence Intervals, we will treat the sample proportion as normally distributed.
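
A simulation along the lines described above can be sketched as follows. The population proportion used for the 1000 samples is not stated in these notes, so $p = 0.12$ is an assumption borrowed from Example 1, and the seed and variable names are arbitrary. Instead of drawing a histogram, the sketch counts how many sample proportions fall within 1 and 2 standard deviations of p. For a normal distribution these fractions are about 68% and 95%.

```python
import random
from math import sqrt

p = 0.12            # assumed population proportion (not given for the histogram)
n = 100             # sample size
num_samples = 1000  # number of samples
rng = random.Random(2)   # arbitrary fixed seed so the run is repeatable

# Each sample: n independent trials, each a success with probability p
p_hats = [
    sum(rng.random() < p for _ in range(n)) / n
    for _ in range(num_samples)
]

mean = sum(p_hats) / num_samples
sd = sqrt(p * (1 - p) / n)   # theoretical SD of p-hat

within_1 = sum(abs(ph - p) <= sd for ph in p_hats) / num_samples
within_2 = sum(abs(ph - p) <= 2 * sd for ph in p_hats) / num_samples

print(f"mean of p-hat        = {mean:.4f}")
print(f"fraction within 1 SD = {within_1:.3f}")
print(f"fraction within 2 SD = {within_2:.3f}")
```

Because $\hat{p}$ can only take the values 0, 0.01, 0.02, and so on, the fractions will not match the normal percentages exactly, but they should be reasonably close for samples of this size.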

BHS Methods34

Notes for VCE Maths Methods Units 3&4