Wednesday, May 1, 2019

MAT 209 - Statistics Chapter 7 - Normal Distribution

THE FOLLOWING NOTES USE EXTENSIVE MS Word Math Equations that will NOT transfer over to HTML.

Chapter 7 Normal Distribution

Continuous sample spaces and continuous random variables occur when dealing with quantities that are measured on a continuous scale.
Normal distribution ~ normal distribution curves → AKA bell-shaped distributions, BUT NOT all bell-shaped distributions are normal distributions (A → B, BUT NOT B → A)

7.1 Continuous Distributions

Previously, frequencies, %’s/probabilities were represented by the heights of rectangles and, IF the class intervals are all equal, also by the areas of the rectangles.

For continuous curves – probabilities represented by areas under the curve.
Continuous curves = graphs of functions that are referred to as probability densities/continuous distributions.
⇒ The area under the curve between any two values a and b gives the probability that a random variable having this continuous distribution will take on a value on the interval from a to b.
⇒ Area under the curve between a and b = $P(a \le X \le b) = \int_a^b f(x)\,dx$, where f is the probability density.
⇒ Values of a probability density can NOT be negative.
⇒ Total area under the curve is always equal to 1.
Based on the math of area under a curve (for the image above) → The probability of any single exact value is 0 b/c for a specific x value the area under the curve (as calculated using rectangles) is (length*width) → length = y and width = (x-x) = 0 → (y*0) = 0. This looks like a paradox, but it is not one: in practice measurements are rounded, so "x = 5" really means an interval such as 4.5 to 5.5, which has a nonzero area.
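The area rules above can be checked numerically in R. This is only a sketch using the standard normal density (dnorm with default mean = 0, sd = 1) as the example curve; the limits a = -1 and b = 1 are arbitrary choices for illustration.

```r
# Area under the standard normal density between a and b, computed two ways.
a <- -1
b <- 1
area_numeric <- integrate(dnorm, lower = a, upper = b)$value  # numerical integration
area_cdf <- pnorm(b) - pnorm(a)                                # difference of left-hand areas

# Total area under the whole curve should come out (numerically) as 1.
total_area <- integrate(dnorm, lower = -Inf, upper = Inf)$value

# A single exact value is an interval of width 0, so its area is 0.
point_area <- pnorm(a) - pnorm(a)

print(c(area_numeric, area_cdf, total_area, point_area))
```

The first two numbers should agree to several decimal places, which is why pnorm differences are used in the rest of these notes instead of integrating by hand.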

Continuous distributions can be approximated by histograms of probability distributions.
Histograms with narrower and narrower classes → the mean and standard deviation of the probability distribution approach those of the continuous distribution.
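A rough way to see the "narrower classes" idea in R: chop a standard normal curve into classes, pretend all the probability of each class sits at the class midpoint, and compute the mean and standard deviation of that histogram. The helper name histogram_moments and the cut-off range of -6 to 6 are assumptions made for this sketch, not anything from the textbook.

```r
# Hypothetical helper: mean and sd of a histogram built from a standard
# normal curve (mu = 0, sigma = 1) using classes of the given width.
histogram_moments <- function(width, lower = -6, upper = 6) {
  breaks <- seq(lower, upper, by = width)
  mids <- (head(breaks, -1) + tail(breaks, -1)) / 2  # class midpoints
  probs <- diff(pnorm(breaks))                       # area of each class
  probs <- probs / sum(probs)                        # rescale for the tiny tails cut off at +/-6
  m <- sum(mids * probs)
  s <- sqrt(sum((mids - m)^2 * probs))
  c(mean = m, sd = s)
}

wide <- histogram_moments(2)      # only 6 wide classes
narrow <- histogram_moments(0.1)  # 120 narrow classes
print(rbind(wide, narrow))
```

With the narrow classes the standard deviation should land much closer to the true value of 1 than with the wide classes.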

Continuous distribution mean (µ) = measure of center or middle
Continuous distribution standard deviation (σ) = measure of dispersion or spread

7.2 Normal Distribution

Cornerstone of modern statistics – role in development of statistical theory and its alignment with observed data
* Area under the curve becomes negligible more than 4 or 5 standard deviations away from the mean.
There is one and only one normal distribution with a given mean µ and a given standard deviation σ. → increase/decrease the mean = move the curve right(+) or left(-), increase/decrease the standard deviation = flatten(+) or sharpen(-) the curve.
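A quick sketch of the shift/flatten behavior using dnorm. The grid of x values and the helper peak_location are made up for this illustration; the means and standard deviations are arbitrary.

```r
x <- seq(-10, 15, by = 0.01)   # arbitrary grid of x values

# Hypothetical helper: x position of the highest point of a normal curve.
peak_location <- function(mean, sd) x[which.max(dnorm(x, mean, sd))]

peak_location(0, 1)   # peak sits at the mean
peak_location(5, 1)   # larger mean: same shape, moved right

# Height of the curve at its peak for two standard deviations.
dnorm(0, mean = 0, sd = 1)   # taller, sharper curve
dnorm(0, mean = 0, sd = 2)   # flatter curve, half the peak height
```

Doubling σ halves the peak height because the total area under the curve has to stay equal to 1.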

Standard normal distribution = normal distribution with µ = 0 and σ = 1; normal-curve areas are tabulated for it.
Can obtain areas under any normal curve by changing scale and converting into standard units with the formula: $z = \frac{x - \mu}{\sigma}$
It’s the good ol’ z score repurposed here → tells how many standard deviations a value lies from the mean.
Once values are converted to z scores, areas under the curve can be found with ease using a table of standard normal values. Look up the area for the z score of each limit of the interval, then subtract the two areas to obtain the probability of the region.
The z scores for a normal distribution also follow the 1σ=68%, 2σ=95%, 3σ=99.7% rule.
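A small R check of standard units and the 68/95/99.7 rule. The values mu = 100, sigma = 15 and x = 120 are hypothetical numbers picked only to have something to convert.

```r
# Converting to standard units: z = (x - mu) / sigma
mu <- 100      # hypothetical mean, for illustration only
sigma <- 15    # hypothetical standard deviation
x <- 120
z <- (x - mu) / sigma

# Area left of x on the original scale equals area left of z on the standard scale.
pnorm(x, mean = mu, sd = sigma)
pnorm(z)

# Areas within 1, 2 and 3 standard deviations of the mean.
k <- 1:3
within_k <- pnorm(k) - pnorm(-k)
round(within_k, 4)   # 0.6827 0.9545 0.9973
```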

Using R:
(R is case-sensitive: the function names are all lowercase.)
rnorm = Generates random #s from normal distribution → rnorm(n, mean, sd)
dnorm = Probability Density Function → dnorm(x, mean, sd, log=F/T)
pnorm = Cumulative Distribution Function → pnorm(q, mean, sd, lower.tail=T/F, log.p=F/T)
qnorm = Quantile Function – inverse of pnorm → qnorm(p, mean, sd, lower.tail=T/F, log.p=F/T)
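A minimal tour of the four functions on the standard normal distribution. The seed and the sample size of 10000 are arbitrary choices so the random draw can be repeated.

```r
set.seed(1)                               # arbitrary seed so the draw is repeatable
draws <- rnorm(10000, mean = 0, sd = 1)   # r: random numbers
mean(draws)                               # should be near 0
sd(draws)                                 # should be near 1

dnorm(0)     # d: height of the density curve at x = 0
pnorm(0)     # p: area to the left of 0, which is 0.5 by symmetry
qnorm(0.5)   # q: the x value with area 0.5 to its left, which is 0
```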

pnorm(q, mean, sd, lower.tail (T/F), log.p (F/T)) – gives the probability of –∞ < X ≤ q (everything less than the q value). With lower.tail=F it gives the area to the right of q instead.
ALT: pnorm(z-score) Default: mean=0, sd=1

Example for curve depicted in subsection 7.1: pnorm(x of red line,mean=0,sd=1) = Total Probability left of red line (everything under curve left of red line)

Working backward from percentage probability to z score?
Use qnorm(probability,mean=0,sd=1) = z score
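A short sketch showing that qnorm undoes pnorm. The z score 1.5 and the 97.5% level are just example inputs.

```r
p <- pnorm(1.5)      # area to the left of z = 1.5
z_back <- qnorm(p)   # recovers 1.5 (up to floating-point error)

# z score with 97.5% of the area to its left (2.5% in the upper tail).
z_975 <- qnorm(0.975)
round(z_975, 2)      # 1.96
```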

7.3 Some Applications

Example 7.6 The amount of cosmic radiation to which a person is exposed while flying by jet across the United States is a random variable having a normal distribution with µ = 4.35mrem and σ = 0.59mrem. Find the probabilities that a person on such a flight will be exposed to (a) more than 5.00mrem of cosmic radiation, (b) anywhere from 3.00 to 4.00mrem of cosmic radiation.
(a) µ = 4.35mrem     σ = 0.59mrem          x = 5.00   z = (x - µ)/σ = (5.00 – 4.35)/0.59 = 1.101695
Using R: pnorm(1.101695) = 0.8647029 → 0.8647029 - 0.5 = 0.3647029 (area under the curve between the mean and z = 1.101695)
Since we are looking for values more than 5.00mrem → 0.5 – 0.3647029 = 0.1352971

Alternative (faster) solution: 1-pnorm(1.101695) = 0.1352972 ~ 0.1352971
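Part (a) can also be done without computing z by hand, since pnorm accepts the mean and sd directly and lower.tail = FALSE returns the right-hand area. A sketch using the numbers from the example:

```r
mu <- 4.35
sigma <- 0.59

# Upper-tail area P(X > 5.00), skipping the manual z-score step.
p_more_than_5 <- pnorm(5.00, mean = mu, sd = sigma, lower.tail = FALSE)

# Same thing via the z score, as in the worked solution.
z <- (5.00 - mu) / sigma
p_via_z <- 1 - pnorm(z)

round(c(p_more_than_5, p_via_z), 4)   # both about 0.1353
```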

(b) µ = 4.35mrem     σ = 0.59mrem          x = 3.00, 4.00              z = (x - µ)/σ
z(3.00) = (3.00 - 4.35)/0.59 = -2.288136         z(4.00) = (4.00 - 4.35)/0.59 = -0.5932203
Using R: pnorm(-2.288136) = 0.01106481                      pnorm(-0.5932203) = 0.2765169
pnorm(-0.5932203) – pnorm(-2.288136) = 0.2765169 – 0.01106481 = 0.2654521
* Remember that the pnorm function in R uses –∞ as a base limit (calculates from left area).
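Part (b) as a single subtraction of two left-hand areas, again letting pnorm handle the conversion to standard units. A sketch with the example's numbers:

```r
mu <- 4.35
sigma <- 0.59

# P(3.00 < X < 4.00) = (area left of 4.00) - (area left of 3.00)
p_3_to_4 <- pnorm(4.00, mean = mu, sd = sigma) - pnorm(3.00, mean = mu, sd = sigma)
round(p_3_to_4, 4)   # about 0.2655
```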

Example 7.7 The actual amount of instant coffee that a machine puts into “4oz” jars varies from jar to jar and it may be looked upon as a random variable having a normal distribution with σ = 0.04oz. If only 2% of the jars are to contain less than 4oz of coffee, what must be the mean fill of these jars?
σ = 0.04oz      pnorm(z) = 0.02        z = (x - µ)/σ
Using R: To work backwards from the 2% to the z score → qnorm(0.02) = -2.053749
-2.053749 = (4.0 - µ)/0.04 → 4.0 - µ = (-2.053749*0.04) = -0.08215 → µ = 4.0 + 0.08215 = 4.08215
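The same back-solve in R, with a check at the end that the resulting mean really leaves about 2% of jars under 4oz. A sketch using the example's numbers:

```r
sigma <- 0.04
x <- 4.0
z <- qnorm(0.02)      # z score with 2% of the area to its left
mu <- x - z * sigma   # rearranged from z = (x - mu) / sigma
mu                    # about 4.082

# Check: with this mean, the area left of 4oz should be 0.02.
pnorm(x, mean = mu, sd = sigma)
```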

Normal distribution is a continuous distribution that applies to continuous random variables, but it can also be used to approximate distributions of random variables that take on a finite number of values.
Normal distribution ~ finite #s distribution when continuity correction applied.

*** Continuity correction is NOT used as much in the present due to the power of computers.

Continuity correction = The adjustment made when working with the normal distribution as an approximation to the binomial distribution. Adjust statements for rounding.
< (less than, under, below, fewer than) – refers to 0.5 less than the target number (<15 → <14.5)
≤ (at most, maximum, bottom) – refers to 0.5 more than target (≤15 → ≤15.5)
> (more than, above, over, greater than) – refers to 0.5 more than target (>15 → >15.5)
≥ (at least, minimum, top) – refers to 0.5 less than the target (≥15 → ≥14.5)
= (equal to, exactly, half) – refers to range of target ± 0.5 (=15 → 14.5 < x < 15.5)
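The five rules above can be wrapped in one small function. cc_prob is a hypothetical helper written for these notes (it is not part of base R), and the mean = 15, sd = 3 used to try it out are made-up values.

```r
# Hypothetical helper: normal approximation to P(X <op> k) for a
# whole-number-valued X, applying the continuity correction.
cc_prob <- function(op, k, mean, sd) {
  switch(op,
    "<"  = pnorm(k - 0.5, mean, sd),
    "<=" = pnorm(k + 0.5, mean, sd),
    ">"  = pnorm(k + 0.5, mean, sd, lower.tail = FALSE),
    ">=" = pnorm(k - 0.5, mean, sd, lower.tail = FALSE),
    "==" = pnorm(k + 0.5, mean, sd) - pnorm(k - 0.5, mean, sd),
    stop("unknown operator: ", op)
  )
}

# Made-up example: "fewer than 15", "exactly 15" and "more than 15"
# cover every possible outcome, so the three pieces should add up to 1.
cc_prob("<", 15, mean = 15, sd = 3) +
  cc_prob("==", 15, mean = 15, sd = 3) +
  cc_prob(">", 15, mean = 15, sd = 3)
```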

Example 7.8 In a study of aggressive behavior, male white mice, returned to the group in which they live after 4 weeks of isolation, averaged 18.6 fights in the 1st 5 minutes with a standard deviation of 3.3 fights. If it is assumed that the distribution of this random variable (# of fights under stated conditions) can be approximated closely with a normal distribution, what is the probability that such a mouse will get into at least 15 fights in the first 5 minutes?
µ = 18.6 fights           σ = 3.3 fights           at least 15 → continuity correction → x = 14.5     z = (x - µ)/σ
z = (14.5 – 18.6)/3.3 = -1.242424
Using R: pnorm(-1.242424) = 0.1070401 (area left of 14.5) → at least 15 fights so looking for everything right of 14.5 = right half of curve (0.5) + the 1.242424 standard deviations left of the mean → 1 – 0.1070401 = 0.8929599 ~ 0.89
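The same answer in one call, giving pnorm the corrected value 14.5 with the mean and sd from the example and asking for the right-hand area. A sketch:

```r
mu <- 18.6
sigma <- 3.3

# "At least 15" becomes "more than 14.5" after the continuity correction.
p_at_least_15 <- pnorm(14.5, mean = mu, sd = sigma, lower.tail = FALSE)
round(p_at_least_15, 2)   # about 0.89
```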

*When we observe a value of a random variable that has a normal distribution = sampling a normal population.

7.4 Normal Approximation to the Binomial Distribution

Normal distribution ~ binomial distribution WHEN:
- n (# of trials) is large
- p (probability of success) is ~ 0.5 (½)
Rule of thumb: (np > 5) and (n(1-p) > 5)

Normal distribution w/ $\mu = np$ and $\sigma = \sqrt{np(1-p)}$ (the mean and standard deviation of the binomial distribution) can be used to approximate binomial distributions, even when n is fairly small and p differs from 0.5, as long as the rule of thumb holds.
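A small sketch that checks the np > 5 and n(1-p) > 5 rule of thumb and returns the matching normal parameters. normal_approx_params is a hypothetical helper name, and the second call (n = 10, p = 0.1) is a made-up case where the rule fails.

```r
# Hypothetical helper: rule-of-thumb check plus mu and sigma for the
# normal curve that approximates a binomial(n, p) distribution.
normal_approx_params <- function(n, p) {
  list(
    ok = (n * p > 5) && (n * (1 - p) > 5),
    mu = n * p,
    sigma = sqrt(n * p * (1 - p))
  )
}

good <- normal_approx_params(50, 0.6)   # np = 30, n(1-p) = 20: rule holds
bad <- normal_approx_params(10, 0.1)    # np = 1: rule fails
str(good)
str(bad)
```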

Normal curve approximation to the binomial distribution is useful in problems where we would have to use the formula for the binomial distribution repeatedly to obtain the values of many different terms.

Example 7.11 What is the probability that at least 26 of 50 mosquitoes will be killed by a new insect spray when the probability is 0.60 that any one of them will be killed by the spray?
Using the binomial distribution calculation: $P(X \ge 26) = \sum_{x=26}^{50} \binom{50}{x} (0.60)^{x} (0.40)^{50-x}$

Using R: sum(dbinom(26:50,50,0.6)) = 0.9021926 ≈ 0.902

Normal curve approximation
b/c (np > 5), (n(1-p) > 5); (50*0.6) = 30 > 5, (50*0.4) = 20 > 5
At least 26
→ continuity correction → 25.5
µ = np = (50)(0.6) = 30       $\sigma = \sqrt{np(1-p)} = \sqrt{30 \times 0.4} = \sqrt{12} \approx 3.464102$
z = (x - µ)/σ = (25.5 – 30)/3.464102 = -1.299038
Using R: pnorm(-1.299038) ≈ 0.09697 (This is the area under the curve left of the 25.5 value)
To get the probability for values greater than 25.5
→ 1 - 0.09697 = 0.90303 ≈ 0.903
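To compare the exact binomial answer with the normal approximation side by side, both can be computed in a few lines. This is a sketch with the numbers from Example 7.11; pbinom with lower.tail = FALSE gives P(X > 25), which is the same event as "at least 26".

```r
n <- 50
p <- 0.6

# Exact binomial probability of at least 26 successes.
exact <- pbinom(25, size = n, prob = p, lower.tail = FALSE)

# Normal approximation with the continuity correction (26 -> 25.5).
mu <- n * p
sigma <- sqrt(n * p * (1 - p))
approx <- pnorm(25.5, mean = mu, sd = sigma, lower.tail = FALSE)

round(c(exact = exact, approx = approx), 3)
abs(exact - approx)   # size of the approximation error
```

The two results should both be close to 0.90, differing by only about a thousandth.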

We now have 4 ways of determining binomial probabilities:
1.     Computer programs or printouts
2.     Formula for binomial distribution
3.     Poisson approximation of binomial distribution
4.     Normal approximation of binomial distribution
