Monday, May 20, 2019

MAT 209 - Statistics Chapter 9 - Problems of Estimation

THE FOLLOWING NOTES USE EXTENSIVE MS Word Math Equations that will NOT transfer over to HTML.

Chapter 9 Problems of Estimation

Statistical inference – make generalizations based on samples (two types):
·      Problems of estimation: Assign a numerical value to a population parameter, hoping that the sample estimate will be close, OR assign an interval of values based on the sample, hoping that it will contain the population parameter in question.
·      Tests of hypotheses: Accept/reject assumptions about parameters/shape/form of population hoping that what we do will not be in error.

9.1 Estimation of Means

Often working from sample mean x̄ to determine population average µ.
Point estimate: Use of sample data to calculate a single specific value of an unknown population parameter → Problem: Lacks information describing its basis + no mention of error

Fluctuations of the sample mean from the true mean of the population depend on two factors, as discussed in Chapter 8:
1.     Size of sample
2.     Size of population standard deviation σ

Scientific reports often present the sample mean (x̄), the sample size n, and the sample standard deviation s.

Example 7.5 Let Zα denote the value of z for which the area under the standard normal curve to its right is equal to α (a subscript notation, similar to log terminology). What are Z0.01 and Z0.05?
Z0.01 → 1 - pnorm(z) = 0.01 → pnorm(z) = 1 - 0.01 = 0.99 → z = qnorm(0.99) = 2.326348 ≈ 2.33
Z0.05 → 1 - pnorm(z) = 0.05 → pnorm(z) = 1 - 0.05 = 0.95 → z = qnorm(0.95) = 1.644854 ≈ 1.645
Alternative: Zβ = value of z for which the area under the curve to its left is β.
Z0.99 → pnorm(z) = 0.99 → z = qnorm(0.99) = 2.326348
Z0.95 → pnorm(z) = 0.95 → z = qnorm(0.95) = 1.644854
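The right-tail convention above can be wrapped in a small R helper. This is only a sketch: the name z_alpha is made up for illustration and does not come from the textbook.

```r
# Hypothetical helper (name is an assumption): the z value that leaves
# an area of alpha to its RIGHT under the standard normal curve.
z_alpha <- function(alpha) {
  qnorm(alpha, lower.tail = FALSE)  # equivalent to qnorm(1 - alpha)
}

z_alpha(0.01)  # about 2.33
z_alpha(0.05)  # about 1.645
```

Using lower.tail = FALSE asks R for the right-tail quantile directly, so there is no need to subtract from 1 by hand.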

Example 7.17 Based on the above, Zα/2 denotes the value of z for which the area under the standard normal curve to its right is equal to α/2 → the area between -Zα/2 and Zα/2 is equal to 1 – α.
(a) Z0.025 = 1.96 with α = 0.05 → area between z = ±1.959964 is 1 – 0.05 = 0.95
pnorm(1.959964) - pnorm(-1.959964) ≈ 0.95
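The relationship between α and the central area can be checked for any α. A minimal sketch, with variable names chosen here for illustration:

```r
alpha <- 0.05
z <- qnorm(1 - alpha / 2)        # Z_{alpha/2}, about 1.96 when alpha = 0.05
area <- pnorm(z) - pnorm(-z)     # area between -z and +z
area                             # equals 1 - alpha, up to floating-point error
```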

Making use of µx̄ = µ and σx̄ = σ/sqrt(n), and assuming n ≥ 30 with the population infinite OR finite & large →
When we use x̄ as an estimate of µ, the probability is 1 – α that this estimate will be “off” either way by at most
E = Zα/2 * σ/sqrt(n)
Where E (or ME) = the maximum possible error (potential error) from the mean.

Two values commonly used for 1 – α are 0.95 and 0.99: Z0.025 = 1.96 → α = 0.025*2 = 0.05 → 1 – α = 0.95, and Z0.005 = 2.575 → α = 0.005*2 = 0.01 → 1 – α = 0.99

Example 9.1 Supervisor of garment manufacturing firm intends to use mean time required to sew a sample of 150 women’s coats to estimate the average time required to sew all such women’s coats. If she can assume that σ = 6.2 minutes, what can she assume with probability 0.99 about the maximum error of her estimate?
n = 150, σ = 6.2, 1 – α = 0.99 → α = 0.01 → α/2 = 0.005 → Z0.005 = 2.575 into the formula for maximum error:
E = 2.575*(6.2/sqrt(150)) = 1.30
1) Work back from 1 – α = probability (in this case 0.99) → α = 1 – probability → α = 0.01
2) Divide α by 2 to determine the area under each tail of the curve → 0.01/2 = 0.005
3) In R: use qnorm() to work backwards to find z → qnorm(0.005) = -2.575829 (these notes use the table value 2.575)
* Remember that R calculates the area under the curve from the left (-∞), so to find the (+) value above the mean add 0.005 to 0.99 → qnorm(0.995) = 2.575829
4) Plug into E = Zα/2 * σ/sqrt(n) → E = (2.575)*(6.2/sqrt(150)) = 1.303537
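The four steps above can be sketched as one R function. The function name max_error and its argument names are assumptions made for this illustration. Because it uses the exact qnorm() value instead of the table value 2.575, the result differs from the hand calculation in the fourth decimal place.

```r
# Hypothetical helper: maximum error E = Z_{alpha/2} * sigma / sqrt(n)
max_error <- function(sigma, n, conf) {
  alpha <- 1 - conf              # step 1: work back from the probability
  z <- qnorm(1 - alpha / 2)      # steps 2-3: upper-tail z for area alpha/2
  z * sigma / sqrt(n)            # step 4: plug into the formula
}

E <- max_error(sigma = 6.2, n = 150, conf = 0.99)
round(E, 2)  # about 1.30 minutes
```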

Supervisor can say with probability 0.99 that her error will be at most 1.30 minutes from µ.
⇒ In statistical jargon: Once the data have been collected, the supervisor can be 99% confident that the error of her estimate is at most 1.30 minutes.

In general, we make probability statements about future values of random variables and confidence statements once the data have been obtained.

Example 9.2 Quality control specialist wants to use mean of a random sample of size n = 35 to determine the average fat content of large shipment of hamburgers. Value of σ is not known, but specialist maintains that σ could NOT be larger than 0.25 ounces. What can he assert with 0.95 probability about the maximum size of error?
n = 35, σ ≤ 0.25, 1 – α = 0.95 → α = 0.05 → α/2 = 0.025 → Z0.025 = 1.96
E = Zα/2 * σ/sqrt(n) → E ≤ (1.96)*(0.25/sqrt(35)) = 0.083
Can assert with probability of at least 0.95 that the error is at most 0.083 ounce.

*** Both previous examples rely on knowing the standard deviation of the population (σ). In reality σ is often not provided → must use the sample standard deviation s (reasonable provided that the sample is sufficiently large, n ≥ 30).

To determine the sample size needed to estimate the mean of a population and assert with probability 1 – α that the error is at most E:
Sample size for estimating µ (n ≥ 30) → n = ((Zα/2 * σ)/E)^2, rounded up to the next whole number

Example 9.4 Superintendent of an irrigation service which uses river water to irrigate local farms wants to use the mean of a random sample to estimate the number of acre-feet of water pumped each day and wants the estimate to be in error by at most 0.5 acre-foot per day. From previous studies of a similar kind, it is known that σ = 0.75 acre-foot. With 95% confidence about the maximum error, how large a sample does the superintendent have to take?
1 – α = 0.95 → α = 0.05 → α/2 = 0.025 → Z0.025 = 1.96, σ = 0.75, E = 0.50
n = ((Zα/2 * σ)/E)^2 = ((1.96 * 0.75)/0.50)^2 = 8.64 → round up to n = 9
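The sample-size calculation can be sketched in R, assuming the formula n = ((Zα/2 * σ)/E)^2 with the result rounded up. The function name sample_size_mean is made up for this illustration.

```r
# Hypothetical helper: smallest n so that the maximum error is at most E
sample_size_mean <- function(sigma, E, conf) {
  z <- qnorm(1 - (1 - conf) / 2)   # Z_{alpha/2}
  ceiling((z * sigma / E)^2)       # always round UP to a whole observation
}

sample_size_mean(sigma = 0.75, E = 0.50, conf = 0.95)  # 9
```

ceiling() is used instead of round() because rounding down would give a sample too small to guarantee the stated maximum error.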

9.2 Confidence Intervals for Means

Different method of presenting information provided by sample mean with assessment of error

Large random samples from infinite populations → sampling distribution of the mean ≈ normal curve → z = (x̄ - µ)/(σ/sqrt(n))
Since 1 – α is the probability that a standard normal random variable will take a value between -Zα/2 and Zα/2, we can say that -Zα/2 < (x̄ - µ)/(σ/sqrt(n)) < Zα/2 with probability 1 – α. Solve the inequality for µ to get:

Large-sample Confidence Interval for µ → x̄ – Zα/2 * σ/sqrt(n) < µ < x̄ + Zα/2 * σ/sqrt(n)
·      Can assert with (1 – α)100% confidence that the interval above contains the population mean we are trying to estimate (assuming it is based on a large random sample, of course)
·      Confidence limits: End points of a confidence interval.
·      Degree of confidence: Probability of 1 – α → Common values are 0.95 (z = 1.96) and 0.99 (z = 2.575) → 95% and 99% confidence intervals for µ.

Example 9.5 Previous pulse rate data: n = 32, x̄ = 26.2, s = 5.15 substituted for σ, and Z0.025 = 1.96 → the 95% confidence interval is
26.2 – 1.96 * 5.15/sqrt(32) < µ < 26.2 + 1.96 * 5.15/sqrt(32) → 24.41562 < µ < 27.98438 →
24.4 < µ < 28.0
In the long run, about 95% of intervals constructed this way will contain µ.
* Note that the surer we want to be (increasing degree of certainty), the less we have to be sure of (the interval gets wider).
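The large-sample interval x̄ ± Zα/2 * s/sqrt(n) can be sketched in R with the Example 9.5 numbers. The function name ci_mean_large is an assumption for this illustration, and s is substituted for σ as in the example.

```r
# Hypothetical helper: large-sample confidence interval for the mean
ci_mean_large <- function(xbar, s, n, conf) {
  z <- qnorm(1 - (1 - conf) / 2)   # Z_{alpha/2}
  E <- z * s / sqrt(n)             # maximum error
  c(lower = xbar - E, upper = xbar + E)
}

ci95 <- ci_mean_large(xbar = 26.2, s = 5.15, n = 32, conf = 0.95)
ci99 <- ci_mean_large(xbar = 26.2, s = 5.15, n = 32, conf = 0.99)
round(ci95, 1)  # about 24.4 to 28.0
```

Comparing ci95 with ci99 shows the trade-off in the note above: the 99% interval is wider than the 95% interval.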

Interval estimate: When confidence interval is applied to estimate the mean of a population.

9.3 Confidence Intervals for Means (Small Samples)

The previous examples assume that when the sample is large enough (n ≥ 30) we can treat the sampling distribution of the mean as if it were a normal distribution and, when necessary, replace σ with s.

For small samples, it is necessary to assume that the population being sampled has roughly the shape of a normal distribution.
t statistic: t = (x̄ - µ)/(s/sqrt(n)) → its sampling distribution is a continuous distribution called the t distribution → similar to the standard normal distribution: symmetrical and bell shaped with mean 0.

The exact shape of the t distribution depends on a parameter called the degrees of freedom (aka # of degrees of freedom, df), given by the sample size less 1 (n – 1).

When we use t distribution for calculating maximum error:
E = tdf,α/2 * (s/√n)
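Where E = potential (maximum) error, df = degrees of freedom, s = sd of sample, n = sample size

tα/2 is defined like Zα/2: the area under the t curve to its right is α/2, so the area between -tα/2 and tα/2 is 1 – α. Unlike Zα/2, tα/2 depends on the degrees of freedom n – 1.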

Since the t distribution is symmetrical about t = 0, we can say with probability 1 – α that
-tα/2 < t < tα/2 → plug in t = (x̄ - µ)/(s/sqrt(n)) → -tα/2 < (x̄ - µ)/(s/sqrt(n)) < tα/2

Solve the inequality for µ to get the Small-Sample Confidence Interval for µ:
x̄ – tα/2 * s/sqrt(n) < µ < x̄ + tα/2 * s/sqrt(n)

Degree of confidence is 1 – α. Notice how the formula is similar to the large-sample formula (with s substituted for σ), except that tα/2 from the t distribution replaces Zα/2.

Example 9.6 To test the durability of a new paint, a highway department applied it at 8 different locations. Electronic counters showed that the paint deteriorated after being crossed by 14.26, 16.78, 13.65, 10.83, 12.64, 13.37, 16.20, and 14.94 million cars. What is the 95% confidence interval for the average amount of traffic this paint can withstand before deteriorating?
n = 8, x̄ = 14.08, s = 1.92, degrees of freedom = 8 – 1 = 7, t0.025 = 2.365
Plug into formula: 14.08 – 2.365 * (1.92/sqrt(8)) < µ < 14.08 + 2.365 * (1.92/sqrt(8))
→ 12.47 < µ < 15.69
Above is the 95% confidence interval estimate of the average amount of traffic that the paint can withstand before deteriorating.
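The same interval can be computed in R from the raw data, using qt() for the t value in place of a table. This is a sketch with variable names chosen here. Because R carries x̄, s and t at full precision, the lower limit comes out near 12.48 where the rounded hand calculation gives 12.47.

```r
# Traffic counts (millions of cars) from Example 9.6
traffic <- c(14.26, 16.78, 13.65, 10.83, 12.64, 13.37, 16.20, 14.94)

n <- length(traffic)
xbar <- mean(traffic)               # about 14.08
s <- sd(traffic)                    # sample standard deviation, about 1.92
t_crit <- qt(0.975, df = n - 1)     # t_{0.025} with 7 degrees of freedom, about 2.365
E <- t_crit * s / sqrt(n)           # maximum error

ci <- c(lower = xbar - E, upper = xbar + E)
ci
```

Base R's t.test(traffic)$conf.int should return the same two limits, which is a convenient cross-check.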

9.4 Confidence Interval for Standard Deviations

Working to estimate the population standard deviation σ from the sample standard deviation s with a large-sample confidence interval.
For large samples, n ≥ 30, the sampling distribution of s ≈ normal with mean σ and standard deviation σs = σ/sqrt(2n) (aka the standard error of s)
→ sampling distribution of z = (s - σ)/(σ/sqrt(2n)) ≈ standard normal distribution

Plug into the z inequality: -Zα/2 < (s - σ)/(σ/sqrt(2n)) < Zα/2 → solve for σ →
s/(1 + Zα/2/sqrt(2n)) < σ < s/(1 – Zα/2/sqrt(2n)) with degree of confidence 1 – α.
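A minimal R sketch of this interval, assuming it takes the form s/(1 + Zα/2/sqrt(2n)) < σ < s/(1 - Zα/2/sqrt(2n)), which is what solving the z inequality for σ gives. The function name ci_sd_large is made up, and the inputs reuse s = 5.15 and n = 32 from Example 9.5 purely as an illustration.

```r
# Hypothetical helper: large-sample confidence interval for sigma,
# based on s being approximately normal with standard error sigma / sqrt(2n)
ci_sd_large <- function(s, n, conf) {
  z <- qnorm(1 - (1 - conf) / 2)   # Z_{alpha/2}
  k <- z / sqrt(2 * n)
  c(lower = s / (1 + k), upper = s / (1 - k))
}

ci_sigma <- ci_sd_large(s = 5.15, n = 32, conf = 0.95)
ci_sigma
```

With these inputs the interval is roughly 4.1 < σ < 6.8. It is not symmetric about s, because σ appears in both the numerator and the denominator of z.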

9.5 The Estimation of Proportions

Information that is usually available for estimation of a TRUE proportion is
Sample proportion: x/n, where x = # of times that an event has occurred in n = number of trials

TRUE proportion estimated with sample proportion dilation OR magnification.

Assuming that situations satisfy the binomial distribution (independent trials w/ each trial having a probability of success, p) → µ = np, σ = sqrt(np(1-p)) → z = (x – np)/sqrt(np(1-p)) ≈ standard normal distribution

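Plug into the inequality to get: -Zα/2 < (x – np)/sqrt(np(1-p)) < Zα/2
Standard error of a proportion = sqrt(p(1-p)/n) = standard deviation of the sampling distribution of the sample proportion.

Solve the inequality for p, substituting x/n for p inside the square root, to get the large-sample confidence interval formula for p:
x/n – Zα/2 * sqrt((x/n)(1 – x/n)/n) < p < x/n + Zα/2 * sqrt((x/n)(1 – x/n)/n)
As usual, the degree of confidence is (1-α)100%.

For Approximate Maximum Error of Estimate in using x/n to Estimate p:
E = Zα/2 * sqrt((x/n)(1 – x/n)/n) → can assert with (1-α)100% confidence that the error is at most E.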
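A short R sketch of the large-sample interval and maximum error for a proportion, assuming the estimated standard error sqrt((x/n)(1 - x/n)/n). The function name ci_prop_large is made up, and the counts x = 136 and n = 400 are hypothetical numbers chosen only to exercise the formula. They do not come from the notes.

```r
# Hypothetical helper: large-sample confidence interval for a proportion
ci_prop_large <- function(x, n, conf) {
  p_hat <- x / n                            # sample proportion
  z <- qnorm(1 - (1 - conf) / 2)            # Z_{alpha/2}
  E <- z * sqrt(p_hat * (1 - p_hat) / n)    # approximate maximum error
  c(lower = p_hat - E, upper = p_hat + E, E = E)
}

# Hypothetical data: 136 "successes" in 400 trials
res <- ci_prop_large(x = 136, n = 400, conf = 0.95)
round(res, 3)
```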

For Sample Size Estimation of p (with some information about p): n = p(1-p) * (Zα/2/E)^2, using the possible value of p closest to 1/2
For Sample Size Estimation of p (w/o information about p): n = (1/4) * (Zα/2/E)^2

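Example 9.11 We want to estimate what proportion of the adult population of the US has hypertension (HTN) and want to be 99% sure that the error of the estimate will not exceed 0.02. How large a sample will be needed if
a)    We have NO idea of the TRUE proportion
Substitute into the formula → n = (1/4) * (2.575/0.02)^2 = 4144.14 → round up to 4145
b)   We know that the TRUE proportion lies on the interval from 0.05 to 0.20
Use p = 0.20 (the value in the interval closest to 1/2) → n = (0.20)(0.80) * (2.575/0.02)^2 = 2652.25 → round up to 2653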
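Both parts of Example 9.11 can be reproduced in R, assuming the formula n = p(1-p) * (Zα/2/E)^2 with p = 1/2 when nothing is known about p. The function name sample_size_prop is made up for this sketch. It takes z as an argument so that the table value 2.575 used in these notes can be passed in directly.

```r
# Hypothetical helper: sample size needed to estimate a proportion.
# p defaults to 0.5, the worst case, when nothing is known about p.
sample_size_prop <- function(E, z, p = 0.5) {
  ceiling(p * (1 - p) * (z / E)^2)   # always round up
}

n_a <- sample_size_prop(E = 0.02, z = 2.575)            # part (a): 4145
n_b <- sample_size_prop(E = 0.02, z = 2.575, p = 0.20)  # part (b): 2653

# With the exact value qnorm(0.995) instead of the table value,
# the answers come out slightly larger.
n_a_exact <- sample_size_prop(E = 0.02, z = qnorm(0.995))
n_b_exact <- sample_size_prop(E = 0.02, z = qnorm(0.995), p = 0.20)
```

Knowing that p is at most 0.20 cuts the required sample by more than a third, because p(1-p) is largest at p = 1/2.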
