THE FOLLOWING NOTES USE EXTENSIVE MS Word Math Equations that will NOT transfer over to HTML.
Chapter 9 Problems of Estimation
Statistical inference: making generalizations based on samples. There are two types:
· Problems of estimation: Assign a numerical value to a population parameter, hoping that the sample estimate will be close, OR assign an interval of values based on a sample, hoping that it will contain the population parameter in question.
· Tests of hypotheses: Accept or reject assumptions about the parameters, shape, or form of a population, hoping that the decision will not be in error.
9.1 Estimation of Means
The usual task is to work from the sample mean x̄ to the population average µ.
Point estimate: Use of sample data to calculate a single specific value for an unknown population parameter.
Problem: A point estimate carries no information about its basis and makes no mention of the error.
Fluctuations of the sample mean around the true mean of the population depend on two factors, as discussed in Chapter 8:
1. Size of the sample
2. Size of the population standard deviation σ
Scientific reports often present the sample mean x̄ together with the sample size n and the sample standard deviation s.
Example 7.5 Let z_α denote the value of z for which the area under the standard normal curve to its right is equal to α (similar to log terminology). What are z_{0.01} and z_{0.05}?
z_{0.01}: 1 - pnorm(z) = 0.01 → pnorm(z) = 1 - 0.01 = 0.99 → z = qnorm(0.99) = 2.326348 ≈ 2.33
z_{0.05}: 1 - pnorm(z) = 0.05 → pnorm(z) = 1 - 0.05 = 0.95 → z = qnorm(0.95) = 1.644854 ≈ 1.645
Alternative: z_β = value of z for which the area under the curve to its left is β.
z_{0.99}: pnorm(z) = 0.99 → z = qnorm(0.99) = 2.326348
z_{0.95}: pnorm(z) = 0.95 → z = qnorm(0.95) = 1.644854
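The two routes above (right-tail area α versus left-tail area 1 - α) can be checked side by side in R. This is only a sketch: z_upper is a hypothetical helper name introduced here, not something from the notes; qnorm() and its lower.tail argument are standard R.

```r
# Upper-tail critical value z_alpha: the area to the RIGHT of z equals alpha.
# z_upper is a hypothetical helper name used only for this illustration.
z_upper <- function(alpha) qnorm(alpha, lower.tail = FALSE)

z01 <- z_upper(0.01)   # right-tail route
z05 <- z_upper(0.05)

# Left-tail route used in the notes: the area to the LEFT of z equals 1 - alpha.
z01_left <- qnorm(0.99)
z05_left <- qnorm(0.95)

print(round(c(z01, z01_left, z05, z05_left), 6))
```

Both routes should return the same values, roughly 2.33 and 1.645.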
Example 7.17 Based on the above, z_{α/2} denotes the value of z for which the area under the standard normal curve to its right is equal to α/2 → the area between -z_{α/2} and z_{α/2} is equal to 1 - α.
(a) z_{0.025} = 1.96 with α = 0.05 → the area between z = ±1.959964 is 1 - 0.05 = 0.95
pnorm(1.959964) - pnorm(-1.959964) = 0.95
Making use of µ_x̄ = µ and σ_x̄ = σ/√n, and assuming n ≥ 30 with the population infinite OR finite and large:
When we use x̄ as an estimate of µ, the probability is 1 - α that this estimate will be "off" either way by at most
E = z_{α/2} · σ/√n
where E (or ME) = the maximum error (potential error) of the estimate.
Two values commonly used for 1 - α are 0.95 and 0.99:
z_{0.025} = 1.96 → α = 0.025*2 = 0.05 → 1 - α = 0.95
z_{0.005} = 2.575 → α = 0.005*2 = 0.01 → 1 - α = 0.99
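A short R sketch of the link between the degree of confidence 1 - α and the two-sided critical value z_{α/2}. The helper names z_two_sided and central_area are made up for this illustration.

```r
# Two-sided critical value z_{alpha/2} for a given degree of confidence 1 - alpha.
# z_two_sided and central_area are hypothetical helper names.
z_two_sided <- function(conf_level) {
  alpha <- 1 - conf_level
  qnorm(1 - alpha / 2)   # area to the left of +z is 1 - alpha/2
}

# Area under the standard normal curve between -z and +z.
central_area <- function(z) pnorm(z) - pnorm(-z)

z95 <- z_two_sided(0.95)
z99 <- z_two_sided(0.99)

print(c(z95 = z95, z99 = z99))
print(c(area95 = central_area(z95), area99 = central_area(z99)))
```

R returns the unrounded quantiles (1.959964 and 2.575829); the notes use the table values 1.96 and 2.575.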
Example 9.1
Supervisor of garment manufacturing firm intends to use mean time required to
sew a sample of 150 women’s coats to estimate the average time required to sew
all such women’s coats. If she can assume that σ = 6.2 minutes, what can she
assume with probability 0.99 about the maximum error of her estimate?
n = 150, σ = 6.2, 1 - α = 0.99 → α = 0.01 → α/2 = 0.005 → z_{0.005} = 2.575. Substitute into the formula for maximum error:
E = 2.575*(6.2/sqrt(150)) = 1.30
1) Work back from 1 - α = probability (in this case 0.99) → α = 1 - probability = 0.01
2) Divide α by 2 to determine the area under the curve in each tail → 0.01/2 = 0.005
3) In R: use qnorm() to work backwards to find z → qnorm(0.005) = -2.575829
* Remember that R calculates the area under the curve from the left (-∞), so to find the positive cutoff use the area to its left, 0.99 + 0.005 = 0.995 → qnorm(0.995) = 2.575829 (table value 2.575)
4) Plug into E = z_{α/2} · σ/√n → E = (2.575)*(6.2/sqrt(150)) = 1.303537
The supervisor can assert with probability 0.99 that the error of her estimate will be at most 1.30 minutes.
⇒ In statistical jargon: once the data have been collected, the supervisor can be 99% confident that the error of her estimate is at most 1.30 minutes.
In general, we make probability statements about future values of random variables and confidence statements once the data have been obtained.
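The maximum-error calculation E = z_{α/2} · σ/√n can be wrapped in a small R function. This is a sketch only: max_error is a hypothetical name, and the z argument is added here so that the rounded table value 2.575 from the notes can be passed in place of R's unrounded quantile.

```r
# Maximum error E = z * sigma / sqrt(n) for a large-sample estimate of the mean.
# max_error is a hypothetical helper; by default z is the exact normal quantile.
max_error <- function(sigma, n, conf_level,
                      z = qnorm(1 - (1 - conf_level) / 2)) {
  z * sigma / sqrt(n)
}

# Example 9.1 with the table value z = 2.575 used in the notes
e_table <- max_error(sigma = 6.2, n = 150, conf_level = 0.99, z = 2.575)

# Same example with R's unrounded quantile qnorm(0.995)
e_exact <- max_error(sigma = 6.2, n = 150, conf_level = 0.99)

print(round(c(e_table, e_exact), 4))
```

Either way the result rounds to 1.30 minutes, so the rounding of z does not change the conclusion here.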
Example 9.2
Quality control specialist wants to use mean of a random sample of size n = 35 to
determine the average fat content of large shipment of hamburgers. Value of σ is
not known, but specialist maintains that σ could NOT be larger than 0.25
ounces. What can he assert with 0.95 probability about the maximum size of error?
n = 35, σ ≤ 0.25, 1 - α = 0.95 → α = 0.05 → α/2 = 0.025 → z_{0.025} = 1.96
E = z_{α/2} · σ/√n → E ≤ (1.96)*(0.25/sqrt(35)) = 0.083
Since σ may be smaller than 0.25, he can assert with probability of at least 0.95 that the error is at most 0.083 ounce.
*** Both previous examples rely on knowing the standard deviation of the population (σ). In reality σ is often not provided → must use the sample standard deviation s in its place (reasonable provided that the sample is sufficiently large, n ≥ 30).
To determine the sample size needed to estimate the mean of a population and assert with probability 1 - α that the error is at most E, solve E = z_{α/2} · σ/√n for n:
Sample size for estimating µ (n ≥ 30) → n = (z_{α/2} · σ / E)², rounded up to the next whole number
Example 9.4 The superintendent of an irrigation service, which uses river water to irrigate local farms, wants to use the mean of a random sample to estimate the number of acre-feet of water pumped each day and wants the estimate to be in error by at most 0.5 acre-foot per day. From previous studies of a similar kind, the superintendent knows that σ = 0.75 acre-foot. To be 95% confident about the maximum error, how large a sample does the superintendent have to take?
1 - α = 0.95 → α = 0.05 → α/2 = 0.025 → z_{0.025} = 1.96, σ = 0.75, E = 0.50
n = (z_{α/2} · σ / E)² = (1.96 * 0.75 / 0.50)² = 8.64 → round up to n = 9
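The sample-size calculation of Example 9.4 as an R sketch, assuming the formula n = (z_{α/2} · σ / E)² with the result rounded up. The function name sample_size_mean is hypothetical.

```r
# Sample size needed so that the error of the mean is at most E
# with the given degree of confidence (sigma assumed known).
# sample_size_mean is a hypothetical helper name.
sample_size_mean <- function(sigma, E, conf_level,
                             z = qnorm(1 - (1 - conf_level) / 2)) {
  ceiling((z * sigma / E)^2)   # always round UP to a whole observation
}

# Example 9.4 with the table value z = 1.96
n_needed <- sample_size_mean(sigma = 0.75, E = 0.50, conf_level = 0.95, z = 1.96)
print(n_needed)
```

ceiling() is used rather than round() because rounding down would leave the error slightly above E.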
9.2 Confidence Intervals for Means
A different method of presenting the information provided by a sample mean together with an assessment of its error.
For large random samples from infinite populations, the sampling distribution of the mean is ≈ a normal curve → z = (x̄ - µ)/(σ/√n) is ≈ standard normal.
Since 1 - α is the probability that a standard normal random variable takes a value between -z_{α/2} and z_{α/2}, we can say with probability 1 - α that
-z_{α/2} < (x̄ - µ)/(σ/√n) < z_{α/2}
Rearrange the inequality to isolate µ:
Large-sample Confidence Interval for µ → x̄ - z_{α/2} · σ/√n < µ < x̄ + z_{α/2} · σ/√n
· Can assert with probability 1 - α, i.e. with (1 - α)100% confidence, that the interval above contains the population mean we are trying to estimate (assuming it is based on a large random sample, of course)
· Confidence limits: End points of a confidence interval.
· Degree of confidence: The probability 1 - α. → Common values are 0.95 (z = 1.96) and 0.99 (z = 2.575) → 95% and 99% confidence intervals for µ.
Example 9.5 Previous pulse rate data: n = 32, x̄ = 26.2, s = 5.15 used for σ, and z_{0.025} = 1.96 → the 95% confidence interval is
26.2 - 1.96 * 5.15/sqrt(32) < µ < 26.2 + 1.96 * 5.15/sqrt(32) → 24.41562 < µ < 27.98438 → 24.4 < µ < 28.0
We can be 95% confident that this interval contains µ: in the long run, 95% of intervals constructed this way contain µ.
* Note that the surer we want to be (increasing degree of confidence), the less we have to be sure of: a higher degree of confidence gives a wider interval.
Interval estimate: An estimate of a population parameter (here the mean) given in the form of a confidence interval.
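The interval of Example 9.5 can be reproduced in R, and the same sketch shows how the interval widens when the degree of confidence goes from 0.95 to 0.99. The function name ci_mean_large is hypothetical; it uses R's unrounded qnorm() value, so the last digits differ slightly from the hand calculation with 1.96.

```r
# Large-sample confidence interval for the mean: xbar +/- z * s / sqrt(n),
# with s standing in for sigma (reasonable for n >= 30).
# ci_mean_large is a hypothetical helper name.
ci_mean_large <- function(xbar, s, n, conf_level = 0.95) {
  z <- qnorm(1 - (1 - conf_level) / 2)
  margin <- z * s / sqrt(n)
  c(lower = xbar - margin, upper = xbar + margin)
}

ci95 <- ci_mean_large(xbar = 26.2, s = 5.15, n = 32)
ci99 <- ci_mean_large(xbar = 26.2, s = 5.15, n = 32, conf_level = 0.99)

print(round(ci95, 2))
print(round(ci99, 2))
```

The 99% interval contains the 95% interval: more confidence is bought with a less precise statement about µ.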
9.3 Confidence Intervals for Means (Small Samples)
The previous examples assumed that when the sample is large enough (n ≥ 30) we can treat the sampling distribution of the mean as if it were a normal distribution and, when necessary, replace σ with s.
For small samples, it is necessary to assume that the population being sampled has roughly the shape of a normal distribution.
t statistic: t = (x̄ - µ)/(s/sqrt(n)) → its sampling distribution is a continuous distribution called the t distribution → similar to the standard normal distribution: symmetrical and bell shaped with mean 0.
The exact shape of the t distribution depends on a parameter called the degrees of freedom (aka number of degrees of freedom, df), given by the sample size less 1 (n - 1).
When we use the t distribution for calculating the maximum error:
E = t_{α/2} · (s/√n), with t_{α/2} taken from the t distribution with df = n - 1
where E = potential (maximum) error, df = degrees of freedom, s = sd of sample, n = sample size
t_{α/2} is defined like z_{α/2}: the area under the t curve to its right is α/2, so the area between -t_{α/2} and t_{α/2} is 1 - α, except that t_{α/2} depends on n - 1.
Since the t distribution is symmetrical about t = 0, with probability 1 - α we have
-t_{α/2} < t < t_{α/2}
Plug in t = (x̄ - µ)/(s/sqrt(n)) and rearrange to isolate µ to get the Small-Sample Confidence Interval for µ:
x̄ - t_{α/2} · s/√n < µ < x̄ + t_{α/2} · s/√n
The degree of confidence is 1 - α. Notice that the formula is the same as the large-sample formula (with s substituted for σ) except that t_{α/2} replaces z_{α/2}.
Example 9.6 To test the durability of a new paint, a highway department applied it at 8 different locations. Electronic counters showed that the paint deteriorated after being crossed by 14.26, 16.78, 13.65, 10.83, 12.64, 13.37, 16.20, and 14.94 million cars. What is the 95% confidence interval for the average amount of traffic this paint can withstand before deteriorating?
n = 8, x̄ = 14.08, s = 1.92, degrees of freedom = 8 - 1 = 7, t_{0.025} = 2.365
Plug into formula: 14.08 - 2.365 * (1.92/sqrt(8)) < µ < 14.08 + 2.365 * (1.92/sqrt(8)) → 12.47 < µ < 15.69
The above is the 95% confidence interval estimate of the average amount of traffic (in millions of cars) that the paint can withstand before deteriorating.
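Example 9.6 can be worked from the raw data in R. The sketch below computes x̄, s, and t_{0.025} with the base functions mean(), sd(), and qt(), then cross-checks the result against the built-in t.test(). The variable names are chosen here for illustration.

```r
# Traffic (millions of cars) before the paint deteriorated, from Example 9.6
traffic <- c(14.26, 16.78, 13.65, 10.83, 12.64, 13.37, 16.20, 14.94)

n      <- length(traffic)
xbar   <- mean(traffic)
s      <- sd(traffic)               # sample standard deviation (divides by n - 1)
t_crit <- qt(0.975, df = n - 1)     # t_{0.025} with 7 degrees of freedom

margin <- t_crit * s / sqrt(n)
ci <- c(lower = xbar - margin, upper = xbar + margin)
print(round(ci, 2))

# Cross-check: R's one-sample t interval should agree with the manual one
ci_builtin <- as.numeric(t.test(traffic, conf.level = 0.95)$conf.int)
print(round(ci_builtin, 2))
```

Small differences from the hand calculation (12.47 to 15.69) come from rounding x̄, s, and t before plugging them in.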
9.4 Confidence Interval for Standard Deviations
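Estimating the population standard deviation σ from the sample standard deviation s with a large-sample confidence interval.
For large samples, n ≥ 30, the sampling distribution of s is ≈ normal with mean σ and standard deviation σ_s = σ/sqrt(2n) (aka the standard error of s)
→ the sampling distribution of z = (s - σ)/(σ/sqrt(2n)) is ≈ the standard normal distribution
Plug into the z inequality: -z_{α/2} < (s - σ)/(σ/sqrt(2n)) < z_{α/2} → solving for σ gives the large-sample confidence interval for σ:
s/(1 + z_{α/2}/sqrt(2n)) < σ < s/(1 - z_{α/2}/sqrt(2n))
with degree of confidence 1 - α.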
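A sketch of the large-sample interval for σ in R, assuming the interval takes the form s/(1 + z_{α/2}/√(2n)) < σ < s/(1 - z_{α/2}/√(2n)), which is what solving the z inequality for σ gives. The function name ci_sd_large is hypothetical, and reusing the pulse-rate values n = 32, s = 5.15 from Example 9.5 is a choice made here for illustration, not a worked example from the notes.

```r
# Large-sample confidence interval for sigma, based on the approximation
# that s is roughly normal with mean sigma and standard error sigma / sqrt(2n).
# ci_sd_large is a hypothetical helper name.
ci_sd_large <- function(s, n, conf_level = 0.95) {
  z <- qnorm(1 - (1 - conf_level) / 2)
  k <- z / sqrt(2 * n)
  c(lower = s / (1 + k), upper = s / (1 - k))
}

# Illustration with the pulse-rate sample (n = 32, s = 5.15)
ci_sigma <- ci_sd_large(s = 5.15, n = 32)
print(round(ci_sigma, 2))
```

Note that the interval is not symmetric about s: the upper limit lies farther from s than the lower limit.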
9.5 The Estimation of Proportions
The information that is usually available for the estimation of a TRUE proportion p is the
Sample proportion: x/n, where x = number of times that an event has occurred in n = number of trials
The TRUE proportion p is estimated with the sample proportion x/n.
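Assuming that the situation satisfies the conditions of the binomial distribution (independent trials, each with probability of success p) → µ = np, σ = sqrt(np(1-p)) → for large n, z = (x - np)/sqrt(np(1-p)) ≈ standard normal distribution
Plug into the inequality -z_{α/2} < z < z_{α/2} and divide through by n to get:
x/n - z_{α/2} · sqrt(p(1-p)/n) < p < x/n + z_{α/2} · sqrt(p(1-p)/n)
Standard error of a proportion = sqrt(p(1-p)/n) = standard deviation of the sampling distribution of the sample proportion.
Replace p with x/n inside the square root to get the large-sample confidence interval formula for p; as usual the degree of confidence is (1-α)100%:
x/n - z_{α/2} · sqrt((x/n)(1 - x/n)/n) < p < x/n + z_{α/2} · sqrt((x/n)(1 - x/n)/n)
For the Approximate Maximum Error of Estimate in using x/n to estimate p:
E = z_{α/2} · sqrt(p(1-p)/n), approximated by substituting x/n for p
→ can assert with (1-α)100% confidence that the error is at most E.
For Sample Size Estimation of p (with some information about p):
n = p(1-p) · (z_{α/2}/E)², using the possible value of p closest to 0.5
For Sample Size Estimation of p (w/o information about p):
n = (1/4) · (z_{α/2}/E)²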
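Example 9.11 We want to estimate what proportion of the adult population of the US has hypertension (HTN) and want to be 99% sure that the error of the estimate will not exceed 0.02. How large a sample will be needed if
a) We have NO idea of the TRUE proportion
Substitute into formula → n = (1/4) · (2.575/0.02)² = 4144.14 → round up to 4145
b) We know that the TRUE proportion lies on the interval from 0.05 to 0.20
Use p = 0.20 (the value in the interval closest to 0.5) → n = (0.20)(0.80) · (2.575/0.02)² = 2652.25 → round up to 2653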
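Both parts of Example 9.11 in R, as a sketch assuming the sample-size formula n = p(1-p) · (z_{α/2}/E)², with p = 0.5 as the worst case when nothing is known about p. The function name sample_size_prop is hypothetical, and the table value z = 2.575 is passed in to match the hand calculation (R's unrounded qnorm(0.995) would give slightly larger sample sizes).

```r
# Sample size for estimating a proportion with error at most E.
# p = 0.5 maximizes p * (1 - p), so it is the safe default with no prior information.
# sample_size_prop is a hypothetical helper name.
sample_size_prop <- function(E, z, p = 0.5) {
  ceiling(p * (1 - p) * (z / E)^2)   # round UP to a whole observation
}

# (a) no idea of the true proportion
n_a <- sample_size_prop(E = 0.02, z = 2.575)

# (b) true proportion known to lie between 0.05 and 0.20: use the value closest to 0.5
n_b <- sample_size_prop(E = 0.02, z = 2.575, p = 0.20)

print(c(a = n_a, b = n_b))
```

Knowing that p is at most 0.20 cuts the required sample by more than a third.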