THE FOLLOWING NOTES USE EXTENSIVE MS Word Math Equations that will NOT transfer over to HTML.
Chapter 8 Sampling and Sampling Distributions
The main objective of most statistical studies is to make generalizations, based on samples,
about the parameters of populations. The required sample size and quality vary depending
on what we are trying to find.
Most methods in this book use random sampling – a method of sampling for which every possible
sample has the same probability of being selected.
Reason: it permits valid, or logical, generalizations and hence is widely used in practice.
8.1 Random Sampling
Two types of populations:
· Finite populations – consist of a finite (fixed) number of elements/observations.
· Infinite populations – there is NO limit to the number of items that can be observed.
The following definition of random sampling applies ONLY to finite populations.
A sample of size n from a finite population of size N is random if it is chosen in such a way that each of the $\binom{N}{n}$ possible samples has the same probability, $1/\binom{N}{n}$, of being selected.
In short, a sample is random if every possible sample is chosen so as to have an equal probability of occurring.
The simplest way of taking random samples from a finite population is to refer to a table of random numbers → nowadays, computer programs are used to generate random numbers instead.
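In R, `sample()` plays the role of the random number table. A minimal sketch, assuming a hypothetical finite population of N = 100 items labelled 1 to 100 and a sample of n = 5: drawing without replacement (the default) makes every one of the choose(N, n) possible samples equally likely, and `choose()` counts them. The seed only makes the run repeatable.

```r
# Hypothetical finite population: items labelled 1 to N
N <- 100
n <- 5
population <- 1:N

# Number of possible samples of size n (NCn) and the probability of each one
num_samples <- choose(N, n)
prob_each <- 1 / num_samples

# Draw one random sample without replacement (every subset of size n equally likely)
set.seed(1)
my_sample <- sample(population, n)
print(my_sample)
print(num_samples)
print(prob_each)
```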
A sample from an infinite population is random ONLY if it consists of values of independent random variables (variables that do not affect each other) having the same distribution.
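Example: A random sample of 5 individuals from the infinite population of weight losses of persons on a two-week diet, with µ = 7.4 lbs and σ = 1.3 lbs, consists of the values of 5 independent random variables, each having that same distribution (mean 7.4 lbs, standard deviation 1.3 lbs).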
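To see what "independent random variables having the same distribution" means in practice, here is a small simulation sketch. It assumes, purely for illustration, that the weight losses are normally distributed with µ = 7.4 lbs and σ = 1.3 lbs (the notes do not state a distribution). Repeating the 5-person sample many times shows that each of the 5 positions has the same mean and standard deviation and that the positions are uncorrelated.

```r
# Assumption for illustration only: weight losses are normally distributed
mu <- 7.4     # population mean weight loss (lbs)
sigma <- 1.3  # population standard deviation (lbs)

# One random sample of 5 individuals: 5 independent draws from the same distribution
set.seed(2)
x <- rnorm(5, mean = mu, sd = sigma)
print(round(x, 2))

# Repeat the 5-person sample 20000 times: each column holds one position in the sample
reps <- matrix(rnorm(5 * 20000, mean = mu, sd = sigma), ncol = 5)
print(round(colMeans(reps), 2))        # each close to 7.4
print(round(apply(reps, 2, sd), 2))    # each close to 1.3
print(round(cor(reps)[1, 2], 3))       # close to 0: positions 1 and 2 are independent
```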
8.2 Sampling Distribution
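Sampling distribution – the probability distribution of a statistic (such as the sample mean, median, or standard deviation), describing how it varies from sample to sample because of which values happen to be taken from the population.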
Example p.270: Finite population of N = 5, the numbers 1, 3, 5, 7, and 9. The population has mean µ = 5 and standard deviation σ = (8)½. A sample of size n = 2 is taken from the population.
$\binom{5}{2} = 10$ possible samples →
(1,3) (1,5) (1,7) (1,9) (3,5) (3,7) (3,9) (5,7) (5,9) (7,9)
Possible sample means and their probabilities: x̄ = 2 (1/10), 3 (1/10), 4 (2/10), 5 (2/10), 6 (2/10), 7 (1/10), and 8 (1/10).
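Next, study the probability distribution of the sample mean itself:
$\mu_{\bar{x}} = 2(0.1) + 3(0.1) + 4(0.2) + 5(0.2) + 6(0.2) + 7(0.1) + 8(0.1) = 5$
$\sigma_{\bar{x}}^2 = (2-5)^2(0.1) + (3-5)^2(0.1) + (4-5)^2(0.2) + (5-5)^2(0.2) + (6-5)^2(0.2) + (7-5)^2(0.1) + (8-5)^2(0.1) = 3$
$\sigma_{\bar{x}} = \sqrt{3}$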
Note from this example: the mean of the sampling distribution equals µ, the mean of the population (µx̄ = µ), AND the standard deviation of the sampling distribution is smaller than the standard deviation of the population (σx̄ < σ; here √3 < √8).
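The p.270 enumeration can be reproduced in R as a check. This sketch lists all 10 samples with `combn()`, averages each one, and computes the mean and variance of the sampling distribution with divisor 10, matching the hand calculation (R's own `var()` would divide by 9 instead).

```r
# Population from the example: N = 5
pop <- c(1, 3, 5, 7, 9)
mu <- mean(pop)                         # population mean, 5
sigma2 <- mean((pop - mu)^2)            # population variance with divisor N, 8

# All choose(5, 2) = 10 samples of size 2, one per column
combos <- combn(pop, 2)
xbar <- colMeans(combos)

# Each sample is equally likely, so each mean's probability is its count / 10
print(table(xbar) / length(xbar))

mean_xbar <- mean(xbar)                 # mean of the sampling distribution
var_xbar <- mean((xbar - mean_xbar)^2)  # variance with divisor 10, as in the hand calculation
print(c(mu = mu, mean_xbar = mean_xbar, var_xbar = var_xbar, sd_xbar = sqrt(var_xbar)))
```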
For larger populations, listing every possible sample by hand is impractical, so the sampling distribution is approximated by computer simulation.
R example: a population of 1000 numbers ranging from 0 to 9, sample size n = 10.
First, generate a random list of numbers in R and store the values in a variable named A:
Create a population whose parameters are not set in advance → A = sample(0:9, 1000, replace = TRUE)
→ In one run: mean(A) = 4.313, var(A) = 8.081112, sd(A) = 2.84273 (these values change with every run)
*** If the population parameters are given, adjust accordingly.
Take a sample of 10 from this group of 1000 numbers, with replacement, and store the values in a variable (storing them keeps that one particular sample; otherwise the numbers will be different each time the command is run) → SAMPLE1 = sample(A, 10, replace = TRUE) → generates one possible sample of n = 10 from the population of 1000 numbers from 0 to 9.
When sampling without replacement:
1. NumberCombo = combn(A, 10) # every possible sample of 10 (one per column), assigned to a variable
2. Means = colMeans(NumberCombo) # mean of each 10-number combination, assigned to a variable
3. table(Means) # displays a frequency table of the means
4. barplot(table(Means)) # makes a bar plot of that table
*** This is how it works in theory; however, for a population of 1000 numbers the number of possible combinations, $\binom{1000}{10} \approx 2.6 \times 10^{23}$, is far too many for R to handle.
→ ALTERNATIVE: use simulated numbers:
1. B = replicate(10000, mean(sample(A, 10))) # means of 10000 simulated samples of size 10; limits the number of calculations
2. table(B) # table of the simulated means
3. hist(B) # histogram of the simulated means
OR another path:
A shortcut to get a long list of possible means → draw one large hypothetical sample from the original population → split it into segments of 10 (sample size n = 10) → take the mean of each 10-number segment to approximate the sampling distribution of the mean → finally, take the mean of these n = 10 sample means.
1) draws = sample(A, size = 4*500, replace = TRUE) # 2000 draws from the population
2) draws = matrix(draws, 10) # 10 rows x 200 columns, each column one sample of 10 (draws is reused for simplicity)
3) drawsmean = apply(draws, 2, mean) # applies the mean function to each column (block of 10 numbers)
4) mean(drawsmean) # mean of the 200 sample means
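A combined, self-contained sketch of the two simulation paths above. The seed, the name `se_theory`, and the comparison with the standard error formula (including the finite population correction from Section 8.3) are illustrative additions, so the printed numbers will not match the 4.313 quoted earlier.

```r
set.seed(123)  # reproducible run; values differ from the ones quoted in the notes
A <- sample(0:9, 1000, replace = TRUE)   # population of 1000 numbers from 0 to 9
n <- 10

# Path 1: simulate 10000 samples of size 10 (without replacement) and keep each mean
B <- replicate(10000, mean(sample(A, n)))

# Path 2: one large draw with replacement, split into columns of 10, average each column
draws <- sample(A, size = 4 * 500, replace = TRUE)
draws <- matrix(draws, nrow = n)          # 10 rows x 200 columns
drawsmean <- apply(draws, 2, mean)        # 200 sample means

# Population standard deviation (divisor N) and the theoretical standard error
N <- length(A)
sigma <- sqrt(mean((A - mean(A))^2))
se_theory <- sigma / sqrt(n) * sqrt((N - n) / (N - 1))  # path 1 samples without replacement

print(c(pop_mean = mean(A), mean_B = mean(B), mean_drawsmean = mean(drawsmean)))
print(c(sd_B = sd(B), se_theory = se_theory))
```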
8.3 Standard Error of the Mean
The sampling distribution of the mean can be used to determine how close a sample mean is likely to be to the mean of the population. For random samples of size n taken from a population having mean µ and standard deviation σ, the sampling distribution of x̄ has:
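1. Mean of the sampling distribution of x̄ = mean of the population ($\mu_{\bar{x}} = \mu$).
2. Standard error of the mean (the sampling distribution's standard deviation)*:
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ (infinite population) OR $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$ (finite population of size N).
*Which formula applies depends on whether the population is infinite or finite in size N. The latter includes the finite population correction factor $\sqrt{\frac{N-n}{N-1}}$, which is usually omitted unless the sample constitutes at least 5% of the population (n ≥ 0.05N).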
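The two standard error formulas can be wrapped in one small function. `se_mean()` below is a hypothetical helper, not part of base R: it applies the finite population correction only when a finite N is supplied. Applied to the p.270 example (σ = √8, n = 2, N = 5) it returns √3, the same value found by enumerating all 10 samples.

```r
# Hypothetical helper: standard error of the mean, with the finite population
# correction applied only when a finite population size N is given
se_mean <- function(sigma, n, N = Inf) {
  se <- sigma / sqrt(n)
  if (is.finite(N)) {
    se <- se * sqrt((N - n) / (N - 1))
  }
  se
}

# Infinite population: sigma / sqrt(n)
print(se_mean(sigma = 20, n = 64))             # 2.5
# Finite population from the p.270 example: N = 5, sigma = sqrt(8), n = 2
print(se_mean(sigma = sqrt(8), n = 2, N = 5))  # sqrt(3), about 1.732
```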
The standard error of the mean is the standard deviation of the sampling distribution of the mean. It is used in statistics to measure how much the sample mean can be expected to fluctuate due to chance.
Based on the two formulas for the standard error of the mean shown above:
· variability of population ↑ (increases) → standard error of mean ↑ (increases)
· sample size ↑ (increases) → standard error of mean ↓ (decreases)
Example 8.3 When sampling from an infinite population, what happens to the standard error of the mean (and hence to the size of the error we are exposed to when the sample mean x̄ is used as an estimate of the population mean µ) when the sample size is increased from n = 50 to n = 200?
σx̄ = σ/(n)½, so compare the two standard errors: new standard error of mean = σ/√200, old standard error of mean = σ/√50 →
(σ/√200)/(σ/√50) = √50/√200 = √(50/200) = √(1/4) = ½ → increasing the sample size from 50 to 200 divides the standard error of the mean by 2 (it is halved).
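A quick numeric check of Example 8.3: the hypothetical `ratio()` function below evaluates the ratio of the two standard errors for a few arbitrary values of σ, showing that σ cancels and the ratio is always ½. In general, quadrupling n halves the standard error.

```r
# Ratio of standard errors when n grows from 50 to 200 (infinite population)
ratio <- function(sigma) (sigma / sqrt(200)) / (sigma / sqrt(50))

# sigma cancels, so each value is 1/2 (up to floating-point rounding)
print(sapply(c(1, 5.5, 289), ratio))
```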
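Example 8.4 Find the value of the finite population correction factor $\sqrt{\frac{N-n}{N-1}}$ for n = 100 and (a) N = 10,000 (b) N = 200.
(a) $\sqrt{\frac{10000-100}{10000-1}} = \sqrt{\frac{9900}{9999}} \approx 0.995$ → very close to one, so it can be omitted for practical purposes.
(b) $\sqrt{\frac{200-100}{200-1}} = \sqrt{\frac{100}{199}} \approx 0.709$ → far from one (the sample is half the population), so omitting it would make a large difference.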
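Example 8.4 in R, as a sketch; `fpc()` is a hypothetical helper name for the correction factor.

```r
# Finite population correction factor sqrt((N - n) / (N - 1))
fpc <- function(n, N) sqrt((N - n) / (N - 1))

print(round(fpc(100, 10000), 3))  # (a) 0.995
print(round(fpc(100, 200), 3))    # (b) 0.709
```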
Example 8.6 For n = 15, N = 1000, σ = 289, what value might be expected for the standard deviation of the 100 sample means?
Using the standard error of the mean for a finite population:
$\sigma_{\bar{x}} = \frac{289}{\sqrt{15}}\sqrt{\frac{1000-15}{1000-1}} \approx 74.09478$
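The underlying population of integers from 1 to 1000 (σ ≈ 289) was also simulated in R, taking 100 samples of size 15:
A = 1:1000 # population
B = replicate(100, mean(sample(A, 15))) # means of 100 samples of size 15, drawn without replacement
sd(B) # standard deviation of the 100 sample means
One run gave ~ 68.77874. The simulated 68.8 and the theoretical 74.1 are somewhat close; with only 100 sample means, the simulated value changes noticeably from run to run.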
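Because only 100 sample means were simulated above, the estimate of their standard deviation is noisy. This sketch (the seed and replicate count are arbitrary choices) draws 20,000 samples instead and compares the result with the finite-population standard error. Note that σ for the integers 1 to 1000 is about 288.7, which the example rounds to 289, so the theoretical value here comes out at about 74.0 rather than 74.09.

```r
set.seed(8)
A <- 1:1000
N <- length(A)
n <- 15

# Theoretical standard error with the finite population correction
sigma <- sqrt(mean((A - mean(A))^2))        # about 288.7, the "289" in the example
se_theory <- sigma / sqrt(n) * sqrt((N - n) / (N - 1))

# Many more simulated samples than 100 give a more stable estimate
B <- replicate(20000, mean(sample(A, n)))   # sample() draws without replacement by default
print(c(simulated = sd(B), theory = se_theory))
```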
8.4 Central Limit Theorem
When the sample mean is used to estimate the population mean, there is a possible error → our confidence in the estimate is expressed as a probability statement about the size of that error.
Applying Chebyshev’s theorem: when the sample mean is used to estimate the population mean, the probability is at least $1 - 1/k^2$ that the error will be less than $k\sigma_{\bar{x}}$.
Example 8.7 Based on Chebyshev’s theorem with k = 2, what can we assert about the maximum size of our error if we use the mean of a random sample of size n = 64 to estimate the mean of an infinite population with σ = 20?
Standard error of mean = $\sigma/\sqrt{n} = 20/\sqrt{64}$ = 20/8 = 2.5
We can assert with a probability of at least $1 - 1/2^2 = 0.75$ that the error will be less than 2*2.5 = 5.
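Example 8.7 in R, plus a simulation check of the Chebyshev bound. The population used for the check is an assumption: an exponential distribution with mean 20, chosen because its standard deviation is also 20 and it is clearly non-normal. The simulated proportion of errors below 5 should be at least the guaranteed 0.75; it comes out much higher, since Chebyshev's bound is conservative.

```r
k <- 2
sigma <- 20
n <- 64
se <- sigma / sqrt(n)       # 2.5
bound <- 1 - 1 / k^2        # Chebyshev: P(error < k * se) is at least 0.75
max_error <- k * se         # 5

# Check with an assumed, skewed population: exponential with mean 20 (so sd 20)
set.seed(9)
xbar <- replicate(10000, mean(rexp(n, rate = 1 / 20)))
coverage <- mean(abs(xbar - 20) < max_error)
print(c(bound = bound, max_error = max_error, simulated = coverage))
```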
Central Limit Theorem – IF x̄ is the mean of a random sample of size n from an infinite population with mean µ and standard deviation σ, and n is large, then
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
has approximately the standard normal distribution.
The Central Limit Theorem allows the normal curve method to be applied to a wide range of problems:
· Finite populations – when n is large, WHILE n/N is small.
· Infinite populations – when n is large (usually n ≥ 30).
However, IF the population being sampled has roughly the shape of the normal curve → the sampling distribution of the mean ≈ normal distribution regardless of the size of n.
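A small simulation sketch of the Central Limit Theorem, assuming a strongly skewed (exponential) population with µ = σ = 1 and samples of n = 40. The standardized means should behave approximately like a standard normal variable, so about 95% of them should fall within ±1.96.

```r
# Assumed skewed population: exponential with mean 1 and standard deviation 1
set.seed(10)
mu <- 1
sigma <- 1
n <- 40

# Standardize each simulated sample mean: z = (xbar - mu) / (sigma / sqrt(n))
z <- replicate(20000, (mean(rexp(n, rate = 1)) - mu) / (sigma / sqrt(n)))

# Compare with the standard normal: about 95% of z values should fall within +/- 1.96
print(c(simulated = mean(abs(z) < 1.96), normal = pnorm(1.96) - pnorm(-1.96)))
```

Calling `hist(z)` on the result shows the roughly bell-shaped histogram, even though the population itself is skewed.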
Example 8.9 Suppose that the daily sulfur oxide emissions of a certain industrial plant have standard deviation σ = 5.5 tons. What is the probability that the mean of a random sample of size n = 40 will differ from the mean of the population by less than 1.0 ton?
z = (x̄ - µ)/(σ/n½) → (x̄ - µ) = ± 1, σ = 5.5, n = 40 → z = -1/(5.5/√40) ≈ -1.15 and z = 1/(5.5/√40) ≈ 1.15.
Using R: the area between the two z values of ± 1.15 = pnorm(1.15) - pnorm(-1.15) = 0.7498561, so the probability is about 0.75.
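The same calculation without rounding z to 1.15 first, as a sketch; the variable names are illustrative. The answer is essentially unchanged, about 0.75.

```r
sigma <- 5.5
n <- 40
max_diff <- 1                          # allowed difference |xbar - mu| in tons
se <- sigma / sqrt(n)
z <- max_diff / se                     # about 1.15
p <- pnorm(z) - pnorm(-z)              # P(|xbar - mu| < 1), area between -z and z
print(c(z = z, p = p))
```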
Example 8.10 The time it takes students in a cooking school to learn how to prepare a particular meal is a random variable with mean µ = 3.2 hours and standard deviation σ = 1.8 hours. Find the probability that the average time it will take 36 students to learn how to prepare the meal is less than 3.4 hours.
The sample size n = 36 > 30, so the sampling distribution of the mean can be approximated with the normal curve.
z = (x̄ - µ)/(σ/n½) → µ = 3.2 hours, σ = 1.8 hours, n = 36, x̄ < 3.4 → z = (3.4 - 3.2)/(1.8/36½) = +⅔ → we want the area to the left of z = ⅔ (3.4 hours lies to the right of the population mean of 3.2) → Using R:
pnorm(2/3) = 0.7475075
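Example 8.10 as a short R sketch. `pnorm()` also accepts `mean` and `sd` arguments, so the standardization step can be skipped by passing µ and σ/√n directly; both forms give the same probability.

```r
mu <- 3.2
sigma <- 1.8
n <- 36
cutoff <- 3.4

z <- (cutoff - mu) / (sigma / sqrt(n))            # about 2/3
p <- pnorm(z)                                     # P(xbar < 3.4), area to the left of z
p2 <- pnorm(cutoff, mean = mu, sd = sigma / sqrt(n))  # same probability without standardizing
print(c(z = z, p = p, p2 = p2))
```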
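Other sampling distributions:
- Standard error of the median for infinite populations:
$\sigma_{\tilde{x}} = 1.25\,\frac{\sigma}{\sqrt{n}}$, where n = size of sample, σ = population standard deviation.
This formula assumes a large sample from a population having roughly the shape of the normal curve (1.25 ≈ √(π/2)); since it exceeds σ/√n, the mean is the more reliable estimate for such populations, and the median is mainly useful for samples from non-normal populations.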
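A simulation sketch comparing the standard errors of the median and the mean. It assumes a standard normal population (µ = 0, σ = 1) and an odd sample size n = 101, both arbitrary choices. The median's standard error should come out near 1.25 σ/√n and the mean's near σ/√n.

```r
# Assumed normal population with mu = 0 and sigma = 1; odd n so the median is one value
set.seed(13)
sigma <- 1
n <- 101
medians <- replicate(10000, median(rnorm(n, mean = 0, sd = sigma)))
means <- replicate(10000, mean(rnorm(n, mean = 0, sd = sigma)))

# Ratio of each simulated standard error to sigma / sqrt(n)
median_ratio <- sd(medians) / (sigma / sqrt(n))   # close to 1.25
mean_ratio <- sd(means) / (sigma / sqrt(n))       # close to 1
print(c(median_ratio = median_ratio, mean_ratio = mean_ratio))
```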
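Formula for conversion to standard units:
$z = \frac{x - \mu}{\sigma}$ (one observation) or $z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}$ (a mean)
The mean subtracted in the numerator may be µ or µx̄, and the standard deviation we divide by may be σ or σx̄, depending on whether we are dealing with one observation or with a mean.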