Wednesday, May 1, 2019

MAT 209 - Statistics Chapter 8 - Sampling and Sampling Distribution


THE FOLLOWING NOTES USE EXTENSIVE MS Word Math Equations that will NOT transfer over to HTML.

Chapter 8 Sampling and Sampling Distributions

The main objective of most statistical studies is to make generalizations, based on samples, about the parameters of populations. Sample size and quality vary depending on what we are trying to find.

Most methods in this book will use random sampling – a method of sampling for which every possible sample has the same probability of being selected.
Because: It permits valid, or logical, generalizations and hence is widely used in practice.

8.1 Random Sampling

Two types of populations:
·      Finite populations – Consists of a finite/fixed number of elements/observations.
·      Infinite populations – NO limit to the number of items that can be observed.
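The equal-probability definition of random sampling given next ONLY applies to finite populations; infinite populations need a separate definition (further below).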

A sample of size n from a finite population of size N is random if it is chosen in such a way that each of the NCn, or $\binom{N}{n}$, possible samples has the same probability, 1/NCn, or $1/\binom{N}{n}$, of being selected.
In short: a sample is random if every possible sample has an equal probability of being chosen.

Simplest way of taking random samples from a finite population – refer to a table of random numbers → nowadays there are computer programs to generate random numbers.
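As a rough sketch of both ideas in R: choose() counts the possible samples and sample() draws one of them without replacement. The population 1:20 and the names population, n_samples, p_each, and one_sample are made up for illustration only.

```r
# Hypothetical finite population of N = 20 elements, sample size n = 3
population <- 1:20
N <- length(population)
n <- 3

n_samples <- choose(N, n)    # number of possible samples, NCn
p_each    <- 1 / n_samples   # probability of each sample under random sampling

# One random sample, drawn without replacement (the default for sample())
one_sample <- sample(population, n)
```

Running the last line again gives a different sample each time unless a seed is set with set.seed().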

For random samples from an infinite population – the sample is random ONLY if it consists of values of independent random variables (variables that do not affect each other) that have the same distribution.
Example: A random sample of 5 individuals from the infinite population of weight losses of persons on a two-week diet with µ = 7.4 lbs and σ = 1.3 lbs consists of the values of 5 independent random variables, each having this same distribution with µ = 7.4 and σ = 1.3.

8.2 Sampling Distribution

Sampling distribution – The probability distribution of a sample statistic (mean, median, standard deviation, ...); it describes how the statistic varies from sample to sample due to chance.

Example p.270 Finite population of N = 5, numbers 1, 3, 5, 7, and 9. Population has a mean µ = 5 and a standard deviation of σ = √8. A sample of size n = 2 is taken from the population.
5C2 = 10 possibilities → (1,3) (1,5) (1,7) (1,9) (3,5) (3,7) (3,9) (5,7) (5,9) (7,9)
Possible means of the sample (with their probabilities) are 2 (1/10), 3 (1/10), 4 (2/10), 5 (2/10), 6 (2/10), 7 (1/10), and 8 (1/10).
Next step → study the probability distribution of the sample mean itself.
µx̄ = (2*0.1) + (3*0.1) + (4*0.2) + (5*0.2) + (6*0.2) + (7*0.1) + (8*0.1) = 5
σx̄² = (2−5)²*0.1 + (3−5)²*0.1 + (4−5)²*0.2 + (5−5)²*0.2 + (6−5)²*0.2 + (7−5)²*0.1 + (8−5)²*0.1 = 3
σx̄ = √3
Note from this example: the mean of the sampling distribution equals the mean of the population (µx̄ = µ), AND the standard deviation of the sampling distribution is smaller than the standard deviation of the population (σx̄ < σ).
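The small example can be checked by brute force in R, since 10 samples are easy to enumerate. This is only a sketch; the variable names (pop, samples, xbar, and so on) are my own and not from the book.

```r
# Population from the example: N = 5, sample size n = 2
pop <- c(1, 3, 5, 7, 9)
n   <- 2

samples <- combn(pop, n)      # each column is one of the 10 possible samples
xbar    <- colMeans(samples)  # mean of each possible sample

# Sampling distribution of the mean: each sample has probability 1/10
probs <- table(xbar) / ncol(samples)

mu_xbar    <- mean(xbar)                 # mean of the sampling distribution
var_xbar   <- mean((xbar - mu_xbar)^2)   # variance (divide by 10, not 9)
sigma_xbar <- sqrt(var_xbar)

# Population standard deviation (divide by N, not N - 1)
sigma <- sqrt(mean((pop - mean(pop))^2))
```

Note that R's built-in var() and sd() divide by n − 1, which is why the population quantities are computed by hand here.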

For larger populations – computer simulation.
R Example: population of 1000 numbers ranging from 0 to 9, sample n = 10
First, generate a random list of numbers in R and store the values in a variable named "A".
Creating the population (its parameters are not known in advance) → A=sample(0:9,1000,replace=TRUE) → in this run: mean(A) = 4.313, var(A) = 8.081112, sd(A) = 2.84273
*** If the population is supposed to have specific parameters, adjust the call accordingly.

Taking a sample from this group of 1000 numbers, with replacement, and storing the values in a variable (storing the result keeps that particular sample; otherwise each call returns different numbers) → SAMPLE1=sample(A, 10, replace=TRUE) → generates one possible sample of n = 10 from the population of 1000 numbers from 0 to 9.

When sampling without replacement:
1.     NumberCombo=combn(A,10) # All possible samples of size 10, one per column
2.     Means=colMeans(NumberCombo) # Mean of each 10-number combination
3.     table(Means) # Displays a table of the means
4.     barplot(table(Means)) # Makes a barplot of that table
*** This is how it works in theory. In practice, the number of possible combinations for a population of 1000 numbers (1000C10 ≈ 2.6 × 10^23) is far too large for R to handle.
→ ALTERNATIVE is to use simulated samples:
1.     B=replicate(10000,mean(sample(A,10))) # Means of 10,000 random samples of size 10, stored in B
2.     table(B) # Table of the simulated means
3.     hist(B) # Histogram of the simulated means

OR another path:
A shortcut for getting a long list of possible means → draw one large sample (with replacement) from the original population → split it into segments of 10 (sample size n = 10) → take the mean of each 10-number segment to approximate the sampling distribution of the mean → then take the mean of these segment means.
1)   draws=sample(A,size=4*500,replace=TRUE) # 2000 values
2)   draws=matrix(draws,10) # 10 rows x 200 columns; the draws variable is reused for simplicity
3)   drawsmean=apply(draws,2,mean) # apply the mean function to each column (block of 10 numbers)
4)   mean(drawsmean) # mean of the 200 block means
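The two simulation approaches above can be put side by side in one runnable sketch. The seed value is arbitrary and only there so that a run can be repeated; colMeans() is used in place of apply(..., 2, mean), which should give the same block means.

```r
set.seed(1)  # arbitrary seed, only so the run is repeatable

# Simulated population of 1000 digits from 0 to 9
A <- sample(0:9, 1000, replace = TRUE)

# Approach 1: many separate samples of size 10, each drawn without replacement
B <- replicate(10000, mean(sample(A, 10)))

# Approach 2: one long draw with replacement, cut into blocks of 10
draws     <- matrix(sample(A, size = 2000, replace = TRUE), nrow = 10)
drawsmean <- colMeans(draws)   # 200 block means

# Both averages of sample means should land close to the population mean
c(population = mean(A), approach1 = mean(B), approach2 = mean(drawsmean))
```

Approach 1 gives 10,000 means and approach 2 only 200, so the second is noticeably noisier.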

8.3 Standard Error of the Mean

Can be used to determine how close a sample mean might be to the mean of the population. For random samples of size n taken from a population having the mean µ and the standard deviation σ, the sampling distribution of x̄ has:
1.     Mean: the mean of the sampling distribution of x̄ equals the mean of the population (µx̄ = µ).
2.     Standard error of the mean*: the standard deviation of the sampling distribution, $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ OR $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \cdot \sqrt{\frac{N-n}{N-1}}$.
*Depending on whether the population is infinite or finite of size N. The latter includes the finite population correction factor $\sqrt{\frac{N-n}{N-1}}$, which is usually omitted unless the sample constitutes at least 5% of the population (n ≥ 0.05N).
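Both formulas fit in one small R function. This is a minimal sketch; standard_error is a hypothetical helper name, and using N = Inf to mean "infinite population" is my own convention.

```r
# Hypothetical helper: standard error of the mean.
# N = Inf (the default) stands for an infinite population, so no correction is applied.
standard_error <- function(sigma, n, N = Inf) {
  se <- sigma / sqrt(n)
  if (is.finite(N)) {
    se <- se * sqrt((N - n) / (N - 1))   # finite population correction factor
  }
  se
}

se_infinite <- standard_error(sigma = 20, n = 64)             # 20 / 8 = 2.5
se_finite   <- standard_error(sigma = sqrt(8), n = 2, N = 5)  # N = 5, n = 2 example: sqrt(3)
```

The second call reproduces the σx̄ = √3 found by enumeration in section 8.2.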

Standard error of mean = standard deviation of sampling distribution of the mean. Used in statistics to measure how much the sample mean can be expected to fluctuate due to chance.

Based on the two formulas for the standard error of the mean shown above,
variability of population ↑ (increases) → standard error of mean ↑ (increases)
sample size ↑ (increases) → standard error of mean ↓ (decreases)

Example 8.3 When sampling from an infinite population, what happens to the standard error of the mean (and hence the size of the error we are exposed to when the sample mean x̄ is used as an estimate of the population mean µ) when the sample size is increased from n = 50 to n = 200?
σx̄ = σ/√n. Ratio of the two standard errors: new standard error of the mean = σ/√200, old standard error of the mean = σ/√50 → (σ/√200)/(σ/√50) = √50/√200 = √(50/200) = √(1/4) = ½ → increasing the sample size from 50 to 200 divides the standard error of the mean by 2.
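A quick numeric check of this ratio in R. The value of sigma is arbitrary because it cancels in the ratio; the variable names are illustrative.

```r
sigma <- 1                    # arbitrary value; it cancels in the ratio

se_old <- sigma / sqrt(50)    # standard error with n = 50
se_new <- sigma / sqrt(200)   # standard error with n = 200

ratio <- se_new / se_old      # 0.5: quadrupling n halves the standard error
```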

Example 8.4 Find the value of the finite population correction factor for n = 100 and (a) N = 10,000 (b) N = 200.
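(a) $\sqrt{\frac{10{,}000-100}{10{,}000-1}}$ ≈ 0.995 → very close to one, so it can be omitted for practical purposes.
(b) $\sqrt{\frac{200-100}{200-1}}$ ≈ 0.709 → far from one, so omitting it would make a substantial difference.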
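The same two values computed in R, as a sketch (fpc is a hypothetical function name):

```r
# Finite population correction factor for sample size n and population size N
fpc <- function(n, N) sqrt((N - n) / (N - 1))

fpc_a <- fpc(n = 100, N = 10000)   # about 0.995
fpc_b <- fpc(n = 100, N = 200)     # about 0.709
```

In (a) the sample is 1% of the population, in (b) it is 50%, which matches the 5% rule of thumb above.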

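Example 8.6 For n = 15, N = 1000, σ = 289, what value might be expected for the standard deviation of 100 sample means?
Using the standard error of the mean for a finite population: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \cdot \sqrt{\frac{N-n}{N-1}} = \frac{289}{\sqrt{15}} \cdot \sqrt{\frac{1000-15}{1000-1}}$ = 74.09478
For comparison, simulating 100 sample means from the population of integers 1 to 1000 in R gives a standard deviation of about 68.8:
A=1:1000
B=replicate(100,mean(sample(A,15)))
sd(B)
This gave ~ 68.77874 in one run (the result changes from run to run because no seed is set). The values of 68.8 and 74.1 are reasonably close, given that only 100 sample means were simulated.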
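As a hedged follow-up sketch: the population standard deviation of the integers 1 to 1000 can be computed exactly instead of rounding it to 289, and using more replicates than the 100 in the example should bring the simulated value closer to the formula. The seed and the 10,000 replicates are my own choices.

```r
A <- 1:1000
N <- length(A)
n <- 15

# Exact population standard deviation (divide by N), about 288.7
sigma <- sqrt(mean((A - mean(A))^2))

# Standard error of the mean with the finite population correction
se_theory <- sigma / sqrt(n) * sqrt((N - n) / (N - 1))

set.seed(8)                                  # arbitrary seed for repeatability
B <- replicate(10000, mean(sample(A, n)))    # 10,000 sample means instead of 100

c(theory = se_theory, simulated = sd(B))
```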

8.4 Central Limit Theorem

When the sample mean is used to estimate the population mean, there is a possible error → confidence in the estimate is expressed via a probability statement about the size of the error.

Chebyshev’s theorem applied – when the sample mean is used to estimate the population mean, we can assert with a probability of at least 1 − 1/k² that the error will be less than k · σx̄.

Example 8.7 Based on Chebyshev’s theorem with k=2, what can we assert about the maximum size of our error if we use the mean of a random sample of size n = 64 to estimate the mean of an infinite population with σ = 20?
Standard error of mean = $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ → $\frac{20}{\sqrt{64}}$ = 20/8 = 2.5
We can assert with a probability of at least 1 − 1/2² = 0.75 that the error will be less than 2*2.5 = 5.
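The arithmetic of Example 8.7 as a short R sketch (variable names are illustrative):

```r
sigma <- 20   # population standard deviation
n     <- 64   # sample size
k     <- 2    # Chebyshev multiplier

se        <- sigma / sqrt(n)   # standard error of the mean: 2.5
max_error <- k * se            # error bound: 5
prob_min  <- 1 - 1 / k^2       # probability of at least 0.75
```

Changing k shows the trade-off: a larger k gives a higher guaranteed probability but a wider error bound.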

Central Limit Theorem – If x̄ is the mean of a random sample of size n from an infinite population with the mean µ and the standard deviation σ, and n is large, then

$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$

has approximately the standard normal distribution.

The central limit theorem allows normal-curve methods to be applied to a wide range of problems:
·      Finite populations – when n is large WHILE n/N is small (the sample constitutes only a small portion of the population).
·      Infinite populations – when n is large (usually n ≥ 30 is regarded as large enough). However, IF the population being sampled has roughly the shape of the normal curve → sampling distribution of the mean ≈ normal distribution regardless of the size of n.
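A small simulation can illustrate the theorem. The sketch below assumes a clearly non-normal population, an exponential distribution with mean 1 and standard deviation 1; that choice, the seed, and n = 40 are mine, not the book's.

```r
set.seed(42)   # arbitrary seed for repeatability

n     <- 40    # sample size (above the usual n >= 30 guideline)
mu    <- 1     # mean of the exponential(rate = 1) population
sigma <- 1     # its standard deviation

# 10,000 sample means from the skewed population
xbar <- replicate(10000, mean(rexp(n, rate = 1)))

# Standardize each sample mean
z <- (xbar - mu) / (sigma / sqrt(n))

# For a standard normal, about 95% of values fall between -1.96 and 1.96
coverage <- mean(abs(z) < 1.96)
```

hist(z) should look roughly bell-shaped even though the population itself is strongly skewed.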

Example 8.9 Suppose that σ = 5.5 tons for the daily sulfur oxide emissions of a certain industrial plant. What is the probability that the mean of a random sample of size n = 40 will differ from the mean of the population by less than 1.0 ton?
z = (x̄ − µ)/(σ/√n) → (x̄ − µ) = ± 1, σ = 5.5, n = 40 → z = −1/(5.5/√40) ≈ −1.15 and z = 1/(5.5/√40) ≈ 1.15.
Using R: The area between the two z values of ± 1.15 = pnorm(1.15) - pnorm(-1.15) = 0.7498561

Example 8.10 The time it takes students in a cooking school to learn how to prepare a particular meal is a random variable with the mean µ = 3.2 hours and the standard deviation σ = 1.8 hours. Find the probability that the average time it will take 36 students to learn how to prepare the meal is less than 3.4 hours.
The sample size n = 36 is greater than 30, so the sampling distribution of the mean can be approximated with a normal curve.
z = (x̄ − µ)/(σ/√n) → µ = 3.2 hours, σ = 1.8 hours, n = 36, x̄ < 3.4 → z = (3.4 − 3.2)/(1.8/√36) = +⅔ → We want the probability of less than 3.4 hours, i.e. the whole area to the left of z = ⅔ (3.4 lies to the right of the population mean of 3.2, so the probability exceeds 0.5) → Using R: pnorm(2/3) = 0.7475075
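Examples 8.9 and 8.10 can both be reproduced with a small helper. This is a sketch; prob_mean_below is a hypothetical name, and it assumes the normal approximation from the central limit theorem applies.

```r
# Hypothetical helper: P(sample mean < x) under the normal approximation
prob_mean_below <- function(x, mu, sigma, n) {
  z <- (x - mu) / (sigma / sqrt(n))
  pnorm(z)
}

# Example 8.10: mean learning time of 36 students below 3.4 hours
p_810 <- prob_mean_below(3.4, mu = 3.2, sigma = 1.8, n = 36)   # about 0.7475

# Example 8.9: sample mean within 1.0 ton of the population mean
# (mu cancels, so only the distance of 1.0 matters)
z_89 <- 1.0 / (5.5 / sqrt(40))
p_89 <- pnorm(z_89) - pnorm(-z_89)                             # about 0.75
```

p_89 is computed with the unrounded z, so it differs slightly from the value obtained with z rounded to 1.15.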

Other sampling distributions:
-       Standard error of the median for infinite populations:
$\sigma_{\tilde{x}} = 1.25 \cdot \frac{\sigma}{\sqrt{n}}$, where n = size of sample, σ = population standard deviation
This approximation holds for large samples from populations that are roughly normal (1.25 ≈ √(π/2)); for other population shapes it may not apply.
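A simulation sketch of the median formula, assuming a normal population with σ = 1 (the normal assumption, the seed, and n = 100 are my own choices for illustration):

```r
set.seed(7)   # arbitrary seed for repeatability

n     <- 100  # sample size
sigma <- 1    # population standard deviation (normal population assumed)

# 20,000 sample medians and, for comparison, 20,000 sample means
medians <- replicate(20000, median(rnorm(n, mean = 0, sd = sigma)))
means   <- replicate(20000, mean(rnorm(n, mean = 0, sd = sigma)))

se_median_sim     <- sd(medians)             # simulated standard error of the median
se_median_formula <- 1.25 * sigma / sqrt(n)  # 0.125
se_mean_sim       <- sd(means)               # should be near sigma / sqrt(n) = 0.1
```

Under this assumption the median fluctuates more from sample to sample than the mean does.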

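Formula for conversion to standard units: $z = \frac{x - \mu}{\sigma}$ for one observation, or $z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}$ for a mean.
The mean subtracted in the numerator may be µ or µx̄ and the standard deviation in the denominator may be σ or σx̄, depending on whether we are dealing with one observation or with a mean.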
