Module 12: Populations, Samples and Sampling Distributions
This module provides basic information about the statistical concepts of populations and samples, selecting samples from a population, and the critical issue of sampling distributions.
Reviewed 05 May 05 / MODULE 12
12 - 1

Populations and Samples
Population: The entire group about which information is desired.
Sample: A portion or part of the population, usually the portion from which information is gathered.
12 - 2

Target Population
The participants to whom the answer to the question pertains.
The target population definition has two aspects:
  Conceptual
  Operational
12 - 3

Population Definition
A population definition gives a clear statement of those included. The following are some examples:
  Adults and children 10-59 years of age residing in four census tracts in Richfield, a suburb of Minneapolis
  Adults 25-59 years of age residing in Cedar County, Iowa, and certain rural townships in neighboring counties on July 1, 1973
  Employees of Pacific Northwest Bell Telephone Company working in King County
12 - 4

Population
Person   Population of cholesterol values (mg/dl)
     1   201
     2   182
     3   199
     .     .
     .     .
   128   124
   129   180
12 - 5

Population of Cholesterol values
Person mg/dl   Person mg/dl   Person mg/dl   Person mg/dl   Person mg/dl   Person mg/dl
     1   201       26   172       51   162       76   127      101   206      126   172
     2   182       27   164       52   151       77   147      102   159      127   276
     3   199       28   136       53   197       78   164      103   141      128   124
     4   136       29   161       54   206       79   161      104   166      129   180
     5   152       30   160       55   160       80   178      105   154
     6   195       31   165       56   131       81   177      106   111
     7   162       32   169       57   193       82   176      107   228
     8   206       33   159       58   235       83   146      108    95
     9   138       34   168       59   192       84   179      109   188
    10   190       35   185       60   221       85   185      110   134
    11   152       36   189       61   194       86   155      111   198
    12   120       37   174       62   153       87   150      112   140
    13   169       38   114       63   168       88   167      113   188
    14   136       39   161       64   162       89   154      114   154
    15   141       40   153       65   162       90   159      115   191
    16   194       41   165       66   158       91   187      116   169
    17   173       42   142       67   143       92   164      117   156
    18   158       43   173       68   184       93   151      118   141
    19   181       44   138       69   133       94   155      119   172
    20   247       45   174       70   180       95   159      120   206
    21   192       46   186       71   165       96   261      121   145
    22   192       47   175       72   149       97   169      122   138
    23   123       48   164       73   155       98   137      123   170
    24   149       49   220       74   129       99   154      124   151
    25   158       50   150       75   217      100   189      125   154
12 - 6

Sampling

In its broadest sense, sampling is a procedure by which one or more members of a population are picked from the population. The objective is to make certain observations upon the members of the sample and then, on the basis of these results, to draw conclusions about the characteristics of the entire population.
12 - 7

Selecting a Sample
Haphazard Sample: Haphazard samples are constructed by arbitrarily selecting individual sample members.
Random Sample: There are several methods for constructing random samples; we consider only simple random samples. This process operates so that each member of the population has an equal chance of being selected into the sample.
12 - 8

Selecting a Sample
The selection process:
  Assign to each member of the population the equivalent of a sequential ID number;
  Use a random number table or computer-generated numbers;
  For computer-generated numbers, generate one for each ID number, sort the ID numbers in order according to the random number, and take IDs from the top of the list until you have the sample size you need;
  For a table, haphazardly select a starting point and then:
    Ignore numbers that are too large
    Ignore a number after it appears the first time
12 - 9

Fundamental and Important Concept
We now begin the discussion of perhaps the most important concept in biostatistics. It is fundamental to understanding, and thus correctly interpreting, the use of the many statistical tools we will cover in this course. The concept is not complex; in fact, it is rather simple. It does require, however, thinking about issues in a manner that may initially appear somewhat different and unusual.
12 - 10

Looking at the Process
When we randomly select a sample from a population, we can use the mean for the sample as an estimate, or guess, of the value of the mean of the population. This raises the question of how good this sample mean (a sample statistic) is as a guess for the value of the population mean (a population parameter). The essence of this question is how well the process works: the process of using a sample to make guesses about the population.
12 - 11

Understanding the Process
Two important aspects of this fundamental process:
  FIRST: It is critical to recognize that it is a process
  SECOND: It is important to understand how, and how well, the process works
12 - 12

How Good is a Sample Mean
The essential question is: How good is a sample mean as an estimate of the population mean?
One way to examine this question is to understand that we used a process that involved randomly selecting a sample from the population and then calculating the mean for the values of the observations in the sample. We can repeat this process as many times as we wish and examine what it produces.
12 - 13

Sampling Distributions
Individual Observations
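The process described on the preceding slides (assign IDs, attach a computer-generated random number to each ID, sort, keep the first n, compute the sample mean, repeat) can be sketched in Python. This is a minimal illustration, not the course's own software: the function names, the seed, the choice of 1000 repetitions, and the restriction to the first 25 persons of the cholesterol population are all assumptions made to keep the example short.

```python
import random
import statistics

# Cholesterol values (mg/dl) for persons 1-25 of the population listed earlier.
# Using only 25 people is an assumption made to keep the sketch short.
population = {
    pid: value
    for pid, value in enumerate(
        [201, 182, 199, 136, 152, 195, 162, 206, 138, 190, 152, 120, 169,
         136, 141, 194, 173, 158, 181, 247, 192, 192, 123, 149, 158],
        start=1,
    )
}


def simple_random_sample(ids, n, rng):
    """Attach a random number to each ID, sort by it, keep the first n IDs."""
    keyed = [(rng.random(), pid) for pid in ids]
    keyed.sort()
    return [pid for _, pid in keyed[:n]]


def sample_mean(population, n, rng):
    """One run of the process: draw a simple random sample, return its mean."""
    chosen = simple_random_sample(population.keys(), n, rng)
    return statistics.mean(population[pid] for pid in chosen)


rng = random.Random(12)  # hypothetical seed, fixed only for repeatability
population_mean = statistics.mean(population.values())

# Repeat the process many times and look at what it produces.
means = [sample_mean(population, 5, rng) for _ in range(1000)]

print(f"population mean: {population_mean:.2f}")
print(f"average of 1000 sample means (n = 5): {statistics.mean(means):.2f}")
print(f"smallest and largest sample mean: {min(means):.1f}, {max(means):.1f}")
```

Individual sample means vary from run to run, but their average should sit close to the population mean; that spread of sample means is what the following slides examine.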

149  146  132  . . .
n = 1, $\mu = 150$ lbs, $\sigma^2 = 100$ lbs$^2$, $\sigma = 10$ lbs
12 - 14

Sample with n = 5
Population of weights: 198 217 46 189 149 172 162 42 121 198 201 309 111 220 100 201 261 156 133
Sample of 5 weights: 149 156 201 121 105
n = 5; $\sum x = 732$
$\bar{x} = 732 / 5 = 146.4$
12 - 15

Ten Different Samples, n = 5
Sample    n     Mean      s^2       s
     1    5   147.43    88.14    9.39
     2    5   153.98   117.91   10.86
     3    5   146.50   103.66   10.18
     4    5   155.53    91.99    9.59
     5    5   147.87   149.65   12.23
     6    5   143.60    66.76    8.17
     7    5   146.87    64.23    8.01
     8    5   149.19   280.88   16.76
     9    5   150.05   200.28   14.15
    10    5   146.92   173.36   13.17
Average       148.79   133.69   11.25
12 - 16

Sampling Distributions
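A table like the "Ten Different Samples" table can be produced by simulation. The sketch below is only an illustration under stated assumptions: the slides give the population mean (150 lbs) and standard deviation (10 lbs) but not the shape of the population, so a normal distribution is assumed here, and the function name and seed are hypothetical. The numbers it prints will therefore not reproduce the table's values.

```python
import random
import statistics


def summarize_samples(num_samples, n, mu=150.0, sigma=10.0, seed=0):
    """Draw num_samples samples of size n; return (mean, s^2, s) for each.

    Assumption: weights follow a normal distribution with the given
    mu and sigma. The slides do not state the population's shape.
    """
    rng = random.Random(seed)  # hypothetical seed, for repeatability only
    rows = []
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = statistics.mean(sample)
        s2 = statistics.variance(sample)  # sample variance, divisor n - 1
        rows.append((mean, s2, s2 ** 0.5))
    return rows


rows = summarize_samples(num_samples=10, n=5)

print("Sample    n     Mean      s^2       s")
for i, (mean, s2, s) in enumerate(rows, start=1):
    print(f"{i:>6}    5  {mean:7.2f}  {s2:7.2f}  {s:6.2f}")

# Bottom row: column averages, as in the slide's "Average" line.
avg = [statistics.mean(col) for col in zip(*rows)]
print(f"Average      {avg[0]:7.2f}  {avg[1]:7.2f}  {avg[2]:6.2f}")
```

Calling `summarize_samples(num_samples=10, n=20)` gives the corresponding table for samples of size 20.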

Individual Observations
  149  146  . . .
  n = 1
  $\mu = 150$ lbs
  $\sigma^2 = 100$ lbs$^2$
  $\sigma = 10$ lbs
Means for n = 5
  153.0  146.4  . . .
  n = 5
  $\mu_{\bar{x}} = 150$ lbs
  $\sigma^2_{\bar{x}} = \sigma^2 / n = 20$ lbs$^2$
  $\sigma_{\bar{x}} = \sigma / \sqrt{n} = 4.47$ lbs
12 - 17

Standard Error of the Mean
The population made up of the means of all possible samples of size n is a long list of numbers, and the variance of these numbers can, in theory, be calculated. The square root of this variance is called the standard error of the mean. It is simply the standard deviation for this population of means.
$\sigma^2_{\bar{x}} = \sigma^2 / n$
$\sigma_{\bar{x}} = \sigma / \sqrt{n}$
12 - 18

Sample with n = 20
Sample of 20 weights: 145 136 181 113 148 151 102 161 154 198 206 127 101 114 191 189 111 120 105 171 133
n = 20; $\sum x = 3057$
$\bar{x} = 3057 / 20 = 152.85$
12 - 19

Ten Different Samples, n = 20
Sample    n     Mean      s^2       s
     1   20   150.86   100.96   10.05
     2   20   146.88   122.70   11.08
     3   20   147.65   119.51   10.93
     4   20   149.37    51.07    7.15
     5   20   153.30   109.54   10.47
     6   20   152.83   111.96   10.58
     7   20   148.62    91.94    9.59
     8   20   152.16   140.83   11.87
     9   20   154.40   179.56   13.40
    10   20   151.43   115.85   10.76
Average       150.75   114.39   10.59
12 - 20

Sampling Distributions
Individual observations
  149  146  . . .
  $\mu = 150$ lbs
  $\sigma^2 = 100$ lbs$^2$
  $\sigma = 10$ lbs
Means for n = 5
  153.0  146.4  . . .
  $\mu_{\bar{x}} = 150$ lbs
  $\sigma^2_{\bar{x}} = \sigma^2 / n = 20$ lbs$^2$
  $\sigma_{\bar{x}} = \sigma / \sqrt{n} = 4.47$ lbs
Means for n = 20
  151.6  151.3  . . .
  $\mu_{\bar{x}} = 150$ lbs
  $\sigma^2_{\bar{x}} = \sigma^2 / n = 5$ lbs$^2$
  $\sigma_{\bar{x}} = \sigma / \sqrt{n} = 2.24$ lbs
12 - 21
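The standard error formula can be checked numerically. The sketch below computes sigma / sqrt(n) for the three cases on the slide and compares each with the standard deviation of many simulated sample means. It is an illustration under assumptions, not part of the original module: the population is assumed to be normal, and the function names, seed, and number of repetitions are hypothetical choices.

```python
import math
import random
import statistics

MU, SIGMA = 150.0, 10.0  # population mean and SD in lbs, as given on the slides


def standard_error(sigma, n):
    """Standard error of the mean for samples of size n: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def simulated_sd_of_means(n, repeats=5000, seed=0):
    """Standard deviation of many simulated sample means.

    Assumption: the population of weights is normal with mean MU and SD SIGMA.
    """
    rng = random.Random(seed)  # hypothetical seed, for repeatability only
    means = []
    for _ in range(repeats):
        sample = [rng.gauss(MU, SIGMA) for _ in range(n)]
        means.append(sum(sample) / n)
    # The simulated means are treated as the whole list, hence pstdev.
    return statistics.pstdev(means)


for n in (1, 5, 20):
    theory = standard_error(SIGMA, n)
    simulated = simulated_sd_of_means(n)
    print(f"n = {n:>2}: sigma/sqrt(n) = {theory:.3f} lbs, "
          f"simulated SD of means = {simulated:.3f} lbs")
```

The formula gives 10.000, 4.472 and 2.236 lbs for n = 1, 5 and 20; the simulated values should land close to these, with small differences due to the finite number of repetitions.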