An Overview of Sampling Techniques
Regional Course on Sampling for Agricultural Censuses and Surveys
ELEMENTS OF SAMPLE DESIGN
Bangkok, 14th to 18th May 2012
Statistical Institute of Asia and the Pacific
Economic and Social Commission of Western Asia
United Nations

Contents
- Important probability sampling schemes: SRS, systematic, stratified, probability proportional to size (PPS), stage sampling
- Key concepts: sample, characteristics, population parameter, estimator and estimate, sampling distribution and sampling variance; clustering and precision
- Determination of sample size: use of Deff
- Activity: exercise and discussion on the relative efficiency of estimators under different schemes
May 2012 SIAP

Introduction
Survey Design: Issues Involved
1. Determining survey objectives and data requirements
2. The population of interest, or the target population
3. Reference period; geographic and demographic boundaries
4. Sampling frame and sampling unit
5. Sample design and sample size
6. Selection of the sample (at different stages)
7. Survey management and field procedures
8. Data collection
9. Summary and analysis of the data
10. Dissemination

Sample
A subset of the population on which observations are taken for the purpose of obtaining information about the population.
By studying a sample we hope to draw valid conclusions about the population. Thus, a sample should be representative of the target population.

Sample Design: Two Broad Kinds
The sample design specifies how to select the part of the population to be surveyed. There are two broad kinds:
- Probability sampling: each element of the population is assigned a known, non-zero chance of being included in the sample. [Focus of this course]
- Non-probability sampling: consists of a variety of procedures, including judgment-based and purposive choice of elements considered representative of the population.

Target Population and Sampling Frame
- Target population: the population intended to be studied or covered in the survey; also known as the coverage universe.
- Sampling frame: a list of units (possibly with other information on each unit) from which the sample is selected.
Probability Sampling

Probability Sampling Design
We will discuss only probability sampling. The most common sampling techniques used for official surveys, such as simple random, systematic, stratified, probability proportional to size (PPS), cluster and multi-stage sampling, are all examples of probability sampling.

Basic Sampling Schemes
- Simple random sampling (SRS): a probability selection scheme in which each unit in the population is given an equal probability of selection.
- Systematic sampling: a method in which the sample is obtained by selecting every kth element of the population, where k is an integer greater than 1. Often the units are first ordered with respect to some auxiliary data.
- Stratified sampling: uses auxiliary information (stratification variables) to divide the population into groups called strata, in order to increase the efficiency of the sample design.
- Probability proportional to size (PPS): a sampling procedure in which units are selected with probability proportional to a given measure of size. The size measure is the value of an auxiliary variable X related to the characteristic Y under study.
Simple Random Sampling

What Is SRS?
- SRS is the simplest method of probability sampling.
- SRS is a special type of equal probability selection method (epsem).
- It is rarely used in practice for large-scale surveys.
- It provides the theoretical basis for other sample designs.
Kinds of SRS
SRS selection can be made with replacement (SRSWR) or without replacement (SRSWOR). Selection probability: the probability that a population unit is selected at any given draw is the same, namely 1/N, for both SRSWR and SRSWOR, where N is the number of units in the population (the population size).
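As a minimal sketch (Python standard library only; the function name `srs_serials` and its `seed` argument are illustrative, not part of the course material), the two kinds of SRS can be simulated by drawing serial numbers 1 to N one at a time, keeping repeats for SRSWR and discarding them for SRSWOR, in line with the random-number procedure given next:

```python
import random
from typing import List, Optional


def srs_serials(N: int, n: int, with_replacement: bool,
                seed: Optional[int] = None) -> List[int]:
    """Select n serial numbers from 1..N by simple random sampling.

    Random numbers in 1..N are generated one at a time, as from a random
    number table. SRSWR keeps every number, repeats included; SRSWOR skips
    a number already selected until n distinct numbers have been obtained.
    """
    if not with_replacement and n > N:
        raise ValueError("SRSWOR needs n <= N")
    rng = random.Random(seed)
    selected: List[int] = []
    while len(selected) < n:
        r = rng.randint(1, N)  # each serial number has chance 1/N at this draw
        if with_replacement or r not in selected:
            selected.append(r)
    return selected


if __name__ == "__main__":
    print("SRSWR :", srs_serials(N=20, n=5, with_replacement=True, seed=1))
    print("SRSWOR:", srs_serials(N=20, n=5, with_replacement=False, seed=1))
```

With a computer generator, `random.Random(seed).sample(range(1, N + 1), n)` gives an SRSWOR directly; the explicit loop is kept here only because it mirrors the table-based procedure.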
SRS Selection Procedure
Steps involved:
1. Get a list (sampling frame) that uniquely identifies each unit in the population.
2. Allocate a serial number to each unit of the frame.
3. Generate random numbers in the range 1 to N, using a random number table or a random number generator on a computer:
- For SRSWR: select the units whose serial numbers match the first n random numbers generated, even if there are repetitions.
- For SRSWOR: select the units whose serial numbers match the first n distinct random numbers generated.

Systematic Sampling
- Linear systematic sampling
- Circular systematic sampling
Systematic sampling (SYS), like SRS, involves selecting n sample units from a population of N units. Instead of randomly choosing the n units in the sample, a skip pattern is run through a list (frame) of the N units to select the sample. The skip, or sampling interval, is k = N/n.

Linear Systematic Sampling
[Figure: the N units in a line, divided into n intervals of k units each; a random start r in the first interval, then every kth unit up to N]
Selection Procedure: Linear Systematic Sampling
Steps involved:
1. Form a sequential list of population units.
2. Decide on a sample size n and compute the skip (sampling interval), k = N/n.
3. Choose a random number r (the random start) between 1 and k, inclusive.
4. Add k to the random start to select the second unit, and keep adding k to the previously selected unit number to select the remainder of the sample.
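The linear procedure can be sketched as below. It assumes that N is a multiple of n, so that k is an integer; the helper names are hypothetical.

```python
import random
from typing import List, Optional


def systematic_from_start(N: int, k: int, r: int) -> List[int]:
    """Serial numbers r, r+k, r+2k, ... that do not exceed N."""
    return list(range(r, N + 1, k))


def linear_systematic(N: int, n: int, seed: Optional[int] = None) -> List[int]:
    """Linear systematic sample of n serial numbers out of 1..N.

    Sketch assumption: N is a multiple of n, so k = N/n is an integer.
    """
    if N % n != 0:
        raise ValueError("this sketch assumes N is a multiple of n")
    k = N // n                             # sampling interval
    r = random.Random(seed).randint(1, k)  # random start between 1 and k
    return systematic_from_start(N, k, r)


if __name__ == "__main__":
    # N = 20, n = 5, so k = 4; the random start r = 3 gives units 3, 7, 11, 15, 19
    print(systematic_from_start(20, 4, 3))
    print(linear_systematic(20, 5, seed=7))
```

Listing the samples for every start r = 1, ..., k shows that they split the N units into k disjoint samples of n units each, which is why each unit has the same chance 1/k of selection in this case.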
Problem with Linear Systematic Sampling
If N is a multiple of n, each of the k possible systematic samples contains exactly n units. In this case systematic sampling amounts to grouping the N units systematically into k samples of exactly n units each and selecting one of them with probability 1/k; the scheme is then epsem. But if N/n is not an integer and the sampling interval k is taken as the nearest integer to N/n, the k possible samples no longer all contain n units (the sample size depends on the random start), and the sample mean is no longer an unbiased estimator of the population mean. This problem may be overcome by a device known as circular systematic sampling.
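The sketch below (hypothetical helper names, units numbered 1 to N) first shows the problem for N = 15 and n = 4 with k = 4, the nearest integer to 3.75, and then implements circular selection with k rounded down, as in the procedure described next. The last loop reproduces the N = 5, n = 2 illustration.

```python
from typing import List, Optional


def linear_samples(N: int, k: int) -> List[List[int]]:
    """All k possible linear systematic samples (random starts 1..k)."""
    return [list(range(r, N + 1, k)) for r in range(1, k + 1)]


def circular_systematic(N: int, n: int, r: int, k: Optional[int] = None) -> List[int]:
    """Circular systematic sample of n units with random start r, 1 <= r <= N.

    By default k is N/n rounded down, so (n - 1) * k < N and the n units are
    always distinct. Counting wraps round from unit N back to unit 1.
    """
    if k is None:
        k = N // n
    return [(r - 1 + j * k) % N + 1 for j in range(n)]


if __name__ == "__main__":
    # Linear version, N = 15, n = 4, k = 4: the sample size depends on the start
    print([len(s) for s in linear_samples(15, 4)])  # [4, 4, 4, 3]
    # Circular version, k = 3: every one of the 15 starts gives 4 distinct units
    print(circular_systematic(15, 4, r=14))         # [14, 2, 5, 8]
    # The N = 5, n = 2 illustration, units labelled a..e
    letters = "abcde"
    for k in (2, 3):
        print(k, ["".join(letters[u - 1] for u in circular_systematic(5, 2, r, k))
                  for r in range(1, 6)])
```

In the circular version each unit appears in exactly n of the N possible samples, so every unit keeps the same inclusion probability n/N while the sample size stays fixed at n.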
Circular Systematic Sampling: an Illustration
Take N = 5 units a, b, c, d, e arranged in a circle and n = 2, so that N/n = 5/2 = 2.5.
a) If k = 2, the possible samples are: ac, bd, ce, da and eb.
b) If k = 3, the possible samples are: ad, be, ca, db and ec.

Circular Systematic Selection
Useful when N/n is not an integer.
1. Determine the interval k by rounding N/n down to an integer [if N = 15 and n = 4, k is taken as 3, not 4].
2. Take a random start between 1 and N.
3. Skip through the circle by k units each time to select the next unit, until n units are selected.
Thus there are N possible samples (one for each random start) instead of k. This method is termed circular systematic sampling (CSS).

Systematic Sampling: Important Features
- Often used as an alternative to SRS.
- Requires ordering of the population units. Ordering enables the SYS sample to be more representative:
  - ordering by geographical location (say, of dwellings) ensures a fair spread of the sample;
  - ordering by industry type ensures fair representation of industries.
- Gives each population unit an equal chance of being selected into the sample.

Advantages and Disadvantages
Advantages:
- Operationally convenient: it is easier to draw a sample.
- SYS distributes the sample more evenly over the population and is thus likely to be more efficient than SRSWOR, particularly when the ordering of the units in the list is related to the variable of interest.
Disadvantages:
- Requires a complete list of the population.
- A bad arrangement of the units may produce a very inefficient sample.
- Variance estimates cannot be obtained from a single systematic sample.

Stratified Sampling
Stratification
The population is divided into a number of distinct groups (strata) on the basis of auxiliary information, referred to as stratification variables, related to the study variable(s). This division of the population into strata is termed stratification. Each stratum is composed of the units that satisfy the condition set by the values of the stratifying variable(s). Main purpose: to improve the sample estimates, i.e. to reduce the standard errors of the estimates.
[Figure: the population divided into mutually exclusive subsets, strata 1, 2, ..., h, ..., L, with stratum sizes N_1, N_2, ..., N_h, ..., N_L]
Stratified Sampling
Stratified sampling involves:
- division, or stratification, of the population into homogeneous (similar) groups called strata; and
- selecting the sample within each stratum, independently of the other strata, using a selection procedure such as SRS, systematic sampling or PPS.
- Sampling in each stratum is carried out independently.
- Sampling fractions may differ between strata.
- Selection procedures may also differ between strata.
- The total sample size is distributed over all the strata; this is called allocation.
- At the end of the survey, the stratum results are combined to provide an estimate for the entire population.

Stratified Sampling in Practice
- Stratification is used in most surveys, both household and establishment surveys.
- Stratification can be used with any type of sampling design.
- Stratified sampling can be used in single-stage designs as well as in multi-stage designs.

Implicit Stratification
This refers to systematic sampling with the units arranged in a certain order. Prior to sample selection, all the units are sorted with respect to one or more variables that are deemed to be highly correlated with the variable of interest. Implicit stratification ensures that the sample will be spread across the categories of the stratification variables.
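A minimal sketch of stratified selection with SRSWOR inside each stratum, and of combining the stratum sample means with weights W_h = N_h/N (the usual stratified estimator of the population mean when SRS is used within strata). The population values and function names are hypothetical:

```python
import random
from typing import Dict, List, Optional


def stratified_srswor(strata: Dict[str, List[float]], allocation: Dict[str, int],
                      seed: Optional[int] = None) -> Dict[str, List[float]]:
    """Select an SRSWOR of allocation[h] units in each stratum h, stratum by stratum."""
    rng = random.Random(seed)
    return {h: rng.sample(units, allocation[h]) for h, units in strata.items()}


def stratified_mean(strata: Dict[str, List[float]],
                    sample: Dict[str, List[float]]) -> float:
    """Combine the stratum sample means with weights W_h = N_h / N."""
    N = sum(len(units) for units in strata.values())
    return sum(len(strata[h]) / N * sum(sample[h]) / len(sample[h]) for h in strata)


if __name__ == "__main__":
    # Hypothetical population: farm areas (ha) in two strata
    strata = {"small": [1.0, 1.5, 2.0, 2.5, 3.0, 1.2],
              "large": [8.0, 10.0, 12.5, 9.0]}
    allocation = {"small": 2, "large": 2}  # sampling fractions differ between strata
    sample = stratified_srswor(strata, allocation, seed=3)
    print(sample, round(stratified_mean(strata, sample), 3))
```

Because each stratum is weighted by its population share rather than its sample share, unequal sampling fractions across strata do not distort the combined estimate.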
Probability Proportional to Size Sampling (PPS Sampling)

Sampling with Probability Proportional to Size (PPS)
- The probability of selection is related to an auxiliary variable, Z, that is a measure of size.
- Examples: number of households; area of farms.
- Larger units are given a higher chance of selection than smaller units.
- The selection probability of the ith unit is
$p_i = \dfrac{Z_i}{\sum_{j=1}^{N} Z_j}, \quad i = 1, 2, \dots, N$

PPS Selection Procedures
- Cumulative total method: with replacement
- Cumulative total method: without replacement
- PPS systematic sampling
- Lahiri's method
Cumulative Total Method
Example: select a sample of 5 villages using varying-probability (PPS) sampling with replacement, the size being the number of households.
Solution:
- Sampling unit: village
- Measure of size: number of households in the village
- Selection probability: $p_i = Z_i / Z$, where $Z = \sum_{i=1}^{N} Z_i$

Procedure:
1. Write down the cumulative totals of the sizes Z_i, i = 1, 2, ..., N.
2. Choose a random number r such that 1 ≤ r ≤ Z.
3. Select the ith population unit if $T_{i-1} < r \le T_i$, where $T_{i-1} = Z_1 + Z_2 + \dots + Z_{i-1}$ and $T_i = Z_1 + Z_2 + \dots + Z_i$.

In the example, to select a village, a random number r, 1 ≤ r ≤ 700, is selected. Suppose r = 259. Since 231 ≤ 259 ≤ 288 (the range of numbers allotted to the 7th village), the 7th village is selected. The next 4 random numbers are 548, 170, 231 and 505. Hence the sample selected using PPS with replacement consists of the 7th, 16th, 5th, 7th and 15th villages.
Note: the 7th village is selected twice.
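The cumulative total method can be sketched as follows. The village table of the example is not reproduced in this text, so the sizes and random numbers below are hypothetical, as are the function names. `bisect.bisect_left` finds the first cumulative total T_i that is at least r, which is exactly the rule T_{i-1} < r ≤ T_i.

```python
import bisect
from itertools import accumulate
from typing import Iterable, List


def unit_for(r: int, totals: List[int]) -> int:
    """1-based unit i with T_{i-1} < r <= T_i, where totals = [T_1, ..., T_N]."""
    return bisect.bisect_left(totals, r) + 1


def cumulative_total_selection(sizes: List[int], random_numbers: Iterable[int], n: int,
                               with_replacement: bool = True) -> List[int]:
    """PPS selection of n units by the cumulative total method.

    random_numbers must lie in 1..Z, Z being the total size. Without
    replacement, a number pointing at an already selected unit is skipped.
    """
    totals = list(accumulate(sizes))
    selected: List[int] = []
    for r in random_numbers:
        if not 1 <= r <= totals[-1]:
            raise ValueError(f"random number {r} outside 1..{totals[-1]}")
        i = unit_for(r, totals)
        if with_replacement or i not in selected:
            selected.append(i)
        if len(selected) == n:
            return selected
    raise ValueError("ran out of random numbers before n units were selected")


if __name__ == "__main__":
    sizes = [20, 35, 15, 40, 30]        # hypothetical; cumulative totals 20, 55, 70, 110, 140
    draws = [60, 12, 65, 100, 137, 30]  # hypothetical random numbers in 1..140
    print(cumulative_total_selection(sizes, draws, 4))                          # [3, 1, 3, 4]
    print(cumulative_total_selection(sizes, draws, 4, with_replacement=False))  # [3, 1, 4, 5]
```

As in the village example, the with-replacement sample may repeat a unit (unit 3 here), while the without-replacement version keeps drawing until n distinct units are obtained.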
For a PPSWR selection, therefore, the sample consists of the 7th, 16th, 5th, 7th and 15th villages, with the 7th village repeated. For a PPSWOR selection, we have to continue drawing until there are 5 distinct units in the sample. Suppose the next random number is r = 375; the required PPSWOR sample is then the 16th, 5th, 7th, 15th and 11th villages.

PPS Systematic
1. Derive the cumulative totals of the sizes Z_i, i = 1, 2, ..., N, and allot the corresponding ranges of numbers to the different units.
2. Calculate the interval k = T_N/n = Z/n (in this case 700/5 = 140).
3. Select a random number r (say 101) from 1 to k, and obtain r + k, r + 2k, r + 3k, ..., r + (n − 1)k.
In this case, the selected cumulative sizes are 101, 241, 381, 521 and 661. Thus the selected units are the 3rd (for 101), 7th (for 241), 11th (for 381), 15th (for 521) and 20th (for 661) villages.
Note: if any unit has a size greater than k, it may be selected more than once.

Lahiri's Method
A procedure that avoids the need to calculate cumulative totals for each unit was given by Lahiri (1951). Steps involved:
1. Select a random number i from 1 to N.
2. Select another random number j such that 1 ≤ j ≤ M, where M is either equal to the maximum of the sizes Z_i, i = 1, 2, ..., N, or greater than the maximum size in the population.
3. If j ≤ Z_i, the ith unit is selected; otherwise the pair (i, j) of random numbers is rejected and another pair is chosen by repeating steps 1 and 2.
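PPS systematic selection and Lahiri's method can be sketched in the same way, again with hypothetical sizes and function names. Note that Lahiri's method needs only the individual sizes Z_i and the bound M, never the cumulative totals.

```python
import bisect
import random
from itertools import accumulate
from typing import List, Optional


def pps_systematic(sizes: List[int], n: int, r: float) -> List[int]:
    """Units containing the cumulative points r, r + k, ..., r + (n - 1)k, with k = Z/n."""
    totals = list(accumulate(sizes))
    k = totals[-1] / n
    if not 1 <= r <= k:
        raise ValueError("the random start must lie between 1 and k")
    return [bisect.bisect_left(totals, r + j * k) + 1 for j in range(n)]


def lahiri_accepts(sizes: List[int], i: int, j: int) -> bool:
    """Lahiri's test for the pair (i, j): keep unit i if j <= Z_i."""
    return j <= sizes[i - 1]


def lahiri_ppswr(sizes: List[int], n: int, M: Optional[int] = None,
                 seed: Optional[int] = None) -> List[int]:
    """PPSWR sample of n units by Lahiri's method (no cumulative totals needed)."""
    M = max(sizes) if M is None else M
    if M < max(sizes):
        raise ValueError("M must be at least the largest size")
    rng = random.Random(seed)
    selected: List[int] = []
    while len(selected) < n:
        i = rng.randint(1, len(sizes))   # step 1: a unit number from 1 to N
        j = rng.randint(1, M)            # step 2: a number from 1 to M
        if lahiri_accepts(sizes, i, j):  # step 3: accept, otherwise draw a new pair
            selected.append(i)
    return selected


if __name__ == "__main__":
    sizes = [20, 35, 15, 40, 30]                # hypothetical sizes, Z = 140
    print(pps_systematic(sizes, n=4, r=10))     # k = 35: points 10, 45, 80, 115 -> [1, 2, 4, 5]
    print(lahiri_ppswr(sizes, n=2, seed=1))
```

Once unit i has been drawn, a pair is accepted with probability Z_i/M, so each accepted draw picks unit i with probability proportional to Z_i; rejected pairs are simply discarded.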
Lahiri's Method: Example
Select a sample of 2 villages using varying-probability sampling with replacement, the size being the number of households.
Solution: N = 20, n = 2, M = 58.
Select a random number i, 1 ≤ i ≤ 20, then a second random number j, 1 ≤ j ≤ 58. Suppose the first pair of random numbers is (2, 30). Since 30 ≤ 45 (the size of the 2nd village), the 2nd village is selected.
Similarly, suppose the next pair of random numbers is (12, 47). Since 47 > 30 (the size of the 12th village), the 12th village is not selected. The third pair of random numbers, (7, 40), results in the selection of the 7th village, since 40 ≤ 58. Hence the selected sample consists of the 2nd and 7th villages.

Clustering and Stratification

Strata and Clusters
Both stratification and clustering involve subdividing the population into mutually exclusive groups. Sub-divisions of the population are called clusters or strata depending on the sampling procedure adopted. The term cluster is used in the context of cluster sampling and multi-stage (cluster) sampling. To understand their application in different situations, let us take a simple example.

Choice of Strata and Clusters: an Example
Using data from a census taken 5 years ago, a population of about 120 units is sub-divided into six groups of approximately equal size. [Example]
Naturally Occurring Clusters
Clusters are usually defined as groups of units that are found naturally clustered together: by location, as socially defined entities like households, or by institutions like schools and enterprises.

Cluster                    Population unit
Census enumeration area    Dwelling
Household                  Person
Day                        Hour
School                     Student
Employer                   Employee

Clustering and Stratification in Sample Design
Typically, sample surveys conducted by NSOs involve subdividing the population into strata and clusters. Usually, the clusters are stratified, and the units within the clusters are then further stratified, to obtain the final sample. The sampler's objective is to find the right combination of stratification and clustering to obtain the required estimates at the desired level of accuracy with the given resources.
- The reliability, or precision, of the estimates depends on the degree to which the sample is clustered.
- Generally, clustering increases the sampling variance considerably.
- Usually, stratification is applied to decrease the sampling variance, but its effect is often modest.
- The effects of clustering and stratification are measured by the design effect, or deff. Primarily, deff indicates how much clustering there is in the survey sample.

Cluster Sampling

Cluster sampling: selection of a sample of clusters and survey of all the units of each selected cluster. This is also called single-stage cluster sampling.
Multi-stage cluster sampling, or simply multi-stage sampling: instead of surveying all the units of the selected clusters, only a sample of units is taken from each selected cluster.

Selecting a (Single-Stage) Cluster Sample
1. Required sampling frame: a list of all the clusters.
2. From the list, a sample of clusters is selected using a selection scheme (e.g. SRS, systematic).
3. All population units within the selected clusters are listed.
4. The information is then collected from all the units of the selected clusters.
Cluster Sampling: Advantages
- Main advantage: exact knowledge of the sizes of the sub-divisions (clusters) is not required, unlike for stratified sampling.
- Often a complete list of clusters, defined by location, as social entities or by institutions, is available, whereas a frame of population units is not available or is costly to obtain. In such cases, cluster sampling can be adopted.
- Reduced cost for personal interviews, particularly when the survey cost increases with the distance separating the sampled units.
Cluster Sampling: Disadvantages
Main disadvantage: increased sampling error due to a less representative sample, since
- in practice, units are typically homogeneous within normally defined clusters, and
- the composition of clusters cannot be altered, as they are pre-defined.

Sub-sampling: Multi-stage Sampling
Multi-stage Cluster Sampling
In single-stage cluster sampling, a sample of clusters is selected and all the population units of each selected cluster are surveyed. When the clusters are too large for all their population units to be covered in the survey, a sample of population units from each selected cluster is surveyed. Such a design is called multi-stage cluster sampling or simply multi-stage sampling.
Multi-stage Sampling
As the name suggests, multi-stage sampling involves multiple stages of sampling. There can be many stages, although it is rare to have more than 3. In this course we will concentrate only on two-stage sampling. The process of selecting a sample of population units from the selected clusters is known as sub-sampling.
Stage-wise Selection
Stage sampling is an extension of cluster sampling. In two-stage sampling, we first select the clusters; the selected clusters are called first-stage units (FSUs) or primary stage units (PSUs). We then select a sample of units from within each selected cluster; the selected units are called second-stage units (SSUs).
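A two-stage design with SRSWOR at both stages can be sketched as follows. The function name, villages and household labels are hypothetical, and other schemes (systematic, PPS) could replace either stage.

```python
import random
from typing import Dict, List, Optional


def two_stage_srswor(clusters: Dict[str, List[str]], a: int, b: int,
                     seed: Optional[int] = None) -> Dict[str, List[str]]:
    """Two-stage sample: SRSWOR of a PSUs, then SRSWOR of b SSUs within each.

    If b is at least the size of every selected cluster, all units are taken
    and the design reduces to single-stage cluster sampling.
    """
    rng = random.Random(seed)
    psus = rng.sample(sorted(clusters), a)  # first stage: select a clusters
    return {c: rng.sample(clusters[c], min(b, len(clusters[c])))  # second stage
            for c in psus}


if __name__ == "__main__":
    # Hypothetical frame: villages (PSUs) and their households (SSUs)
    villages = {"V1": ["h1", "h2", "h3", "h4"], "V2": ["h5", "h6", "h7"],
                "V3": ["h8", "h9", "h10", "h11", "h12"], "V4": ["h13", "h14"]}
    print(two_stage_srswor(villages, a=2, b=2, seed=5))
```

Only the selected villages need a household list, which is the practical point made below about preparing frames at the lower stages.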
Sampling Units at Different Stages: Examples of Two-Stage Sampling
Stage 1          Stage 2
Villages         Households
Dwellings        People
Hospitals        Patients
Businesses       Employees
Coconut trees    Coconuts
Advantages of Multi-stage Sampling
- Sampling frames are normally available at the higher stages and may be prepared at the lower stages only for the selected units.
- Cost considerations.
- Flexibility in the choice of sampling units and methods of selection at different stages.
- The contributions of different stages towards the sampling variance may be estimated separately.

Sampling at Two Stages
In practice, many multi-stage designs involve complex sub-sampling of and within PSUs. The selections at the two stages are made independently and may employ different sampling schemes, such as:
- SRSWOR
- systematic sampling
- probability proportional to size (PPS)

Sampling: Basic Concepts and Definitions
- Population and sampling unit
- Population parameter
- Statistic
- Estimator and estimate
- Unbiasedness and consistency

Definitions
Sampling Unit
A sampling unit is an entity that is selected in the sampling process. It may be different from the unit of observation. For example, in a labour force survey (LFS) we may select a sample of households, while the units of observation are the persons living in those households. We may also select the sample of households by first selecting a sample of villages and then selecting a sample of households within the sampled villages.

Characteristic
Different kinds of information on the elements of the population (or populations) are collected in a survey. Each of these items of information is called a characteristic. A characteristic can take different values for different units. Observations on several characteristics of the units are collected in a survey.
A characteristic can be a quantitative variable, such as:
- age of a person
- income of a household
- number of cattle on a farm
- area of land under rice in an agricultural holding
- value of output of a manufacturing unit
or an attribute, such as:
- sex of a person aged 15+
- employment status of a person
- economic activity code of a production unit

Population Parameter and Sample Statistic
A population parameter is a numerical summary of a population. Any numerical measure computed from a subset of the population (typically a sample) is a statistic.
[Figure: population and population parameter; sample and statistic, e.g. the sample mean x̄]
Example: the values of X and Y shown in the table below are the actual values (not known to the sampler).

Milk producers
Producer   No. of milch animals (X)   Milk output (Y)   Average yield (R = Y/X)
A          3                          145               48.3
B          6                          260               43.3
C          5                          245               49.0
D          5                          290               58.0
E          2                          140               70.0
F          4                          180               45.0

Population Parameter
A population parameter is a summary measure of a population, the value of which helps to describe the population. For example:
- total population
- average weekly income (an indicator of the well-being of a community)
- literacy rate
Estimator
An estimator is a sample statistic, i.e. a summary value of a variable calculated from the sample. More precisely, an estimator is any quantity calculated from the sample data (a function of the sample observations) that is used to give information about an unknown quantity of the population. Example: the sample mean is an estimator of the population mean.
Estimate
An estimate is an indication of the value of an unknown quantity based on observed data. More formally, an estimate is the particular value of an estimator that is obtained from a particular sample of data and used to indicate the value of a parameter.

In the Example
Two possible samples of two producers each:
Sample   X (1st, 2nd unit)   Mean x̄   Y (1st, 2nd unit)   Mean ȳ   Ratio estimate r = ȳ/x̄
C, D     5, 5                5         245, 290            267.5     53.5
A, B     3, 6                4.5       145, 260            202.5     45.0
The sample mean and the sample ratio are estimators; the values they take in a particular sample (for instance 267.5 and 53.5 for the sample C, D) are estimates.

Desirable Qualities of an Estimator
- Unbiasedness
- Consistency
- Efficiency
Unbiased Estimator
An unbiased estimator of a population parameter is an estimator whose expected value is equal to that parameter. In the example, under simple random sampling, the sample means of the number of milch animals and of milk output are unbiased estimators of the respective population means. But the sample ratio of milk output to milch animals is not an unbiased estimator of the corresponding population parameter, the population ratio R = Ȳ/X̄.

Consistency
An estimator is said to be consistent if the difference between the estimator and the parameter grows smaller as the sample size grows larger. The sample ratio in the example is not unbiased, but it is a consistent estimator.

Efficiency
Efficiency is defined as the reciprocal of the sampling variance. Of two unbiased estimators of a parameter, the one with the smaller variance is said to be relatively more efficient.
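These properties can be checked directly for the milk producers: with only six units, all 15 possible SRSWOR samples of size 2 can be listed. A minimal sketch using the table values and the standard `statistics` module:

```python
from itertools import combinations
from statistics import mean

# Milk producers from the example: (number of milch animals X, milk output Y)
producers = {"A": (3, 145), "B": (6, 260), "C": (5, 245),
             "D": (5, 290), "E": (2, 140), "F": (4, 180)}

X_bar = mean(x for x, _ in producers.values())  # population mean of X
Y_bar = mean(y for _, y in producers.values())  # population mean of Y
R = Y_bar / X_bar                               # population ratio (average yield)

# All 15 possible SRSWOR samples of size n = 2
samples = list(combinations(producers.values(), 2))
x_means = [mean(x for x, _ in s) for s in samples]
ratios = [mean(y for _, y in s) / mean(x for x, _ in s) for s in samples]

print(f"E(x_bar) = {mean(x_means):.4f}, X_bar = {X_bar:.4f}")  # equal: x_bar is unbiased
print(f"E(r)     = {mean(ratios):.4f}, R     = {R:.4f}")       # differ: r is biased
```

Averaged over all samples, x̄ reproduces the population mean exactly, while the average of the sample ratios differs from R = Ȳ/X̄; that difference is the bias referred to above.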
Measures of Sampling Error
- Sampling error
- Sampling distribution and sampling variance
- Efficiency and design effect (Deff)

Sampling Error
The error in a sample estimate that arises from observing only a subset (sample) of the population rather than the entire population.
Sampling error represents the difference between the estimate and the value of the population parameter. All sample estimates are subject to sampling error. The most commonly used measure of sampling error is the sampling variance.

Sampling Variance
Sampling variance is a measure of sampling error, which reflects the difference between an estimate derived from a sample and the "true" value. It can be computed from the population values, if they are known (but these are unknown; otherwise there would be no need for a survey).
Definition: for an unbiased estimator, the sampling variance is the average, over all possible samples that can be drawn from the population, of the squared difference between the value of the survey estimator obtained from a sample and the value of the population parameter. The variance of an estimator thus indicates how close the estimator tends to be to the population parameter. Different estimates from the same survey may have different sampling variances.
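The definition can be applied literally to the milk output values of the example: list every SRSWOR sample of size 2, compute the sample mean for each (these values form the sampling distribution) and average the squared deviations from Ȳ. As a check, the result agrees with the standard SRSWOR formula (1 − f)S²/n, where S² is the population variance with divisor N − 1 (`statistics.variance`).

```python
from itertools import combinations
from statistics import mean, variance

# Milk output Y of the six producers in the example
Y = [145, 260, 245, 290, 140, 180]
N, n = len(Y), 2
Y_bar = mean(Y)

# Sampling distribution of the sample mean: one value per possible sample
y_means = [mean(s) for s in combinations(Y, n)]

# Sampling variance by the definition: average squared deviation from the parameter
var_by_definition = mean((m - Y_bar) ** 2 for m in y_means)

# Standard SRSWOR formula (1 - f) S^2 / n, with S^2 computed on N - 1
f = n / N
var_by_formula = (1 - f) * variance(Y) / n

print(var_by_definition, var_by_formula)  # both about 1343.33
```
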
Sampling Distribution
The frequency distribution of the values of an estimator over all the samples that can possibly be drawn from the population.

Sampling Variance: Determining Factors
Sampling error (variance) is affected by a number of factors:
- variability within the population
- sample size (and sampling fraction)
- sample design
If sampling principles are applied carefully within the constraints of the available resources, sampling error can be accurately measured and kept to a minimum.

Sampling Variance and Population Variability
With the sample design and sample size unchanged, the higher the population variance (a measure of the variability of the population), the higher the sampling variance.

Sampling Variance and Sample Size
With the sample design and population (variance) unchanged, the larger the sample size, the lower the sampling variance.
Sampling Variance and Sample Design
For a given population and sample size, the sampling variance depends on the sample design adopted. The relative efficiency of a sample design with respect to the simplest sample design, SRSWOR, is measured by the design effect (Deff).
Design Effect (Deff)
The design effect of a sample design D is defined as the ratio of the standard errors under D and under SRSWOR:
$\text{Deff} = \dfrac{\text{s.e.}(\text{design } D)}{\text{s.e.}(\text{SRSWOR})}$
so that Deff² is the corresponding ratio of sampling variances. (Many texts call this ratio of standard errors "deft" and reserve "deff" for the variance ratio.) For the sample designs used in practice (i.e. in large-scale sample surveys), Deff is usually greater than 1. Estimates of Deff are often used to determine the required sample size for a given design.

Estimates of Sampling Error
Sampling Variance: a Measure of Accuracy
Sample survey data are analysed and interpreted for different purposes. A measure of precision for each survey estimate is essential for all kinds of analysis. Estimates of the sampling variance of survey estimators are the most commonly used measure of their precision.
Variance Estimators
The estimates of the variances, MSEs and standard errors of survey estimates are computed directly from the sample data. Basic sample survey theory gives algebraic formulas for variance estimators that can be used to estimate the variances of survey estimators. These standard variance estimators cannot be used for complex sample surveys or for non-linear estimators.
Other Measures of Accuracy
- Standard error: the square root of the sampling variance of an estimator, $\text{s.e.}(\hat{Y}) = \sqrt{\text{Var}(\hat{Y})}$.
- Relative standard error (RSE), or coefficient of variation: the standard error divided by the value of the parameter, $\text{RSE} = \text{s.e.}(\hat{Y})/Y$.
- Mean square error (MSE): when the estimator of the population parameter is not unbiased, the average, over all possible samples that can be drawn from the population, of the squared difference between the value of the survey estimator obtained from a sample and the value of the population parameter:
$\text{MSE} = \text{sampling variance} + (\text{sampling bias})^2$

Effects of Clustering on Precision
- Most often, natural clusters are used as the primary stage units (PSUs) in efficient multi-stage sampling.
- The choice of natural clusters helps to reduce costs,
- but it usually has an adverse effect on the sampling variance, and thus on the deff.
- The magnitude of the deff of a sample design (for a particular estimator) is mainly determined by the degree of homogeneity of the clusters.
- The degree of homogeneity of clusters is measured by the intra-class correlation, roh (ρ).
Homogeneity of Clusters: roh
Intra-class (or intra-cluster) correlation, roh (ρ):
$\rho = \dfrac{\sum_{i=1}^{A} \sum_{j=1}^{B} \sum_{k \ne j} (Y_{ij} - \bar{Y})(Y_{ik} - \bar{Y})}{(B - 1) \sum_{i=1}^{A} \sum_{j=1}^{B} (Y_{ij} - \bar{Y})^2}$
where
- A: number of clusters
- B: number of units in a cluster (when all clusters are the same size)
- $\bar{Y}$: overall population mean
- $Y_{ij}$: value of the jth element of the ith cluster
Value of roh
The intra-class (also called intra-cluster) correlation roh, or measure of homogeneity, indicates how similar the units within the same cluster are in comparison with the units of the population as a whole. Its value depends on the nature of the variable and on the homogeneity and physical size of the clusters. It lies in the range
$-\dfrac{1}{B - 1} \le \rho \le 1$
[Example]

Deff and roh
The sampling variance of estimates obtained from cluster samples depends on roh. When the clusters are of uniform size B and a clusters are selected (i.e. a total sample size of a·B), the design effect of the cluster sample relative to SRSWOR is given, approximately, by
$\text{Deff}^2 = 1 + (B - 1)\rho$
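The roh formula and the relation Deff² = 1 + (B − 1)ρ translate directly into code. A minimal sketch, assuming equal cluster sizes B; the function names and the two toy populations are hypothetical, chosen to show the extreme values ρ = 1 and ρ = −1/(B − 1):

```python
from itertools import permutations
from typing import List


def roh(clusters: List[List[float]]) -> float:
    """Intra-class correlation for A clusters of equal size B."""
    B = len(clusters[0])
    if B < 2 or any(len(c) != B for c in clusters):
        raise ValueError("need clusters of equal size B >= 2")
    values = [y for c in clusters for y in c]
    y_bar = sum(values) / len(values)  # overall mean
    # Numerator: cross-products over ordered pairs (j, k), j != k, within each cluster
    num = sum((c[j] - y_bar) * (c[k] - y_bar)
              for c in clusters for j, k in permutations(range(B), 2))
    den = (B - 1) * sum((y - y_bar) ** 2 for y in values)
    return num / den


def deff_squared(B: int, rho: float) -> float:
    """Deff^2 = 1 + (B - 1) * rho."""
    return 1 + (B - 1) * rho


if __name__ == "__main__":
    print(roh([[1, 1], [3, 3]]))    # 1.0: units identical within clusters
    print(roh([[1, 3], [1, 3]]))    # -1.0: each cluster mirrors the population
    print(deff_squared(11, 0.10))   # 2.0
```

Passing b, the number of units sampled per selected cluster, in place of B gives the multi-stage version of the relation.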
A Question
We know that, for (single-stage) cluster sampling, Deff² = 1 + (B − 1)ρ. When B = 1:
a. What is the implication for the sampling variance and deff?
b. What is your interpretation?
c. What is the inclusion probability (when the sample size is n and the population size is N)?

Value of roh and Variability
In practical situations, the value of ρ depends on the nature of the variable and the physical size of the clusters. In multi-stage (cluster) sampling, deft also depends on the sample size per primary cluster. The values of ρ are usually small, 0.01 to 0.10, but they still have a big effect: if ρ = 0.10 and B = 11, then deft² = 1 + 10 × 0.10 = 2; that is, the variance under cluster sampling will be twice that under SRSWOR.
When the value of ρ is negative, deft is less than 1 (i.e. cluster sampling is more efficient than SRSWOR).
For example, in an LFS where all members of the selected households are surveyed, deft for estimates such as the sex ratio, the age distribution or the proportion of the aged is usually less than 1, since the values of ρ in these cases are likely to be negative.

Deff and roh in Stage Sampling
Since stage sampling is an extension of cluster sampling, it is also subject to the effects of clustering. Recall that, for single-stage cluster sampling, $\text{Deff}^2 = 1 + (B - 1)\rho$.
For multi-stage sampling, the design effect relative to SRSWOR is given by
$\text{Deff}^2 = 1 + (b - 1)\rho$
since the number of units surveyed per selected cluster at the last stage is just b (not B, as in single-stage cluster sampling).

Determination of Sample Size
For estimation of a mean and a proportion under SRSWOR and other designs.
Determining Factors
A number of factors need to be considered when determining the sample size of a planned survey:
- degree of accuracy required
- population size (for small populations)
- degree of variation in the population
- sample selection plan
- cost and time

Determination of Sample Size
The theoretically correct way to determine the sample size for a survey is to:
identify the key variables in the survey determine the accuracy requirements for estimates of population parameters for these variables select a sample large enough to meet these requirements May 2012 SIAP 100 Sample size determination Underlying theory We have seen that for -level of confidence, Z .r.s.e 2
where is the maximum error (as proportion of parameter to be estimated). Thus for a given level of permissible error, , the r.s.e. of the estimator should be restricted to r.s.e. Z 2 2 for 95% confidence level, for which Z/2 is 1.96 2. May 2012 SIAP 101
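The bound $\text{r.s.e.} \le \varepsilon / Z_{\alpha/2}$ can be evaluated with Python's standard `statistics.NormalDist`. The helper `max_rse` is a hypothetical name used for illustration. Using the exact value 1.96 instead of the rounded 2 gives a slightly less strict bound (about 0.0255 rather than 0.025 for a 5% error limit).

```python
from statistics import NormalDist


def max_rse(error_limit: float, confidence: float = 0.95) -> float:
    """Largest r.s.e. compatible with a permissible relative error.

    From Z_{alpha/2} * r.s.e. <= epsilon, so r.s.e. <= epsilon / Z_{alpha/2}.
    error_limit: epsilon, the maximum error as a proportion of the parameter.
    confidence: the confidence level 1 - alpha.
    """
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95%
    return error_limit / z


print(round(max_rse(0.05), 4))  # 0.0255
```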
Sample size determination: For mean under SRSWR

For sample size n,
$$\operatorname{Var}(\bar{y}) = \frac{\sigma^2}{n} \quad\text{and}\quad \text{s.e.}(\bar{y}) = \frac{\sigma}{\sqrt{n}}.$$
Therefore,
$$\text{r.s.e.}(\bar{y}) = \frac{\text{c.v.}}{\sqrt{n}},$$
where $\text{c.v.} = \sigma/\bar{Y}$. Thus, for a given permissible error limit $\varepsilon$, the required sample size for the 95% confidence level is
$$n_0 = 4\left(\frac{\text{c.v.}}{\varepsilon}\right)^2.$$

Sample size determination: For proportion under SRSWR

The variance estimate of the sample proportion p is
$$v(p) = \frac{p(1-p)}{n}.$$
Therefore, the relative standard error (r.s.e.) of p is
$$\text{r.s.e.}(p) = \frac{\sqrt{v(p)}}{p} = \sqrt{\frac{1-p}{n\,p}}.$$
Thus, if p = 0.10 and n = 10000, r.s.e.(p) = 0.03 (i.e. 3%).
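The SRSWR formulas can be checked numerically with the sketch below (helper names are hypothetical; $Z$ is rounded to 2 as in the text). The proportion version of $n_0$ follows from the r.s.e. formula above, since $(1-P)/P$ plays the role of $\text{c.v.}^2$. With $\varepsilon = 6\%$ the required r.s.e. is 3%, which reproduces the p = 0.10, n = 10000 example.

```python
import math


def n0_mean_srswr(cv: float, error_limit: float) -> float:
    """n0 = 4 * (c.v. / epsilon)^2 (95% confidence, Z rounded to 2)."""
    return 4 * (cv / error_limit) ** 2


def rse_proportion_srswr(p: float, n: int) -> float:
    """r.s.e.(p) = sqrt(v(p)) / p with v(p) = p(1 - p) / n."""
    return math.sqrt(p * (1 - p) / n) / p


def n0_proportion_srswr(p: float, error_limit: float) -> float:
    """n with r.s.e.(p) = epsilon / 2, i.e. n0 = 4 (1 - P) / (P * epsilon^2)."""
    return 4 * (1 - p) / (p * error_limit ** 2)


print(round(rse_proportion_srswr(0.10, 10_000), 4))  # 0.03
# A 6% error limit needs r.s.e. <= 3%, i.e. about 10000 units when P = 0.10
# (round fractional results up to whole units in practice).
print(round(n0_proportion_srswr(0.10, 0.06)))  # 10000
```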
Sample size determination: For mean under SRSWOR

For a large population, the sampling fraction $f = n/N$ is of a known order and does not change much with the choice of n. For sample size n,
$$\operatorname{Var}(\bar{y}) = (1-f)\frac{S^2}{n} \quad\text{and}\quad \text{s.e.}(\bar{y}) = \sqrt{1-f}\,\frac{S}{\sqrt{n}}.$$
Therefore,
$$\text{r.s.e.}(\bar{y}) = \sqrt{1-f}\,\frac{\text{c.v.}}{\sqrt{n}}.$$
Thus, for a given permissible error limit $\varepsilon$, the required sample size for the 95% confidence level is
$$n_0 = 4(1-f)\left(\frac{\text{c.v.}}{\varepsilon}\right)^2.$$
The term $(1-f)$ may as well be ignored.
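A sketch of the SRSWOR formula, assuming, as the text does, that an approximate value of $f$ is known in advance. The helper name and the inputs (c.v. = 0.8, $\varepsilon$ = 0.05, f = 0.01) are made up for illustration; fractional results are rounded up to whole units.

```python
import math


def n0_mean_srswor(cv: float, error_limit: float, f: float = 0.0) -> float:
    """n0 = 4 (1 - f) (c.v. / epsilon)^2 for 95% confidence (Z rounded to 2).

    f is an approximate sampling fraction fixed in advance; the default
    f = 0 ignores the (1 - f) term, as the text allows for large populations.
    """
    if not 0 <= f < 1:
        raise ValueError("sampling fraction f must lie in [0, 1)")
    return 4 * (1 - f) * (cv / error_limit) ** 2


# c.v. = 0.8 and a 5% error limit, with and without an assumed f = 0.01.
print(math.ceil(n0_mean_srswor(0.8, 0.05, f=0.01)))  # 1014
print(math.ceil(n0_mean_srswor(0.8, 0.05)))          # 1024
```

With f this small the correction changes $n_0$ by only about 1%, which is why $(1-f)$ can usually be dropped.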
Sample size determination: For proportion under SRSWOR

The variance estimate of the sample proportion p under SRSWOR is
$$v(p) = (1-f)\frac{p(1-p)}{n}.$$
Therefore, the relative standard error (r.s.e.) of p is
$$\text{r.s.e.}(p) = \frac{\sqrt{v(p)}}{p} = \sqrt{(1-f)\frac{1-p}{n\,p}}.$$
Thus, if p = 0.10 and n = 10000, r.s.e.(p) ≈ 0.03 (i.e. 3%).

Sample size determination: Sample size under other sample designs

For other sampling designs, an estimate of the design effect makes the calculation much easier. If $n_0$ is the required sample size under SRSWOR, the required sample size under a design D is
$$n_D = \frac{n_0\cdot\text{deft}^2(D)}{c},$$
where c is the expected response rate.
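The SRSWOR proportion formula and the design-effect adjustment can be sketched as follows. The helper names and the inputs ($n_0$ = 1000, $\text{deft}^2$ = 2, c = 0.8) are hypothetical, chosen only to show the arithmetic.

```python
import math


def rse_proportion_srswor(p: float, n: int, f: float) -> float:
    """r.s.e.(p) = sqrt((1 - f)(1 - p) / (n p)) under SRSWOR."""
    return math.sqrt((1 - f) * (1 - p) / (n * p))


def n_for_design(n0: float, deft2: float, response_rate: float) -> int:
    """n_D = n0 * deft^2(D) / c, rounded up to whole units."""
    if not 0 < response_rate <= 1:
        raise ValueError("response rate c must lie in (0, 1]")
    return math.ceil(n0 * deft2 / response_rate)


# The finite population correction pulls r.s.e.(p) slightly below 3%.
print(rse_proportion_srswor(0.10, 10_000, f=0.01) < 0.03)  # True

# Hypothetical inputs: n0 = 1000 under SRSWOR, deft^2 = 2, 80% response.
print(n_for_design(1000, 2.0, 0.8))  # 2500
```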
The estimates of the design effect and of c are usually obtained from similar surveys conducted in the past.

Sample size determination: Main steps involved
1. Identify the main study variables.
2. For each of these,
a. specify the permissible limit of error, $\varepsilon$;
b. obtain estimates of the required parameters: deft, c, $\sigma^2$, the approximate sampling fraction f, and $P$ or $\bar{Y}$.
3. Work out the required sample size for each of the main study variables.
4. If the maximum among the sample sizes worked out is permitted by the budget, accept it;
5. otherwise, review the sample design, explore possibilities of relaxing the permissible limits of error, and repeat the process.

Sub-division of Population

Using data from a census taken 5 years ago, a population of about 120 units is sub-divided into six groups, each with approximately the same number of units.
Case 1: each sub-division has only one type of unit.
Case 2: each sub-division has a uniform composition by type.

[Figure: Case 1 and Case 2 layouts of the six sub-divisions A to F, with units classified as poor, non-poor lower class, and middle & upper class.]

Sub-division of Population: Suggest a strategy of selection

Task: to estimate the proportion of poor in the population. Resources permit a sample size of only 20. What will be your strategy for selecting a sample in each of these cases?

Homogeneity and clustering

Case 1: each sub-division is homogeneous, so stratified sampling is preferable [$\rho = 1$].
Case 2: each sub-division is a replica of the population, so studying one cluster is sufficient [$\rho = -1/(B-1) < 0$].

Composition commonly found

In practice, the sub-divisions are neither of uniform composition nor totally homogeneous. The situation is usually like Case 3 below.

[Figure: Case 3 layout of sub-divisions A to F, each with a mixed composition of poor, non-poor lower class, and middle & upper class units.]

The value of $\rho$ depends on the variable under consideration and the nature of the clusters. Usually, as in Case 3, $\rho > 0$.
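The bracketed values of $\rho$ can be checked on hypothetical populations built to mimic Cases 1 and 2: six groups of 20 units, with y = 1 for a poor unit. The group compositions below are assumptions for illustration, not the counts in the original figure, and the hypothetical function `roh` uses one common form of the intra-cluster correlation, with the total sum of squares over all units in the denominator.

```python
from __future__ import annotations


def roh(clusters: list[list[float]]) -> float:
    """Intra-cluster correlation for equal-sized clusters (one common form).

    roh = sum_i sum_{j != k} (y_ij - Ybar)(y_ik - Ybar)
          / ((B - 1) * sum_i sum_j (y_ij - Ybar)^2)
    """
    sizes = {len(c) for c in clusters}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes every cluster has the same size B")
    b = sizes.pop()
    values = [y for c in clusters for y in c]
    mean = sum(values) / len(values)
    total_ss = sum((y - mean) ** 2 for y in values)
    cross = 0.0
    for c in clusters:
        devs = [y - mean for y in c]
        # (sum of deviations)^2 minus the squares leaves the j != k products.
        cross += sum(devs) ** 2 - sum(d * d for d in devs)
    return cross / ((b - 1) * total_ss)


B = 20  # six hypothetical groups of 20 units each; y = 1 marks a poor unit
case1 = [[1] * B] * 2 + [[0] * B] * 4           # each group has one type only
case2 = [[1] * 5 + [0] * 15 for _ in range(6)]  # each group is a replica

print("Case 1 roh:", round(roh(case1), 4))  # Case 1 roh: 1.0
print("Case 2 roh:", round(roh(case2), 4), "vs -1/(B-1) =", round(-1 / (B - 1), 4))
```

Plugged into $\text{deft}^2 = 1 + (B-1)\rho$, these values give $\text{deft}^2 = 20$ for Case 1 and $\text{deft}^2 = 0$ for Case 2: every replica group has the same proportion of poor, so a single whole group already gives it exactly.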