Chapter 12: Inference About One Population

12.1 Introduction

In this chapter we use the approach developed earlier to describe a population:
- Identify the parameter to be estimated or tested.
- Specify the parameter's estimator and its sampling distribution.
- Construct a confidence interval estimator or perform a hypothesis test.

We shall develop techniques to estimate and test three population parameters:
- Population mean $\mu$
- Population variance $\sigma^2$
- Population proportion $p$

12.2 Inference About a Population Mean When the Population Standard Deviation Is Unknown

Recall that when $\sigma$ is known we use the following statistic to estimate and test a population mean:

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

When $\sigma$ is unknown, we use its point estimator $s$, and the z-statistic is replaced by the t-statistic.

The t-Statistic

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

When the sampled population is normally distributed, the t-statistic is Student t distributed.

Using the t-table

The t distribution is mound-shaped and symmetrical around zero. The degrees of freedom of the statistic $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$ (a function of the sample size) determine how spread out the distribution is compared to the normal distribution.

[Figure: two t densities centered at 0, one with d.f. = $\nu_1$ and one with d.f. = $\nu_2$, where $\nu_1 < \nu_2$.]

Testing $\mu$ when $\sigma$ is unknown

Example 12.1: Productivity of newly hired trainees

In order to determine the number of workers required to meet demand, the productivity of newly hired trainees is studied. It is believed that trainees can process and distribute more than 450 packages per hour within one week of hiring. Based on productivity observations of 50 trainees (see file Xm12-01), can we conclude that this belief is correct?

Example 12.1: Solution

The problem objective is to describe the population of the number of packages processed in one hour. The data are interval.

$H_0: \mu = 450$
$H_1: \mu > 450$ (we want to prove that the trainees reach 90% of the productivity of experienced workers)

The t-statistic is

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}, \qquad \text{d.f.} = n - 1 = 49.$$

Solving by hand

The rejection region is $t > t_{\alpha,\,n-1}$, where $t_{\alpha,\,n-1} = t_{.05,\,49} \approx t_{.05,\,50} = 1.676$.

From the data we have $\sum x_i = 23{,}019$ and $\sum x_i^2 = 10{,}671{,}357$; thus

$$\bar{x} = \frac{23{,}019}{50} = 460.38, \qquad s^2 = \frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}}{n - 1} = 1507.55, \qquad s = \sqrt{1507.55} = 38.83.$$

The test statistic is

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{460.38 - 450}{38.83 / \sqrt{50}} = 1.89.$$

Since 1.89 > 1.676, we reject the null hypothesis in favor of the alternative. There is sufficient evidence to infer that the mean productivity of trainees one week after being hired is greater than 450 packages per hour at the .05 significance level.
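The hand calculation for Example 12.1 can be sketched in a few lines of Python. This is only an illustration using the sums quoted for Xm12-01. The function name `one_sample_t` is made up for this sketch, and the critical value is copied from the t-table rather than computed.

```python
import math


def one_sample_t(sum_x, sum_x2, n, mu0):
    """Return (mean, std dev, t statistic) from the sums of x and x squared."""
    mean = sum_x / n
    # Shortcut formula for the sample variance: (sum x^2 - (sum x)^2 / n) / (n - 1)
    variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)
    std_dev = math.sqrt(variance)
    t_stat = (mean - mu0) / (std_dev / math.sqrt(n))
    return mean, std_dev, t_stat


# Sums reported for file Xm12-01 in Example 12.1; H0: mu = 450
mean, std_dev, t_stat = one_sample_t(23_019, 10_671_357, 50, 450)

# t_{.05,50} taken from the t-table (used in place of t_{.05,49}); an assumption of this sketch
T_CRITICAL = 1.676
reject_h0 = t_stat > T_CRITICAL  # right-tail test, H1: mu > 450

print(f"mean = {mean:.2f}, s = {std_dev:.2f}, t = {t_stat:.2f}, reject H0: {reject_h0}")
```

Working from the two sums rather than the raw observations mirrors the hand solution; with the full data file one would more likely compute the mean and standard deviation directly.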

Computer output (t-Test: Mean, variable Packages):

Mean                    460.38
Standard Deviation       38.83
Hypothesized Mean          450
df                          49
t Stat                    1.89
P(T<=t) one-tail        0.0323
t Critical one-tail     1.6766
P(T<=t) two-tail        0.0646
t Critical two-tail     2.0096

Since the one-tail p-value .0323 is less than $\alpha$ = .05, we reject the null hypothesis in favor of the alternative. There is sufficient evidence to infer that the mean productivity of trainees one week after being hired is greater than 450 packages per hour at the .05 significance level.

Estimating $\mu$ when $\sigma$ is unknown

The confidence interval estimator of $\mu$ when $\sigma$ is unknown is

$$\bar{x} \pm t_{\alpha/2}\,\frac{s}{\sqrt{n}}, \qquad \text{d.f.} = n - 1.$$

Example 12.2

An investor is trying to estimate the return on investment in companies that won quality awards last year. A random sample of 83 such companies is selected, and the return the investor would have earned by investing in them is calculated. Construct a 95% confidence interval for the mean return.

Solution (solving by hand)

The problem objective is to describe the population of annual returns from buying shares of quality award-winners. The data are interval.

From file Xm12-02 we determine $\bar{x} = 15.02$, $s^2 = 68.98$, and $s = \sqrt{68.98} = 8.31$. With $t_{\alpha/2,\,n-1} = t_{.025,\,82} \approx t_{.025,\,80} = 1.990$,

$$\bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}} = 15.02 \pm 1.990\,\frac{8.31}{\sqrt{83}} = (13.19,\ 16.85).$$

Computer output (t-Estimate: Mean, variable Returns):

Mean                 15.02
Standard Deviation    8.31
LCL                  13.20
UCL                  16.83

Checking the required conditions

We need to check that the population is normally distributed, or at least not extremely nonnormal. There are statistical methods to test for normality (one is introduced later in the book). The sample histograms for the two examples provide an informal check:

[Figure: histogram for Xm12-01; horizontal axis "Packages", bins from 400 to 575 plus "More".]

[Figure: histogram for Xm12-02; horizontal axis "Returns", bins from -4 to 30 plus "More".]
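As a rough check on the interval in Example 12.2, the estimator $\bar{x} \pm t_{\alpha/2}\, s/\sqrt{n}$ can be evaluated directly. The helper name `t_interval` is hypothetical, and the critical value 1.990 is the table value quoted in the hand solution. Evaluated without intermediate rounding, the limits come out near 13.20 and 16.84, so they differ slightly in the second decimal from both the hand figures and the computer output (the latter presumably uses the exact $t_{.025,82}$).

```python
import math


def t_interval(mean, std_dev, n, t_crit):
    """Confidence limits mean +/- t_crit * s / sqrt(n)."""
    half_width = t_crit * std_dev / math.sqrt(n)
    return mean - half_width, mean + half_width


# Summary statistics for Xm12-02 as quoted in Example 12.2;
# 1.990 is t_{.025,80} from the t-table, standing in for t_{.025,82}
lcl, ucl = t_interval(mean=15.02, std_dev=8.31, n=83, t_crit=1.990)

print(f"LCL = {lcl:.2f}, UCL = {ucl:.2f}")
```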

12.3 Inference About a Population Variance

Sometimes we are interested in making inferences about the variability of processes. Examples:
- The consistency of a production process, for quality control purposes.
- Investors use variance as a measure of risk.

To draw inferences about variability, the parameter of interest is $\sigma^2$.

The sample variance $s^2$ is an unbiased, consistent, and efficient point estimator for $\sigma^2$. The statistic

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}, \qquad \text{d.f.} = n - 1,$$

has a distribution called chi-squared, if the population is normally distributed.

[Figure: chi-squared densities for d.f. = 5 and d.f. = 10.]

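Testing and Estimating a Population Variance

From the probability statement

$$P\left(\chi^2_{1-\alpha/2} < \chi^2 < \chi^2_{\alpha/2}\right) = 1 - \alpha$$

we have, by substituting $\chi^2 = (n-1)s^2/\sigma^2$ and solving for $\sigma^2$,

$$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}.$$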

Testing the Population Variance

Example 12.3 (operations management application)

A container-filling machine is believed to fill 1-liter containers so consistently that the variance of the fills will be less than 1 cc$^2$ (fills are measured in cc, where 1 cc = .001 liter). To test this belief, a random sample of 25 1-liter fills was taken and the results recorded (Xm12-03). Do these data support the belief that the variance is less than 1 cc$^2$ at the 5% significance level?

Solution

The problem objective is to describe the population of 1-liter fills from a filling machine. The data are interval, and we are interested in the variability of the fills. The complete test is:

$H_0: \sigma^2 = 1$
$H_1: \sigma^2 < 1$ (we want to know whether the process is consistent)

The test statistic is $\chi^2 = \dfrac{(n-1)s^2}{\sigma^2}$.

The rejection region is $\chi^2 < \chi^2_{1-\alpha,\,n-1}$.

Solving by hand

Note that $(n-1)s^2 = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \left(\sum x_i\right)^2 / n$.

From the sample (Xm12-03) we can calculate $\sum x_i = 24{,}996.4$ and $\sum x_i^2 = 24{,}992{,}821.3$. Then

$$(n-1)s^2 = 24{,}992{,}821.3 - \frac{(24{,}996.4)^2}{25} = 20.78,$$

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2} = \frac{20.78}{1} = 20.78, \qquad \chi^2_{1-\alpha,\,n-1} = \chi^2_{.95,\,25-1} = 13.8484.$$

Since 13.8484 < 20.78, the test statistic does not fall in the rejection region, so we do not reject the null hypothesis. There is insufficient evidence to infer that the variance is less than 1.
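The chi-squared test of Example 12.3 can be reproduced from the two sums quoted for Xm12-03. The sketch below is illustrative only: the function name `chi_squared_statistic` is invented here, and the critical value $\chi^2_{.95,24} = 13.8484$ is copied from the chi-squared table rather than computed.

```python
def chi_squared_statistic(sum_x, sum_x2, n, sigma2_0):
    """Return (n - 1)s^2 / sigma_0^2 using the shortcut formula for (n - 1)s^2."""
    sum_of_squares = sum_x2 - sum_x ** 2 / n  # equals (n - 1) * s^2
    return sum_of_squares / sigma2_0


# Sums reported for Xm12-03 in Example 12.3; H0: sigma^2 = 1
chi2 = chi_squared_statistic(24_996.4, 24_992_821.3, 25, sigma2_0=1.0)

# chi^2_{.95,24} from the chi-squared table (an assumption of this sketch)
CHI2_CRITICAL = 13.8484
reject_h0 = chi2 < CHI2_CRITICAL  # left-tail test, H1: sigma^2 < 1

print(f"chi-squared = {chi2:.2f}, reject H0: {reject_h0}")
```

Because the alternative is $\sigma^2 < 1$, the rejection region is in the left tail, which is why the comparison uses `<` rather than `>`.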

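[Figure: chi-squared density with d.f. = 24. The rejection region ($\alpha$ = .05) lies to the left of $\chi^2_{.95,\,25-1} = 13.8484$; the test statistic 20.78 falls in the remaining region of area $1 - \alpha$ = .95, so we do not reject the null hypothesis.]

Estimating the Population Variance

Example 12.4

Estimate the variance of fills in Example 12.3 with 99% confidence.

Solution

We have $(n-1)s^2 = 20.78$. From the chi-squared table we have

$$\chi^2_{\alpha/2,\,n-1} = \chi^2_{.005,\,24} = 45.5585, \qquad \chi^2_{1-\alpha/2,\,n-1} = \chi^2_{.995,\,24} = 9.88623.$$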

The confidence interval estimate is

$$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$$

$$\frac{20.78}{45.5585} < \sigma^2 < \frac{20.78}{9.88623}$$

$$.46 < \sigma^2 < 2.10$$
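The arithmetic of the 99% interval in Example 12.4 can be checked with a short sketch. The helper name `variance_interval` is hypothetical, and both chi-squared values are the table values quoted in the example.

```python
def variance_interval(sum_of_squares, chi2_upper_tail, chi2_lower_tail):
    """Confidence limits for sigma^2: (n-1)s^2 / chi2_{a/2} and (n-1)s^2 / chi2_{1-a/2}."""
    lcl = sum_of_squares / chi2_upper_tail  # dividing by the larger value gives the lower limit
    ucl = sum_of_squares / chi2_lower_tail  # dividing by the smaller value gives the upper limit
    return lcl, ucl


# (n - 1)s^2 = 20.78 from Example 12.3; table values chi^2_{.005,24} and chi^2_{.995,24}
lcl, ucl = variance_interval(20.78, 45.5585, 9.88623)

print(f"{lcl:.2f} < sigma^2 < {ucl:.2f}")
```

Note that the interval is not symmetric around $s^2$, since the chi-squared distribution itself is skewed.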

12.4 Inference About a Population Proportion

When the population consists of nominal data, the only inference we can make is about the proportion of occurrences of a certain value. The parameter $p$ was used before to calculate these probabilities under the binomial distribution.

Statistic and sampling distribution

The statistic used when making inferences about $p$ is

$$\hat{p} = \frac{x}{n},$$

where $x$ is the number of successes and $n$ is the sample size.

Under certain conditions [$np > 5$ and $n(1-p) > 5$], $\hat{p}$ is approximately normally distributed, with mean $\mu = p$ and variance $\sigma^2 = p(1-p)/n$.

Testing and Estimating the Proportion

Test statistic for $p$:

$$z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}, \qquad \text{where } np > 5 \text{ and } n(1-p) > 5.$$

Interval estimator for $p$ ($1-\alpha$ confidence level):

$$\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}, \qquad \text{provided } n\hat{p} > 5 \text{ and } n(1-\hat{p}) > 5.$$

Testing the Proportion (additional example)

Example 12.5 (Predicting the winner on election day)

Voters are asked by a certain network to participate in an exit poll in order to predict the winner on election day. Based on the data presented in Xm12-05 (where 1 = Democrat and 2 = Republican), can the network conclude that the Republican candidate will win the state college vote?

Solution

The problem objective is to describe the population of votes in the state. The data are nominal. The parameter to be tested is $p$. Success is defined as "Vote Republican". The hypotheses are:

$H_0: p = .5$
$H_1: p > .5$ (more than 50% vote Republican)

Solving by hand

The rejection region is $z > z_{\alpha} = z_{.05} = 1.645$.

From the file we count 407 successes. The number of voters participating is 765. The sample proportion is

$$\hat{p} = \frac{407}{765} = .532.$$

The value of the test statistic is

$$z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} = \frac{.532 - .5}{\sqrt{.5(1-.5)/765}} = 1.77.$$

The p-value is $P(Z > 1.77) = .0382$.

Computer output (z-Test: Proportion):

Sample Proportion          0.532
Observations                 765
Hypothesized Proportion      0.5
z Stat                      1.77
P(Z<=z) one-tail          0.0382
z Critical one-tail       1.6449
P(Z<=z) two-tail          0.0764
z Critical two-tail         1.96

There is sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis. At the 5% significance level we can conclude that more than 50% voted Republican.
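The z-test of Example 12.5 can be sketched with Python's standard library, which provides the standard normal distribution through `statistics.NormalDist`. The function name `proportion_z_test` is an assumption of this sketch; the counts (407 successes out of 765) are those quoted in the example.

```python
import math
from statistics import NormalDist


def proportion_z_test(successes, n, p0):
    """Right-tail z-test of H0: p = p0. Returns (p_hat, z, p_value)."""
    p_hat = successes / n
    # The standard error uses the hypothesized proportion p0, as in the test statistic
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    p_value = 1 - NormalDist().cdf(z)  # P(Z > z) for H1: p > p0
    return p_hat, z, p_value


# Counts from Xm12-05 as quoted in Example 12.5
p_hat, z, p_value = proportion_z_test(successes=407, n=765, p0=0.5)

z_critical = NormalDist().inv_cdf(0.95)  # z_{.05}, about 1.645
reject_h0 = z > z_critical

print(f"p_hat = {p_hat:.3f}, z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

The p-value here is computed from the unrounded test statistic, which is why it agrees with the computer output (.0382) rather than with a table lookup at exactly 1.77.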

Estimating the Proportion: Nielsen Ratings

In a survey of 2,000 TV viewers at 11:40 p.m. on a certain night, 226 indicated they watched The Tonight Show. Estimate the number of TVs tuned to The Tonight Show on a typical night, if there are 100 million potential television sets. Use a 95% confidence level.

Solution

$$\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n} = .113 \pm 1.96\sqrt{.113(1-.113)/2000} = .113 \pm .014$$

Computer output (z-Estimate: Proportion, variable Viewers):

Sample Proportion   0.113
Observations         2000
LCL                 0.099
UCL                 0.127

A confidence interval estimate of the number of viewers who watched The Tonight Show:
LCL = .099(100 million) = 9.9 million
UCL = .127(100 million) = 12.7 million

Selecting the Sample Size to Estimate the Proportion

Recall that the confidence interval for the proportion is

$$\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}.$$

Thus, to estimate the proportion to within $W$, we can write

$$W = z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}.$$

The required sample size is

$$n = \left(\frac{z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})}}{W}\right)^2.$$

Example

Suppose we want to estimate the proportion of customers who prefer our company's brand to within .03 with 95% confidence.

Find the sample size.

Solution

$W = .03$ and $1 - \alpha = .95$; therefore $\alpha/2 = .025$, so $z_{.025} = 1.96$ and

$$n = \left(\frac{1.96\sqrt{\hat{p}(1-\hat{p})}}{.03}\right)^2.$$

Since the sample has not yet been taken, the sample proportion is still unknown. We proceed using either one of the following two methods.

Method 1: There is no knowledge about the value of $\hat{p}$.

Let $\hat{p} = .5$. This results in the largest possible $n$ needed for a $1-\alpha$ confidence interval of the form $\hat{p} \pm .03$. If the sample proportion does not equal .5, the actual $W$ will be narrower than .03 with the $n$ obtained by the formula below.

$$n = \left(\frac{1.96\sqrt{.5(1-.5)}}{.03}\right)^2 \approx 1{,}068 \text{ (rounded up)}$$

Method 2: There is some idea about the value of $\hat{p}$.

Use the value of $\hat{p}$ to calculate the sample size. For example, with $\hat{p} = .2$:

$$n = \left(\frac{1.96\sqrt{.2(1-.2)}}{.03}\right)^2 \approx 683 \text{ (rounded up)}$$
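The interval estimator and the sample-size formula for a proportion can both be sketched in Python. The function names `proportion_interval` and `required_sample_size` are invented for this illustration. Unlike the hand solutions, the sketch takes $z_{\alpha/2}$ from `statistics.NormalDist` instead of rounding it to 1.96, which does not change the rounded results here.

```python
import math
from statistics import NormalDist


def z_alpha_over_2(confidence):
    """Two-sided critical value z_{alpha/2} for a given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def proportion_interval(successes, n, confidence):
    """Confidence limits p_hat +/- z * sqrt(p_hat(1 - p_hat) / n)."""
    p_hat = successes / n
    half_width = z_alpha_over_2(confidence) * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def required_sample_size(p_guess, width, confidence):
    """Smallest n for which the interval half-width is at most `width`."""
    z = z_alpha_over_2(confidence)
    return math.ceil((z * math.sqrt(p_guess * (1 - p_guess)) / width) ** 2)


# Nielsen ratings example: 226 of 2,000 viewers, 95% confidence
lcl, ucl = proportion_interval(226, 2000, 0.95)
print(f"LCL = {lcl:.3f}, UCL = {ucl:.3f}")
print(f"Sets tuned in: {lcl * 100:.1f} to {ucl * 100:.1f} million")

# Sample-size example: W = .03, 95% confidence
n_method_1 = required_sample_size(0.5, 0.03, 0.95)  # Method 1: no knowledge of p
n_method_2 = required_sample_size(0.2, 0.03, 0.95)  # Method 2: p believed to be near .2
print(n_method_1, n_method_2)
```

Rounding up with `math.ceil` reflects the fact that a fractional observation cannot be sampled, so the next whole number is needed to keep the half-width within $W$.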