Finding Relationships among Variables

BUSINESS ANALYTICS: DATA ANALYSIS AND DECISION MAKING
Chapter 3: Finding Relationships among Variables
© 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.

Introduction

The primary interest in data analysis is usually in relationships between variables.

The most useful numerical summary measure is correlation, and the most useful graph is a scatterplot. To break down a numerical variable by a categorical variable, it is useful to create side-by-side box plots. Excel's pivot tables break down one variable by others, so that all sorts of relationships can be uncovered very quickly. The diagram in the file Data Analysis Taxonomy.xlsx gives you the big picture of which analyses are appropriate for which data types and which tools are best for performing the various analyses.

Relationships Among Categorical Variables

The most meaningful way to examine relationships between two categorical variables is with counts and corresponding charts of the counts. You can find counts of the categories of either variable separately, as well as counts of the joint categories of the two variables. Corresponding percentages of totals and charts help tell the story. It is customary to display all such counts in a table called a crosstabs (for crosstabulations), sometimes also called a contingency table.

Example 3.1: Smoking Drinking.xlsx

Objective: To use a crosstabs to explore the relationship between smoking and drinking.

Solution: The data set lists the smoking and drinking habits of 8761 adults. Categories have been coded N, O, H, S, and D for Non, Occasional, Heavy, Smoker, and Drinker. To create the crosstabs, enter the category headings in Excel and use the COUNTIFS function to fill the table with counts of joint categories. Next, sum across rows and down columns to get totals. Then express the counts as percentages of row totals and as percentages of column totals.

Relationships Among Categorical Variables and a Numerical Variable

The comparison problem is one of the most important problems in data analysis. It occurs whenever you want to compare a numerical measure across two or more subpopulations. Examples:

  • The subpopulations are males and females, and the numerical measure is salary.
  • The subpopulations are different regions of the country, and the numerical measure is the cost of living.
  • The subpopulations are different days of the week, and the numerical measure is the number of customers going to a particular fast-food chain.

Stacked and Unstacked Formats

There are two possible data formats, stacked and unstacked. The data are stacked if there are two long variables, such as Gender and Salary. The idea is that the male salaries are stacked in with the female salaries. This is the format you will see in the vast majority of situations. You will occasionally see data in unstacked format, when there are two short variables, such as Male Salary and Female Salary.
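As a rough illustration of the two formats, the following Python sketch unstacks a small stacked data set, restacks it, and summarizes the numerical variable by category. The salary values and the helper names `unstack` and `restack` are made up for this example; they are not part of StatTools or of any data file mentioned in these slides.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical stacked data: one long category variable and one long value variable.
stacked = [
    ("Male", 52000), ("Female", 61000), ("Male", 48000),
    ("Female", 57000), ("Male", 75000), ("Female", 49000),
]


def unstack(rows):
    """Convert stacked (category, value) rows into one short list per category."""
    columns = defaultdict(list)
    for category, value in rows:
        columns[category].append(value)
    return dict(columns)


def restack(columns):
    """Convert unstacked columns back into stacked (category, value) rows."""
    return [(category, value)
            for category, values in columns.items()
            for value in values]


unstacked = unstack(stacked)

# The comparison problem: summarize the numerical variable for each subpopulation.
summary = {
    category: {"count": len(values), "mean": mean(values), "median": median(values)}
    for category, values in unstacked.items()
}

for category, stats in summary.items():
    print(category, stats)
```

Restacking does not preserve the original row order, only the pairing of each value with its category, which is all that a breakdown by category needs.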

StatTools is capable of dealing with either format and can convert from stacked to unstacked or vice versa.

[Figure: Stacked and Unstacked Data (panels: Stacked Data, Unstacked Data)]

Example 3.2: Baseball Salaries 2011 Extra.xlsx

Objective: To learn methods in StatTools for breaking down baseball salaries by various categorical variables.

Solution: The data set contains the same 2011 baseball data examined previously, as well as several extra categorical variables. Create summary measures by selecting One-Variable Summary from the Summary Statistics dropdown list. Next, click the Format button and choose Stacked.

Then choose the Cat variable you want to categorize by and the Val variable you want to summarize.

Create side-by-side box plots by selecting Box-Whisker Plot from the Summary Graphs dropdown list and filling in the resulting dialog box. Select the Stacked format so that you can choose a Cat variable and a Val variable.

Relationships Among Numerical Variables

To study relationships among numerical variables, a new type of chart, called a scatterplot, and two new summary measures, correlation and covariance, are used. These measures can be applied to any variables that are displayed numerically. However, they are appropriate only for truly numerical variables, not for categorical variables that have been coded numerically.

Scatterplots

A scatterplot is a scatter of points, where each point denotes the values of an observation for two selected variables.

It is a graphical method for detecting relationships between two numerical variables. The two variables are often labeled generically as X and Y, so a scatterplot is sometimes called an X-Y chart. The purpose of a scatterplot is to make a relationship (or the lack of it) apparent.

Example 3.3: GolfStats.xlsx

Objective: To use scatterplots to search for relationships in the golf data.

Solution: The data set includes an observation (stats) for each of the top 200 earners on the PGA Tour. In StatTools, designate a StatTools data set for a particular year. Next, select Scatterplot from the Summary Graphs dropdown list, and then select at least one X variable and at least one Y variable.

Trend Lines in Scatterplots

Once you have a scatterplot, Excel enables you to superimpose one of several trend lines on the scatterplot. A trend line is a line or curve that fits the scatter as well as possible.
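To make "fits the scatter as well as possible" concrete, here is a minimal sketch of one common fitting criterion, least squares, for the straight-line case. The data points and the function name `fit_line` are illustrative assumptions, not values from GolfStats.xlsx.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of products of deviations, and sum of squared deviations of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical paired observations.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.2f}x + {intercept:.2f}")
```

The fitted line minimizes the sum of squared vertical distances from the points to the line.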

This could be a straight line, or it could be one of several types of curves. To add a trend line, right-click on any point in the chart, select Add Trendline, and fill out the resulting dialog box.

[Figure: Scatterplot with Trend Line and Equation Superimposed]

Correlation and Covariance

Correlation and covariance measure the strength and direction of a linear relationship between two numerical variables. The relationship is strong if the points in a scatterplot cluster tightly around some straight line. If this straight line rises from left to right, the relationship is positive and the measures will be positive numbers. If it falls from left to right, the relationship is negative and the measures will be negative numbers. The two numerical variables must be paired variables: they must have the same number of observations, and the values for any observation should be naturally paired.

Covariance is essentially an average of products of deviations from means. Excel has a built-in COVAR function, and StatTools also calculates covariances automatically. Covariance has a serious limitation as a descriptive measure because it is very sensitive to the units in which X and Y are measured.
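The sensitivity to units can be seen in a short sketch. The function below divides the sum of products of deviations by n - 1; that divisor is an assumption of this example (some tools divide by n instead), and the data values are made up.

```python
from statistics import mean


def covariance(xs, ys):
    """Sum of products of deviations from the means, divided by n - 1 (assumed divisor)."""
    x_bar, y_bar = mean(xs), mean(ys)
    products = [(x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)


# Hypothetical paired observations, with X measured in meters.
x_meters = [1, 2, 3, 4]
ys = [2, 4, 5, 9]

# The same X values expressed in centimeters.
x_centimeters = [100 * x for x in x_meters]

cov_m = covariance(x_meters, ys)
cov_cm = covariance(x_centimeters, ys)
print(cov_m, cov_cm)  # the second value is 100 times the first
```

Nothing about the relationship changed between the two calls, yet the covariance grew by a factor of 100, which is why its magnitude is hard to interpret.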

Correlation is a unitless quantity that is unaffected by the measurement scale. The correlation is always between -1 and +1. The closer it is to either of these two extremes, the closer the points in a scatterplot are to a straight line.
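A minimal sketch of these properties follows. It computes the correlation as the sum of products of deviations divided by the square root of the product of the two sums of squared deviations; the data and the function name `correlation` are illustrative assumptions.

```python
from math import sqrt
from statistics import mean


def correlation(xs, ys):
    """Correlation between paired lists xs and ys; always between -1 and +1."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical paired observations.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 9]

r = correlation(xs, ys)
r_rescaled = correlation([100 * x for x in xs], ys)  # change the units of X

print(round(r, 4), round(r_rescaled, 4))   # same value: rescaling has no effect
print(correlation([1, 2, 3], [2, 4, 6]))   # points on a rising line
print(correlation([1, 2, 3], [6, 4, 2]))   # points on a falling line
```

Rescaling X leaves the correlation unchanged, and points lying exactly on a straight line give a correlation of +1 or -1 depending on the direction of the line.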

Excel has a built-in CORREL function, and StatTools also calculates correlations automatically.

Three important points about scatterplots, correlations, and covariances:

  • A correlation is a single-number summary of a scatterplot. It never conveys as much information as the full scatterplot.
  • You are usually on the lookout for large correlations, those near -1 or +1.
  • Do not even try to interpret covariances numerically, except possibly to check whether they are positive or negative. For interpretive purposes, concentrate on correlations.

Example 3.3 (Continued): GolfStats.xlsx

Objective: To use correlations to understand relationships in the golf data.

Solution: In StatTools, create a table of correlations by selecting Correlation and Covariance from the Summary Statistics dropdown list. Fill in the resulting dialog box and check Correlations. You can learn more about a correlation by creating the corresponding scatterplot.

Pivot Tables

The pivot table is an Excel tool that allows you to break data down by categories. Sometimes pivot tables are used to display tables of counts, often called crosstabs or contingency tables. However, crosstabs typically list only counts, whereas pivot tables can list counts, sums, averages, and other summary measures.

Example 3.4: Elecmart Sales.xlsx

Objective: To use pivot tables to break down the customer order data by a number of categorical variables.

Solution: The data set contains data on 400 customer orders during several months for the Elecmart company. Create a pivot table by clicking the PivotTable button on the Insert ribbon.

Hiding Categories (Filtering)

You can filter out any items in a pivot table that you don't want to see. Click the Row Labels dropdown arrow of the active field and check the items you want to filter on.

[Figure: Pivot table with hidden categories]
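The idea behind a pivot table can be sketched in a few lines of Python: break a value down by two categorical variables, report counts, sums, and averages, and optionally hide some categories. The order records below are invented for illustration and are not taken from Elecmart Sales.xlsx; the function `pivot` and its `hidden` parameter are likewise hypothetical names for this sketch.

```python
from collections import defaultdict

# Hypothetical order records (not the actual Elecmart data).
orders = [
    {"Region": "South", "Gender": "Female", "Total Cost": 120.0},
    {"Region": "South", "Gender": "Male", "Total Cost": 80.0},
    {"Region": "West", "Gender": "Female", "Total Cost": 200.0},
    {"Region": "South", "Gender": "Female", "Total Cost": 60.0},
    {"Region": "West", "Gender": "Male", "Total Cost": 40.0},
]


def pivot(rows, row_field, col_field, value_field, hidden=()):
    """Summarize value_field for each (row, column) category pair.

    Row categories listed in `hidden` are filtered out, much like
    unchecking items in a pivot table's filter list.
    """
    cells = defaultdict(list)
    for record in rows:
        if record[row_field] in hidden:
            continue
        cells[(record[row_field], record[col_field])].append(record[value_field])
    return {
        key: {"count": len(v), "sum": sum(v), "average": sum(v) / len(v)}
        for key, v in cells.items()
    }


table = pivot(orders, "Region", "Gender", "Total Cost")
south_only = pivot(orders, "Region", "Gender", "Total Cost", hidden={"West"})

for key, stats in sorted(table.items()):
    print(key, stats)
```

The "count" entries alone form a crosstabs of Region by Gender; the "sum" and "average" entries are the extra summary measures that a pivot table can report.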

Sorting on Values or Categories

It is easy to sort in a pivot table, either by the numbers in the Values area or by the labels in a Rows or Columns field. To sort by the numbers in the Values area, right-click any number and select Sort. To sort on the labels of a Rows or Columns field, right-click any of the categories and select Sort. You can also click the dropdown arrow for the field and get the dialog box that allows both sorting and filtering.

Changing Locations of Fields (Pivoting)

You can choose where to place variables in a pivot table. For example, to place the Region variable in the Columns area, drag the Region button from the Rows area of the PivotTable Fields pane to the Columns area.

Changing Field Settings

You can change various settings in the Field Settings dialog box. To get to this dialog box, do either of the following:

  • Click the Field Settings button on the Analyze/Options ribbon.
  • Right-click any of the pivot table cells and select the Field Settings item.

[Figure: Pivot table with Value Field Settings changed to Average]

Pivot Charts

It is easy to accompany pivot tables with pivot charts. These charts adapt automatically to the underlying pivot table. To create a pivot chart, click anywhere inside the pivot table, select the PivotChart button on the Analyze/Options ribbon, and select a chart type.

Multiple Variables in the Values Area

More than a single variable can be placed in the Values area. Also, a given variable in the Values area can be summarized by more than one summarizing function.

Summarizing by Count

The variable in the Values area can be summarized by the Count function. This is useful when you want to know, for example, how many of the orders were placed by females in the South. Right-click any number in the pivot table, select Value Field Settings, and select the Count function.

Grouping

Categories in a Rows or Columns variable can be grouped. Suppose you want to summarize Sum of Total Cost by Date. Starting with a blank pivot table, check both Date and Total Cost in the PivotTable Fields pane. Then right-click any date and select Group.
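Grouping dates amounts to replacing each date by a coarser label, such as its month, before summing. The following sketch assumes grouping by month; the dates and costs are invented, and `sum_by_month` is a hypothetical helper name.

```python
from collections import defaultdict
from datetime import date

# Hypothetical (Date, Total Cost) pairs.
orders = [
    (date(2024, 1, 5), 120.0),
    (date(2024, 1, 19), 80.0),
    (date(2024, 2, 2), 200.0),
    (date(2024, 2, 27), 60.0),
    (date(2024, 3, 8), 40.0),
]


def sum_by_month(rows):
    """Group rows by (year, month) and sum the cost within each group."""
    totals = defaultdict(float)
    for day, cost in rows:
        totals[(day.year, day.month)] += cost
    return dict(sorted(totals.items()))


monthly = sum_by_month(orders)
for (year, month), total in monthly.items():
    print(f"{year}-{month:02d}: {total:.2f}")
```

Grouping by quarter or year would work the same way with a different grouping key.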

Other Pivot Table Features

  • Showing/hiding subtotals and grand totals (check the Layout options on the Design ribbon)
  • Dealing with blank rows, that is, categories with no data (right-click any number, choose PivotTable Options, and check the options on the Layout & Format tab)
  • Displaying the data behind a given number in a pivot table (double-click any number in the Values area to get a new worksheet)
  • Formatting a pivot table with various styles (check the style options on the Design ribbon)
  • Moving or renaming pivot tables (check the PivotTable and Action groups on the Analyze/Options ribbon)
  • Refreshing pivot tables as the underlying data changes (check the Refresh dropdown list on the Analyze/Options ribbon)
  • Creating pivot table formulas for calculated fields or calculated items (check the Formulas dropdown list on the Analyze/Options ribbon)

Example 3.5: Lasagna Triers.xlsx

Objective: To use pivot tables to explore which demographic variables help to distinguish lasagna triers from nontriers.

Solution: The data set contains data on over 800 potential customers being tracked by a frozen lasagna company. Set up a pivot table that shows counts of triers and nontriers for different categories of the variables.

[Figure: Pivot Table and Pivot Chart for Examining the Effect of Gender]

Slicers and Timelines

In Excel 2010, Microsoft added slicers, which are lists of the distinct values of any variable that you can then filter on. You add a slicer from the Analyze/Options ribbon under PivotTable Tools. In Excel 2013, a Timeline feature was added. A Timeline is like a slicer, but it is specifically for filtering on a date variable.

[Figure: Pivot Table with Slicers and a Timeline]
