Income estimates for small areas


ESRC Census Development Programme

Identifying the cash-rich and the cash-poor: Lessons from the Census Rehearsal
Dr Paul Williamson
Department of Geography

Why is income so important?
  Arguably the most direct measure of utility [Yes, I know - money can't buy happiness]
  Helps target neighbourhood renewal
  Helps with planning [of houses, shops, leisure facilities]
  Consumer marketing
  Tax-benefit analysis
  Most requested addition to 2001 Census

The 2001 Census
  Geography of income

Other sources of data on income
  Benefits data
  Government surveys (e.g. GHS, LFS, FES, FRS, NES)
  Commercially-held data [postcode sector and postcode unit estimates]

The Census Rehearsal (1999): key objectives
  Evaluation of:
    Extant methods for small-area income estimation
    New approaches
    Utility of non-census information (e.g. council tax; house price; benefits data)
    [Methods of imputing income band means]

Definition of income
  Income vs wealth
  Gross or net income?
  Pre or post housing costs?
  Adult or household?
  Household?
    Total
    Equivalised [Per capita / OECD / McClements]
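To make the difference between total, per capita and equivalised household income concrete, here is a minimal sketch. The slides name the OECD scale without giving weights, so the weights below are assumptions: 1.0 / 0.7 / 0.5 (first adult, each further adult, each child) for the original OECD scale and 1.0 / 0.5 / 0.3 for the modified scale. The function names are hypothetical, and the McClements scale is not covered.

```python
# Equivalence-scale weights: (first adult, each further adult, each child).
# These values are assumptions; the slides do not state which weights were used.
OECD_ORIGINAL = (1.0, 0.7, 0.5)
OECD_MODIFIED = (1.0, 0.5, 0.3)


def equivalence_factor(adults, children, scale=OECD_ORIGINAL):
    """Return the household's equivalence factor under the given scale."""
    if adults < 1 or children < 0:
        raise ValueError("a household needs at least one adult and no negative counts")
    first_adult, further_adult, child = scale
    return first_adult + further_adult * (adults - 1) + child * children


def equivalised_income(total_income, adults, children, scale=OECD_ORIGINAL):
    """Total household income divided by the equivalence factor."""
    return total_income / equivalence_factor(adults, children, scale)


def per_capita_income(total_income, adults, children):
    """Total household income divided by the number of residents."""
    return total_income / (adults + children)


if __name__ == "__main__":
    # Hypothetical household: two adults, two children, 540 per week in total.
    print(per_capita_income(540.0, 2, 2))
    print(equivalised_income(540.0, 2, 2))
    print(equivalised_income(540.0, 2, 2, scale=OECD_MODIFIED))
```

The same total income yields quite different figures depending on the definition chosen, which is why the definition needs fixing before surrogates are compared.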

Surrogates
  Univariate
    % unemployed
    % 2+ car households
    % residents in Social Classes I + II
    % owner-occupation
  Multivariate (deprivation indices)
    Carstairs [unemployment; overcrowding; not owning car; head in Social Class IV or V]
    Townsend [unemployment; overcrowding; not owning home; not owning car]
    Breadline [not owning car; not owning house; lone parenthood; Social Class IV or V; illness; unemployment]
    DTLR Index of Multiple Deprivation 2000
    Green (Wealth) [owning 2+ cars; NS-SEC I or II; high qualifications]
  Geodemographic
    SuperProfiles
    MOSAIC
    GB Profiles
  Model
    Individual income
      Dale [SOC2000; economic activity; age; sex; region]
      Lee [SOC2000; economic activity]
      Regression (individual and/or ecological)
    Household income
      Regression (household and/or ecological)
      Bramley & Smart [household composition; earners; tenure; area-level deprivation]

The 1999 Census Rehearsal: key features
  Full census questionnaire + INCOME
  Large achieved sample
    c. 65,000 households
    c. 140,000 individuals
  Spatially contiguous
  Clustered sampling strategy:
    7 part districts [excluding NI]

    38 wards
    650 EDs

Potential problems
  Non-response rate
    overall (~50%)
    income (~15%)
    other variables (5-20%)
  Full responses for ~55% of achieved sample [individuals and households]
  Non-response bias

Income band (£/week)   0      <60    60-119   120-199   200-299   300-479   480+   Total   N
All                    20.8   13.2   20.5     15.5      13.3      11.3      5.5    100.0   125138
No data missing        16.9   11.0   18.7     16.1      15.9      14.6      6.8    100.0   67283

Social Class (1991)   All      No data missing
None                  28.2     25.4
I                     4.0      4.9
II                    18.9     21.4
III(N)                17.5     19.3
III(M)                11.5     10.8
IV                    14.7     13.7
V                     5.0      4.3
Army                  0.2      0.2
Total                 100.0    100.0
N                     117010   67283

Correlation coefficient
Indicators (calculated for Rehearsal sub-set 1991 Enumeration districts)

Indicator        Original   Ideal
Townsend index   0.82       0.79

Remaining indicators:
  % households with: No car; 2+ cars
  % households: Owner-occupied; Social rented; Detached; Detached or semi; Flats
  % of economically active: Unemployed; Social Class I+II
Remaining values (extraction order):
  0.89 0.87 0.86 0.83 0.90 0.94 0.98 0.93 0.92 0.87 0.92 0.97 0.94 0.92
  0.57 0.58 0.55 0.56

Banding of income question
  What is your total current gross income from all sources?

  Per week         or   Per year (approximately)
  Nil                   Nil
  Less than £60         Less than £3,000
  £60 to £119           £3,000 to £5,999
  £120 to £199          £6,000 to £9,999
  £200 to £299          £10,000 to £14,999
  £300 to £479          £15,000 to £24,999
  £480 or more          £25,000 or more

  Only 10% of adults in top band
  but problem compounded when individual incomes aggregated to estimate household income
  Band mid-point vs band mean
  Value of band means area sensitive?

Income band mean by council tax band

Income band (£)   National average   Band A   Band H
0                 0                  0        0
1-60              34                 35       27
61-120            91                 93       86
121-200           156                155      156
201-300           245                241      242
301-480           375                364      391
481+              765                652      1353

Source: FRS 1998/9 (Crown Copyright)

Digression: modelling income band means
  Alternative modelling strategies include:
    National mean
    Sub-group mean (e.g. by council tax band)
    Statistical distributions (log-normal; Pareto)
    New variant of log-normal approach with addition of modelled median etc.

Results
  For all bands, sub-group mean is best if possible
  For closed bands, national mean is next best
  For open (top) band, new proposed log-normal approach is best, particularly where there is evidence of strong spatial clustering

Results of modelling top income band: sample size 1000
  Imputation methods: Sub-group mean; National mean; Log-normal (%); Pareto (2-point); Pareto (x-point)
  Columns: Error and Error sq., averaged over socio-economic strata (cars; council tax) and over spatial strata (region)
  Values (extraction order): 14 7437 -5 5062 7 -53 51933 7373 12 12847 5820 -18 149 59261 157 35923 211 83351 301 100911
  [Same ranking applies for sample sizes from 15000 to 150]

Results of modelling top income band: sample size 20
  Imputation methods: Sub-group mean; National mean; Log-normal (%); Pareto (2-point); Pareto (x-point)
  Columns: Error and Error sq., averaged over socio-economic strata (cars; council tax) and over spatial strata (region)
  Values (extraction order): 9 175501 -2 347994 6 352234 -101 232754 30 208267 41 375803 185 260801 221 443589 252 286535 324 486772

Spatial scale
  At what scale does income vary most?

MAUP (modifiable areal unit problem)
  1991 vs 1998/9 boundaries
  Zones with <10 households or 25 residents excluded from analysis

SOC 2000 / NS-SEC
  Lack of alternative SOC2000-coded data
  Therefore have to use Census Rehearsal data
  Use partitioned data to avoid unduly advantaging SOC2000-based approaches

Results

[Figure: Census Rehearsal income distribution. x-axis: annual gross income (£000s), bands Nil, <3, 3-5, 6-9, 10-14, 15-24, 25+; y-axis ticks 0 to 30]

Heterogeneity rules OK!
  At ward level the % of household reps. in the top income band averaged 9.1% but ranged from 2.8% to 21.6%
  89% of EDs contained one or more household reps. in the top income band, i.e. in the top income decile of the population

Income distribution of household representative (Person 1 on Census Rehearsal form)

[Figure: six histogram panels (All EDs; EDs in lowest income quintile; EDs in second income quintile; EDs in middle income quintile; EDs in fourth income quintile; EDs in top income quintile). x-axis: income bracket (£000 p.a.), bands Nil, <3, 3-5, 6-9, 10-14, 15-24, 25+]
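Returning to the earlier digression on modelling income band means: the open top band has no mid-point, so its mean has to be modelled. The sketch below shows one common two-point Pareto formulation, in which the shape parameter is estimated from the shares of the population above two band boundaries and the open-band mean follows from the Pareto mean formula. Whether this matches the "Pareto (2-point)" variant evaluated in the slides is an assumption, and the function names are hypothetical.

```python
from math import log


def pareto_alpha(lower1, share_above1, lower2, share_above2):
    """Estimate the Pareto shape parameter from two points on the upper tail.

    Under a Pareto tail the share above x is proportional to x ** -alpha, so
    alpha = ln(share_above1 / share_above2) / ln(lower2 / lower1).
    """
    if not 0 < lower1 < lower2:
        raise ValueError("band boundaries must satisfy 0 < lower1 < lower2")
    if not 0 < share_above2 < share_above1 <= 1:
        raise ValueError("shares must satisfy 0 < share_above2 < share_above1 <= 1")
    return log(share_above1 / share_above2) / log(lower2 / lower1)


def open_band_mean(lower_bound, alpha):
    """Mean of a Pareto tail above lower_bound (only defined for alpha > 1)."""
    if alpha <= 1:
        raise ValueError("the mean of the open band is undefined for alpha <= 1")
    return lower_bound * alpha / (alpha - 1)


if __name__ == "__main__":
    # Shares taken from the 'All' row of the income band table:
    # 11.3% of adults in 300-479 and 5.5% in 480+, so 16.8% at or above 300.
    alpha = pareto_alpha(300, 0.168, 480, 0.055)
    print(alpha, open_band_mean(480, alpha))
```

With those shares the shape parameter comes out at roughly 2.4 and the top-band mean at a little over £800 per week, which can be set against the FRS national top-band mean of £765.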

Missing data
  Missing data have minimal impact on results
    From Raw to Ideal data, most correlations change by <0.02
    Very few values change by >0.05
    Exception is NS-SEC 8 [by definition!]
  Correlations lower for Ideal than Raw
    Calculating surrogates directly from the Rehearsal circumvents response bias?

Scale
  Higher correlations at higher geographies
  District effect small but significant
    BUT none of the districts is in SE England

Overfitting
  No significant impact

MAUP
  Correlations vary by up to 0.1 between alternative boundaries at same spatial scale
  BUT no detectable effect on rankings of surrogate income measures

Adult income (r²)

Surrogate              Ward   ED     Postcode
Univariate
  NS-SEC I+II          0.81   0.81   0.64
Multivariate
  Townsend             0.36   0.46   0.38
  Green (wealth)       0.57   0.55   0.50
Geodemographic
  PCA_96               na     0.82   0.69
  Voas                 0.83   0.59   0.48
Model
  Dale                 0.91   0.89   0.90
  Lee                  0.90   0.87   0.88
  Voas (individual)    0.91   0.80   0.83

[See final slide for definition of surrogates]

Regression model (adults)
  Age, Age², sex, ethnicity, marital status
  Type and tenure of dwelling
  Qualifications
  Economic (in)activity and health
  Mean SOC2000 and SIC2000 income
  Supervisory status
  District of residence
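The r² figures reported for each surrogate compare area-level values of the surrogate with area-level mean income. A minimal sketch of how such a figure could be computed follows, assuming person-level records that carry an area identifier, an observed income and a surrogate value. The record layout and function names are hypothetical, and the original analysis may have differed in detail (for example in excluding small zones).

```python
from collections import defaultdict


def area_means(rows):
    """rows: iterable of (area_id, value) pairs; returns {area_id: mean value}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for area, value in rows:
        totals[area] += value
        counts[area] += 1
    return {area: totals[area] / counts[area] for area in totals}


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy * sxy / (sxx * syy)


def surrogate_r_squared(people):
    """people: iterable of (area_id, observed_income, surrogate_value) tuples."""
    people = list(people)
    observed = area_means((area, income) for area, income, _ in people)
    surrogate = area_means((area, value) for area, _, value in people)
    areas = sorted(observed)
    return r_squared([surrogate[a] for a in areas], [observed[a] for a in areas])
```

Running the same function on records grouped by ward, ED or postcode identifiers would give one figure per spatial scale, in the manner of the columns reported for each surrogate.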

Caveats
  Best performing surrogates in danger of over-fitting?
    For Dale, Lee and Voas, mean occupational income was calculated directly from the Census Rehearsal dataset (no other SOC2000 sources available at time of analysis)
  BUT
    No significant difference if SOC minor or unit codes used
    No significant difference if data partitioned

Household income (r²)

Surrogate              Ward   ED     Postcode
Univariate
  NS-SEC I+II          0.82   0.81   0.64
Multivariate
  Townsend             0.48   0.46   0.44
  Green (wealth)       0.61   0.50   0.56
Geodemographic
  PCA_96               na     0.81   0.67
  Voas                 0.81   0.60   0.48
Model
  Dale                 0.90   0.85   0.86
  Lee                  0.87   0.83   0.83
  Voas (household)     0.76   0.74   0.74

[See final slide for definition of surrogates]

Accuracy
  For many purposes relative, rather than absolute, accuracy is most important: ranking

[Figure: three scatter plots against observed mean individual income (£/week, 0 to 600).
  a) NS-SEC based income surrogate [NSSEC12]: % of economically active in NSSEC 1+2 (0% to 100%)
  b) Regression based estimate [VOASIND]: predicted mean individual income (£/week, 0 to 600)
  c) Sub-group mean based estimate [LEEINCM]: predicted mean individual income (£/week, 0 to 600)]

% ranked in same decile as income

Decile             % NSSEC 1+2   Individual regression   Sub-group mean   Ecological regression
                   [NSSEC12]     [VOASIND]               [LEEINCM]
1 [low income]     71            66                      74               80
2                  46            34                      40               52
3                  32            40                      35               43
4                  32            26                      37               40
5                  25            34                      39               37
6                  17            28                      45               30
7                  26            28                      43               31
8                  23            35                      48               46
9                  28            51                      57               60
10 [high income]   55            77                      82               82
Overall            36            42                      50               46
Within 1 decile    82            84                      89               92

Other data sources
  < 1% of unexplained spatial variation in income attributable to area-level effects
  House price has no significant impact
    could be due to data problems

  Council tax band has small but significant effect [for areas of enumeration district size and below]
  Lack of utility counter-intuitive?
    current value vs purchase price
    purchase income vs current income

Conclusions (I)
  Best approaches capture 80-90% of spatial variation in income, even for smallest spatial units
  But considerable within-area heterogeneity
  Best approaches are regression or sub-group mean based
  Conventional deprivation indices a poor second to % social class / NS-SEC I+II

Conclusions (II)
  Geodemographic classifications at best perform as well as % NS-SEC I+II, and perform best for areas of ward size and above
  Qualified support for use of statistical distributions in modelling top income band means

Implications
  Moral for marketers: target people, not places
  Moral for policy makers: deprivation indices not the best proxy for income
  ONS ward income estimates (based on ecological regression) likely to perform well
  Longer term
    Consider external correlates (e.g. IMD 2000; benefits data)
    Lobby for Census Office to create small-area income estimate by imputing income on Census microdata
    Include non-census information (?)

Acknowledgements
  House price data were taken from the Experian Limited Postal Sector Data, ESRC/JISC Agreement. Grateful thanks are due to the Census Custodians of England, Wales and Scotland for granting permission to access the Census Rehearsal dataset. A debt of gratitude is also owed to a number of staff at the Office for National Statistics, in particular Keith Whitfield and Philip Clarke. Finally, thanks are due to David Voas for undertaking some of the preparatory work for this project. All analyses and conclusions remain the sole responsibility of Dr Paul Williamson.

Definitions (I)
  NS-SEC I+II: % persons aged 16-74 in NS-SEC I or II
  Townsend: Multiple deprivation indicator based on % economically active unemployed; % overcrowded households; % households with no car; and % of households not owner-occupied

  Green (Wealth): Affluence indicator based on % households with 2+ cars; % persons aged 16-74 in NS-SEC I; and % adults with high educational qualifications
  PCA_96: Geodemographic classification based on principal components analysis of 20 normalised census variables; individuals in each of 96 area types assumed to have mean income of all persons in area type
  Voas: Alternative geodemographic classification, in which five census variables are divided into above or below median and one variable into thirds, with all cross-tabulated to give a total of 96 discrete area types

Definitions (II)
  Dale: Income imputed given mean income for population sub-group defined by sex, SOC 2000 minor group, economic activity (missing; employed full-time; employed part-time; self-employed; other) and age (missing; 0-15; 16-19; 20-29; 30-49; 50+) [maximum of 4860 valid sub-groups]
  Lee: Income imputed given mean income for population sub-group defined by SOC 2000 minor group and economic activity (child; not applicable; employed full-time; employed part-time; self-employed; unemployed; retired; other inactive) [maximum of 649 valid sub-groups]

Definitions (III)
  Voas (individual): Regression model for adult income (children assumed to have 0 income); INCOME^0.5 predicted given: mean income by SOC2000 unit; mean income by industry category; age, age², residents, residents², rooms and cars; plus dummy variables for sex, white, full-time student, married, single/widowed/divorced, long-term ill, no qualifications, GCSE or equivalent, A levels or equivalent, undergraduate degree or equivalent, employed full-time, employed part-time, self-employed, unemployed, retired, permanently sick, other economically inactive excluding pensioners and students, semi-detached, terrace, flat, caravan, privately rented, social rented, employed manager or supervisor, and district of residence
  Voas (household): Regression model for total household income; HHINC^0.5 predicted given same set of predictors as for Voas (individual), but based only upon the head of household's characteristics
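The Dale and Lee surrogates defined above both assign each person the mean income of their population sub-group. A minimal sketch of that idea follows, using Lee-style grouping keys. The field names (soc_minor, activity, income) are hypothetical, and the fallback to the overall mean for sub-groups with no observed income is an assumption, not something the slides specify.

```python
from collections import defaultdict


def subgroup_means(records, keys):
    """Mean observed income for each sub-group defined by the given keys."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        if rec.get("income") is None:
            continue  # people with missing income do not inform the means
        group = tuple(rec[k] for k in keys)
        totals[group] += rec["income"]
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


def impute_income(records, keys=("soc_minor", "activity")):
    """Return one imputed income per record: the mean of the record's sub-group.

    Sub-groups with no observed income fall back to the overall mean
    (an assumption made for this sketch).
    """
    observed = [rec["income"] for rec in records if rec.get("income") is not None]
    if not observed:
        raise ValueError("at least one record must have an observed income")
    overall_mean = sum(observed) / len(observed)
    means = subgroup_means(records, keys)
    return [means.get(tuple(rec[k] for k in keys), overall_mean) for rec in records]
```

A Dale-style variant would pass a longer key tuple (for example sex, SOC 2000 minor group, economic activity and age band); averaging the imputed values within each area then gives the area-level surrogate.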
