Predicting Student Enrollment Using Markov Chain Modeling in SAS
Samantha Bradley, M.A. Applied Economics
Office of Institutional Research
University of North Carolina at Greensboro

Office of Institutional Research
- The University of North Carolina at Greensboro
  - Public, coeducational state university founded in 1891
  - 19,922 students enrolled Fall 2017
- IR aggregates, analyzes, and disseminates data in support of:
  - Institutional planning
  - Policy formulation
  - Decision-making for internal/external constituents

Why Enrollment Projections?
- IR prepares Enrollment Projections every year
  - Headcounts by student level
  - Student credit hours by cost category
- Used by UNC General Administration during decision-making about university funding
- Helps the university plan resource allocation
- Identify areas with growth potential

Enrollment Data
- IR maintains SAS datasets of enrollment data going back to Fall 2009
- 150+ variables:
  - Demographics
  - Areas of study
  - Degree programs
  - Credit hours
- How can we leverage all this data to create the most accurate Enrollment Projections?

Markov Chain Model
- Lets us estimate the movements of a population over time
- The population must be categorized into exhaustive, mutually exclusive groups or states
  - Ex.) Freshman, Sophomore, Junior, Senior
- Estimates the probability of moving from one state to another, or remaining in the same state
- Probabilities are arranged to create an N x N Transition Probability Matrix
  - N is the number of unique states in the model

Markov Chain Model
To predict enrollment for next semester, a simple Markov Chain Model looks like this:

(number of students we have this semester in each state at time t) x (probabilities of moving amongst each state) = (estimated number of students in each state next semester)

\[
\begin{pmatrix} F_t & P_t & J_t & S_t \end{pmatrix}
\times
\begin{pmatrix}
P_{FF} & P_{FP} & P_{FJ} & P_{FS} \\
P_{PF} & P_{PP} & P_{PJ} & P_{PS} \\
P_{JF} & P_{JP} & P_{JJ} & P_{JS} \\
P_{SF} & P_{SP} & P_{SJ} & P_{SS}
\end{pmatrix}
=
\begin{pmatrix} F_{t+1} & P_{t+1} & J_{t+1} & S_{t+1} \end{pmatrix}
\]

Building the Transition Probability Matrix
Let's say we want to predict enrollment for next Spring. We know how many students we have in each state this Fall. We can think about this as predicting how students will move between states from this Fall to next Spring.
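The one-step projection in the simple model (current headcounts times the transition matrix) can be sketched in a few lines of plain Python. This is an illustration only, not the presentation's SAS code: the function name `project`, the headcounts, and the matrix values are all hypothetical, chosen so the arithmetic is easy to follow.

```python
def project(headcounts, transition):
    """Multiply a row vector of headcounts by an N x N transition matrix."""
    n = len(headcounts)
    if len(transition) != n or any(len(row) != n for row in transition):
        raise ValueError("transition must be an N x N matrix matching headcounts")
    # Entry j of the result sums, over every origin state i,
    # (students currently in i) * (probability of moving from i to j).
    return [
        sum(headcounts[i] * transition[i][j] for i in range(n))
        for j in range(n)
    ]


# Hypothetical example; states ordered Freshman, Sophomore, Junior, Senior.
states = ["F", "P", "J", "S"]
current = [100, 80, 60, 40]
transition = [
    [0.7, 0.3, 0.0, 0.0],  # from Freshman
    [0.0, 0.6, 0.4, 0.0],  # from Sophomore
    [0.0, 0.0, 0.5, 0.5],  # from Junior
    [0.0, 0.0, 0.0, 1.0],  # from Senior
]

next_term = project(current, transition)
for state, value in zip(states, next_term):
    print(state, round(value, 2))
```

Each row of the hypothetical matrix sums to 1, so the total headcount is preserved here. The open question is where the probabilities come from.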

We can use last year's enrollment data to track movements from last Fall to last Spring:

  Fall 2017 -> Spring 2018 (to be predicted)
    Freshman, Sophomore, Junior, Senior -> ?, ?, ?, ?
  Fall 2016 -> Spring 2017 (observed)
    Freshman, Sophomore, Junior, Senior -> Freshman, Sophomore, Junior, Senior

Building the Transition Probability Matrix
- We can compare our Fall 2016 headcounts in each state to our Spring 2017 headcounts in each state.
- Cross-tabulate Fall 2016 by Spring 2017 and calculate the row percentages.

Start with student-level enrollment data (one column per student):

  Fall 2016     F F F F P P P P P J J J J J J S S S S S
  Spring 2017   F F F P P P P P J J J J J S S S S S S S

Cross-tabulate Fall 2016 by Spring 2017.

Counts:

                    Spring 2017
  Fall 2016      F      P      J      S
  F              3      1      0      0
  P              0      4      1      0
  J              0      0      4      2
  S              0      0      0      5

Percentages:

                    Spring 2017
  Fall 2016      F      P      J      S
  F            .75    .25    .00    .00
  P            .00    .80    .20    .00
  J            .00    .00    .66    .33
  S            .00    .00    .00    1.0

We can see that from Fall 2016 to Spring 2017, 75% of Freshmen remained Freshmen, while 25% of Freshmen became Sophomores. In other words, the probability of becoming a Sophomore in the Spring if you were a Freshman in the Fall is 25%.
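The cross-tabulation and row percentages above can be reproduced with a short Python sketch. It uses the same 20 Fall 2016 / Spring 2017 student records shown in the example; the function name `transition_matrix` is hypothetical, and this is not the SAS code behind the presentation. One assumption is labeled in the code: each row is divided by the number of Fall students in that state, whatever their Spring outcome.

```python
from collections import Counter


def transition_matrix(pairs, states):
    """Cross-tabulate (origin, destination) pairs and return row proportions."""
    pairs = list(pairs)
    counts = Counter(pairs)
    # Assumption: the denominator is every student who started in the origin
    # state, including any whose destination is not one of the listed states.
    origin_totals = Counter(origin for origin, _ in pairs)
    matrix = []
    for origin in states:
        total = origin_totals[origin]
        if total == 0:
            matrix.append([0.0] * len(states))
        else:
            matrix.append([counts[(origin, dest)] / total for dest in states])
    return matrix


# Student-level data from the example: one character per student.
fall_2016 = list("FFFFPPPPPJJJJJJSSSSS")
spring_2017 = list("FFFPPPPPJJJJJSSSSSSS")
states = ["F", "P", "J", "S"]

matrix = transition_matrix(zip(fall_2016, spring_2017), states)
for origin, row in zip(states, matrix):
    print(origin, [round(p, 2) for p in row])
```

The Junior row comes out as 4/6 and 2/6, which the table above shows as .66 and .33.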

Simple Markov Chain Model
Fall 2017 headcounts per state: F_t = 5, P_t = 5, J_t = 8, S_t = 6. Multiplying them by the Transition Probability Matrix based on state flows from Fall 2016 to Spring 2017 gives the predicted Spring 2018 headcounts:

\[
\begin{pmatrix} 5 & 5 & 8 & 6 \end{pmatrix}
\times
\begin{pmatrix}
0.75 & 0.25 & 0 & 0 \\
0 & 0.8 & 0.2 & 0 \\
0 & 0 & 0.66 & 0.33 \\
0 & 0 & 0 & 1
\end{pmatrix}
=
\begin{pmatrix} 3.75 & 5.25 & 6.28 & 8.64 \end{pmatrix}
\approx
\begin{pmatrix} 4 & 5 & 6 & 9 \end{pmatrix}
\]

Enhancing the Model
- We have so much data, we should be using it!
- Incorporate 5 years of historical data
  - Build five Transition Probability Matrices, one for each historical Fall-to-Spring pair of terms
  - Average them to create a master Transition Probability Matrix

  Fall 2016 -> Spring 2017
  Fall 2015 -> Spring 2016
  Fall 2014 -> Spring 2015
  Fall 2013 -> Spring 2014
  Fall 2012 -> Spring 2013

Enhancing the Model
- Detailed states to track granular flows of students
- Concatenate multiple variables to create detailed states that are exhaustive and mutually exclusive
  - Degree
  - Enrollment Status
  - Class
  - Full-time vs Part-time

  DEGREE
    0   Post Baccalaureate Certificate
    3   Bachelor's
    4   Master's
    5   Post Master's Certificate
    8   Unclassified
    P   Doctoral
    R   Professional Doctorate

  ENROLL
    1   New Student
    2   New Transfer Student
    3   Continuing Student
    4   Returning Student
    6   Unclassified

  CLASS
    1   Freshman
    2   Sophomore
    3   Junior
    4   Senior
    6   Unclassified Undergraduate
    7   Graduate

  TIME
    F   Full-time
    P   Part-time

Example: 3_1_1_F is a new freshman pursuing a bachelor's degree with a full course load this semester

New Entries
- Students enter and exit the university every semester
- Exits are already accounted for by using the Transition Probability Matrix
- New entries must be modeled separately
  - Use our semester pairings to identify how many new students entered each Spring
  - Flag students who were not here in Fall, but were here in Spring
- Our data shows that new entries are very consistent across semesters, so we can estimate future new entries using linear regression

  Semester       New Entries
  Spring 2013    1566
  Spring 2014    1608
  Spring 2015    1623
  Spring 2016    1603
  Spring 2017    1722
  Spring 2018    (to be estimated)

Enhanced Markov Chain Model
(number of students we have this semester in each state at time t) x (probabilities of moving amongst each state, averaged across past 5 years) + (predicted new entries into each state) = (estimated number of students in each state next semester)

Schematic, showing only a few of the flow states:

\[
\begin{pmatrix} \text{3\_1\_1\_F}_t & \text{3\_1\_1\_P}_t & \text{3\_3\_1\_F}_t \end{pmatrix}
\times
\begin{pmatrix}
P_{\text{3\_1\_1\_F}} & P_{\text{3\_1\_1\_P}} & P_{\text{3\_3\_1\_F}} \\
P_{\text{4\_1\_7\_F}} & P_{\text{4\_2\_7\_P}} & P_{\text{4\_3\_7\_F}} \\
P_{\text{5\_1\_7\_F}} & P_{\text{5\_4\_7\_F}} & P_{\text{5\_4\_7\_P}}
\end{pmatrix}
+
\begin{pmatrix} \text{3\_1\_1\_F}_{\text{new}} & \text{3\_1\_1\_P}_{\text{new}} & \text{3\_3\_1\_F}_{\text{new}} \end{pmatrix}
=
\begin{pmatrix} \text{3\_1\_1\_F}_{t+1} & \text{3\_1\_1\_P}_{t+1} & \text{3\_3\_1\_F}_{t+1} \end{pmatrix}
\]

Markov Chain Modeling in SAS
- Efficiently process large data
- Combine multiple historical datasets
- Dynamic model
  - Enter the term to be predicted; SAS does the rest
- Concatenate multiple variables to create detailed flow states
  - Very large Transition Probability Matrices
- Easily conduct multiple kinds of analyses
  - Regressions, crosstabulations, matrix algebra, etc.

SAS Methodology
Step 1
- Read in the data: student level, most recent term and past 5 years
- Concatenate Degree, Enrollment Status, Class, and Full-time/Part-time
Step 2
- Create five semester pairings of Springs -> Falls or Falls -> Springs
Step 3
- Create 5 transition probability matrices, one for each semester pairing
- Compare semester pairings to see what percentage of students in each flow state were retained, dropped out, or moved to another flow state
Step 4
- Average across the 5 transition probability matrices to create an overall transition probability matrix
Step 5
- Pull in last semester's enrollment values as our baseline population
Step 6
- Use linear regression to model new entries
Step 7
- Use PROC IML to forecast enrollment for next semester!

Dynamic SAS Programming
- Minimizes risk of user error
- Simple to update
- Efficient
- SAS Macro Variables & SAS Macro Programs

- The projection term is the only element the user changes
- SAS processes simple mathematics to create variables for past semesters. Given a projection term of 201801, the code resolves:

    semester0  = 201801
    semester1  = 201708
    semester2  = 201701
    semester3  = 201608
    semester4  = 201601
    semester5  = 201508
    semester6  = 201501
    semester7  = 201408
    semester8  = 201401
    semester9  = 201308
    semester10 = 201301
    semester11 = 201208

- The CALL SYMPUT routine creates a macro variable for each semester and assigns it the calculated semester value
- Macro variables are also created for each student category within a PROC SQL step
- The macro variables can be called anywhere throughout the program
- A macro program compares semester pairs to identify new entries between the first and second semester
  - Uses macro variables to determine the semester pairs
- A macro program loops through every distinct flow state and conducts a linear regression to predict new entries into each flow state
  - Uses macro variables for each flow state

PROC IML in SAS
(number of students we have this semester in each state at time t) x (probabilities of moving amongst each state, averaged across past 5 years) + (predicted new entries into each state) = (estimated number of students in each state next semester)

Results

Questions?
You can download this presentation at: https://ire.uncg.edu/research/SRB-SAIR-2017
Contact info: Samantha Bradley, [email protected], (336) 256-0399
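For readers who want to try the arithmetic of Steps 4 through 7 outside SAS, the sketch below averages several transition matrices, fits a straight-line trend to past new-entry counts, and applies the enhanced formula (headcounts x averaged matrix + new entries). It is a plain-Python illustration under stated assumptions, not the presentation's PROC IML code. All function names are hypothetical, and the two-state matrices, headcounts, and per-state new entries are made up. Only the five Spring new-entry totals come from the New Entries slide, and they are university-wide totals, whereas the presentation runs one regression per flow state.

```python
def average_matrices(matrices):
    """Element-wise mean of equally sized square matrices."""
    n = len(matrices[0])
    return [
        [sum(m[i][j] for m in matrices) / len(matrices) for j in range(n)]
        for i in range(n)
    ]


def linear_trend_forecast(years, values, target_year):
    """Fit an ordinary least squares line to (year, value) and evaluate it."""
    mean_x = sum(years) / len(years)
    mean_y = sum(values) / len(values)
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    return mean_y + slope * (target_year - mean_x)


def forecast(headcounts, transition, new_entries):
    """Row vector times matrix, plus predicted new entries per state."""
    n = len(headcounts)
    return [
        sum(headcounts[i] * transition[i][j] for i in range(n)) + new_entries[j]
        for j in range(n)
    ]


# Spring new-entry totals from the New Entries slide (Spring 2013 to 2017).
years = [2013, 2014, 2015, 2016, 2017]
new_entry_totals = [1566, 1608, 1623, 1603, 1722]
spring_2018_new = linear_trend_forecast(years, new_entry_totals, 2018)
print(round(spring_2018_new, 1))  # about 1716.5

# Hypothetical two-state example of the enhanced model.
historical = [
    [[0.8, 0.1], [0.0, 0.9]],
    [[0.6, 0.3], [0.0, 0.7]],
]
averaged = average_matrices(historical)
current = [100, 50]            # hypothetical headcounts this term
predicted_new = [20, 5]        # hypothetical new entries per state
next_term = forecast(current, averaged, predicted_new)
print([round(v, 2) for v in next_term])
```

Rows of the hypothetical matrices sum to less than 1, which is one way exits can be absorbed by the transition matrix while new entries are added back separately.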