Using SAS to Develop Internal Fraud Detection Strategies

Presented by Komla Ahlijah
GAUSS Fall Meeting, November 9, 2016

Presentation Overview

o What is credit card fraud?
o Impact on banks, consumers and merchants.
o Popular fraud detection strategies/models.
o Reasons for leveraging internal data to develop customer focused strategies.
o Why are logistic regression and decision trees the right solutions?
o Variable creations and construction of the dynamic historical data needed for modeling.
o Variable selections.
    Proc Varclus
    Proc Reg
    Proc HPBin
o Creating business rules.
    Proc HPSplit

What is credit card fraud?

Credit card fraud is a broad term used to describe the unauthorized use of an individual's credit card or card information to make purchases (or to remove funds from the cardholder's account). It is a form of identity theft that occurs in several ways:

o A person fraudulently obtains, takes, signs, uses, sells, buys, or forges someone else's credit or debit card or card information. This is commonly referred to as account takeover.
o An impostor opens credit card accounts in another person's name. This is usually referred to as application fraud.
o A collusive merchant sells goods or services to someone else with knowledge that the credit or debit card being used was illegally obtained or is being used without authorization. This is referred to as merchant fraud.

Impact on Banks, Consumers & Merchants

Impact on Banks: Due to Visa and MasterCard rules, banks (issuer/acquirer) have to initially bear the costs of fraud before seeking reimbursement from the merchant via chargeback. Furthermore, there are several administrative and manpower costs associated with filing a chargeback that the bank has to incur.

Impact on Merchants: Merchants are the most affected party in credit card fraud, particularly in card-not-present transactions, as they have to accept full liability for losses due to fraud.

Impact on Consumers: Consumers are the least financially impacted by fraud in credit card transactions because consumer liability is limited by credit card transaction legislation and by banks' cardholder protection policies. But in cases of application fraud, the impact on the victim's credit may be devastating and the process to correct the bureau data may be lengthy.

Popular fraud detection strategies/models

The most popular detection platform is the Falcon Fraud Manager by FICO, which uses global cardholder profiles and focuses on high risk entities such as ATMs and merchants. According to FICO, the Falcon Score improves detection by up to 44% with global profiles and provides the ability to control fraud at the card account and customer level.

Another widely used credit card activity scoring model is the VAA (Visa Advanced Authorization), which provides real time scores, monitoring, and evaluations of transactions on Visa cards. The VAA Score indicates the probability of fraud risk associated with a given transaction.

Reasons for leveraging internal data to develop customer focused strategies

o Banks have historical spending data on each customer. This data can be leveraged to create a customer centric cardholder profile for each individual client.
o These custom profiles enable banks to determine out of pattern activities for each cardholder and set a risk tolerance level for each individual account (i.e. what suspicious transaction dollar amounts should trigger a real time decline, a text message or a call to the customer?).
o Banks can factor in customer travel alerts and make exceptions for abnormal activity when the customer is out of town for business or on a family vacation.
o The internal strategies can easily be fine-tuned, changed or adjusted depending on confirmed changes in the cardholder's purchasing habits.

Why are logistic regression and decision trees the right solutions?

o The outcome we are attempting to predict is categorical and binary, and the predictor variables are continuous, categorical, or both.
o We need to leverage the relative importance of each predictor variable as well as any potential interaction among them.
o Logistic regression is suitable for leveraging the predictive power of each input variable and/or their combinations.
o Decision trees are ideal for creating business rules to further supplement the performance of the models.

Variable creations and construction of the dynamic historical data needed for modeling.

    /* Macro variables to loop through each day of a given period */
    %LET START="01OCT2016"D;
    %LET END="31OCT2016"D;
    %LET DIF=%SYSFUNC(INTCK(DAY,&START,&END));

    %macro data_pull;
    %DO i=0 %TO &DIF;

    /* Daily customer activity pull */
    PROC SQL;
    CREATE TABLE FRAUD.RAWDATA AS
    SELECT KEY_VAR,VAR_1,...,VAR_n
    FROM ALLCARD.ALLDATA
    WHERE DATA_DT ="%SYSFUNC(PUTN(%SYSFUNC(INTNX(DAY,&START,&i,b)),DATE9.))"D;
    QUIT;

    /* Historical activity pull (the 90 days before the current day) */
    PROC SQL;
    CREATE TABLE FRAUD.RAWDATA1 AS
    SELECT DISTINCT KEY_VAR,VAR_1,...,VAR_n
    FROM ALLCARD.ALLDATA
    WHERE "%SYSFUNC(PUTN(%SYSFUNC(INTNX(DAY,&START,&i,b)),DATE9.))"D-90<= DATA_DT< "%SYSFUNC(PUTN(%SYSFUNC(INTNX(DAY,&START,&i,b)),DATE9.))"D;
    QUIT;

    /* USE VARIOUS AGGREGATION TECHNIQUES AND CALCULATIONS TO EXTRACT POTENTIALLY PREDICTIVE METRICS */

    %end;
    %mend;
    %data_pull;

Handling missing numeric values prior to variable selections.

    /* Replace missing numeric values by 0 */
    DATA DATASET2;
    SET DATASET1;
    ARRAY MYNUM{*} _NUMERIC_;
    DO i=1 TO DIM(MYNUM);
    IF MYNUM{i}=. THEN MYNUM{i}=0;
    END;
    DROP i;
    RUN;

Variable Selections - PROC VARCLUS

An eigenvalue is a measure of the variance captured by a principal component. If the eigenvalue of the second principal component of a cluster is greater than the cut-off specified in the MAXEIGEN option, the cluster has more than one dimension and is split further. As a best practice, the MAXEIGEN option is usually set to 0.7.

    PROC VARCLUS DATA=TRAINING_DATA MAXEIGEN=0.7 MAXCLUSTERS=15;
    VAR CUST_VAR_1 CUST_VAR_2 CUST_VAR_3 . . . CUST_VAR_n;
    RUN;

The MAXCLUSTERS option is used to specify the maximum number of clusters desired.
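The MAXEIGEN split rule can be illustrated outside SAS. The following is a minimal Python sketch, not PROC VARCLUS itself: it assumes a cluster is summarized by the correlation matrix of its variables, and the helper names `symmetric_eigenvalues` and `needs_split` are hypothetical. It computes the eigenvalues with plain Jacobi rotations and compares the second largest one with the 0.7 cut-off.

```python
import math

def symmetric_eigenvalues(matrix, sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix, largest first (cyclic Jacobi rotations)."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    for _ in range(sweeps):
        off_diagonal = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off_diagonal < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle that zeroes the (p, q) entry.
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

def needs_split(cluster_corr, maxeigen=0.7):
    """True when the second eigenvalue exceeds the MAXEIGEN-style cut-off."""
    eigenvalues = symmetric_eigenvalues(cluster_corr)
    return len(eigenvalues) > 1 and eigenvalues[1] > maxeigen

# Two made-up two-variable clusters (illustrative values, not from the slides).
tight = [[1.0, 0.9], [0.9, 1.0]]   # eigenvalues 1.9 and 0.1
loose = [[1.0, 0.2], [0.2, 1.0]]   # eigenvalues 1.2 and 0.8
print(needs_split(tight), needs_split(loose))
```

For two standardized variables with correlation r the eigenvalues are 1 + |r| and 1 - |r|, so under a 0.7 cut-off the tightly correlated pair stays together while the weakly correlated pair would be split.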

PROC VARCLUS SAMPLE OUTPUT

The clusters are created in a way that variables from the same cluster are highly correlated with each other but have a low correlation with any other cluster. We usually select the variable with the minimum 1-R**2 ratio within its own cluster.

R Square Ratio = (1 - R Square Own Cluster) / (1 - R Square Next Cluster)

32 Clusters (the sample shows variables from Cluster 1 through Cluster 6)

Variable        R-squared with   R-squared with   1-R**2
                Own Cluster      Next Closest     Ratio
CUST_VAR_17     0.9616           0.4525           0.0702
CUST_VAR_58     0.9373           0.453            0.1147
CUST_VAR_93     0.8709           0.3733           0.2059
CUST_VAR_118    0.8034           0.1288           0.2256
CUST_VAR_121    0.8883           0.2092           0.1412
CUST_VAR_140    0.6732           0.3309           0.4885
CUST_VAR_147    0.8751           0.1311           0.1438
CUST_VAR_149    0.8751           0.0924           0.1376
CUST_VAR_132    0.9062           0.1688           0.1129
CUST_VAR_134    0.9062           0.0781           0.1018
CUST_VAR_3      0.4767           0.1932           0.6487
CUST_VAR_128    0.8191           0.3545           0.2802
CUST_VAR_130    0.6173           0.3386           0.5787
CUST_VAR_131    0.8263           0.3737           0.2773
CUST_VAR_160    0.4277           0.2074           0.7221
CUST_VAR_6      0.8255           0.2035           0.2191
CUST_VAR_42     0.744            0.1645           0.3064
CUST_VAR_82     0.5038           0.2185           0.6349

Variable Selections - PROC REG

    PROC REG DATA=TRAINING_DATA;
    MODEL OUTCOME = CUST_VAR_1 CUST_VAR_2 CUST_VAR_3 . . . CUST_VAR_n / VIF TOL;
    RUN;

The Variance Inflation Factor (VIF) represents the factor by which the variance of an estimated coefficient is multiplied (i.e. inflated) due to multicollinearity in the model. Variables with high VIF (usually
greater than 10) are removed and the procedure is repeated until all remaining attributes have an acceptable or low level of VIF. The TOL option provides the proportion of variance in a given predictor that is NOT explained by all of the other predictors. It is the inverse of the VIF (TOL=1/VIF).

PROC REG SAMPLE OUTPUT

This output indicates that CUST_VAR_5, which has the highest VIF (34.45), should be removed from the list of predictors and the regression should be repeated to determine the next removable attribute.

Parameter Estimates

Variable     DF   Parameter     Standard     t Value   Pr > |t|   Tolerance   Variance
                  Estimate      Error                                         Inflation
CUST_VAR_1   1    0.0000357     5.62E-07     63.51     <.0001     0.5167      1.93535
CUST_VAR_2   1    -2.02E-07     1.64E-07     -1.23     0.2185     0.0764      13.08943
CUST_VAR_3   1    -0.00419      0.00070311   -5.95     <.0001     0.10368     9.64522
CUST_VAR_4   1    0.00000122    7.37E-07     1.65      0.0985     0.2761      3.62191
CUST_VAR_5   1    -0.0000847    0.000018     -4.71     <.0001     0.02902     34.45383
CUST_VAR_6   1    0.00604       0.00093708   6.45      <.0001     0.06389     15.65078
CUST_VAR_7   1    -0.00000186   7.56E-07     -2.46     0.0138     0.26194     3.81762
CUST_VAR_8   1    -1.53E-08     6.16E-08     -0.25     0.8045     0.08636     11.57894

Variable Selections - PROC HPBin

    proc hpbin data=TRAINING_DATA WOE numbin=2;
    input CUST_VAR_1 CUST_VAR_2 CUST_VAR_3 . . . CUST_VAR_n;
    target FLAG;
    run;

The weight of evidence (WOE) option computes the Weight of Evidence and Information Values for all binning variables.
The NUMBIN option specifies the global number of bins for all binning variables.
The TARGET statement is needed to specify the outcome that the modeler is trying to predict.

PROC HPBin SAMPLE OUTPUT

Variable Information Value

Variable       Information Value
CUST_VAR_1     0.13258718
CUST_VAR_10    0.13398693
CUST_VAR_11    0.28780739
CUST_VAR_12    0.281949
CUST_VAR_13    0.18701811
CUST_VAR_14    0.4593319
CUST_VAR_15    0.4703598
CUST_VAR_16    0.37497253
CUST_VAR_17    0.46069592
CUST_VAR_18    0.34096933
CUST_VAR_19    0.11523774
CUST_VAR_2     0.10748545
CUST_VAR_20    0.14582539
CUST_VAR_3     0.19573208
CUST_VAR_4     0.10599385
CUST_VAR_5     0.18824251
CUST_VAR_6     0.27428295
CUST_VAR_7     0.11994695
CUST_VAR_8     0.15874959
CUST_VAR_9     0.15920715

The weight of evidence measures the predictive power of an independent variable in relation to the dependent variable. The information value (IV) is a useful measure for selecting important variables in a predictive model. It helps to rank variables on the basis of their importance. Good predictors have an IV between 0.3 and 0.5.

Weight of Evidence

Variable     Range                Non-event   Non-event   Event   Event      Weight of     Information
                                  Count       Rate        Count   Rate       Evidence      Value
CUST_VAR_1   CUST_VAR_7 < 0.5     1907697     0.99855     2770    0.00145    0.12545402    0.01393189
             0.5 <= CUST_VAR_7    117705      0.995231    564     0.004769   -1.0684682    0.11865529
CUST_VAR_2   CUST_VAR_9 < 213     2023214     0.998408    3226    0.001592   0.03184894    0.00099729
             213 <= CUST_VAR_9    2188        0.952962    108     0.047038   -3.4007388    0.10648816
CUST_VAR_3   CUST_VAR_17 < 0.5    1073242     0.997705    2469    0.002295   -0.3347247    0.07051344
             0.5 <= CUST_VAR_17   952160      0.999092    865     0.000908   0.59440817    0.12521864
CUST_VAR_4   CUST_VAR_47 < 0.5    1930615     0.998512    2877    0.001488   0.09949511    0.00898177
             0.5 <= CUST_VAR_47   94787       0.995202    457     0.004798   -1.0746465    0.09701208
CUST_VAR_5   CUST_VAR_62 < 0.5    1027836     0.99768     2390    0.00232    -0.3454332    0.07232819
             0.5 <= CUST_VAR_62   997566      0.999055    944     0.000945   0.55359673    0.11591432

Creating business rules: Proc HPSPLIT

The maxdepth option allows the user to specify the maximum depth of the tree.

    ods graphics on;
    PROC HPSPLIT DATA=TT2 maxdepth=20 plots=zoomedtree(nodes=('0') depth=3);
    CLASS FLAG;
    MODEL FLAG=CUST_VAR_1 CUST_VAR_2 CUST_VAR_3 . . . CUST_VAR_n;
    grow entropy;
    prune costcomplexity (LEAVES=30);
    rules file= "rules.txt";
    RUN;

The plots=zoomedtree option is used to request a detailed diagram of the subtree starting at any given node and ending at a given depth.
Entropy is a measure of disorder or impurity. The GROW statement uses the gain in information (decrease in entropy) to evaluate the candidate splits on each variable and then to determine the best split.
The PRUNE statement requests cost-complexity pruning to select a smaller sub-tree that avoids overfitting the data.
The RULES statement automatically generates a text file with the rule syntax at each node.

PROC HPSPLIT SAMPLE OUTPUT

The diagram on this slide details the tree that results from using the option plots=zoomedtree(nodes=('0') depth=3).

[Figure: zoomed tree diagram from PROC HPSPLIT]

PROC HPSPLIT SAMPLE OUTPUT

Model-Based Confusion Matrix

            Predicted        Error
Actual      N        Y       Rate
N           3294     25      0.0075
Y           124      785     0.1364

The output also includes the confusion matrix as well as the ROC curve for the tree.
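The error rates in the confusion matrix can be recomputed from its four counts. The Python sketch below only illustrates that arithmetic and is not produced by PROC HPSPLIT. The function name `confusion_summary` is hypothetical, the extra metrics (accuracy, precision, recall) do not appear on the slide, and the comments assume that FLAG = Y marks a fraudulent transaction.

```python
def confusion_summary(tn, fp, fn, tp):
    """Per-class error rates plus a few common summary metrics.

    tn and fp are the actual-N row (predicted N, predicted Y);
    fn and tp are the actual-Y row (predicted N, predicted Y).
    """
    return {
        # Assumption: Y = fraud, so this is non-fraud flagged as fraud.
        "error_rate_actual_N": fp / (tn + fp),
        # Assumption: Y = fraud, so this is fraud missed by the tree.
        "error_rate_actual_Y": fn / (fn + tp),
        "accuracy": (tn + tp) / (tn + fp + fn + tp),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }

# Counts taken from the Model-Based Confusion Matrix above.
summary = confusion_summary(tn=3294, fp=25, fn=124, tp=785)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

The two per-class error rates round to 0.0075 and 0.1364, matching the Error Rate column; the error rate for actual Y is simply one minus recall.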

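The weight of evidence and information value figures in the PROC HPBin sample output can be reproduced from the bin counts. The Python sketch below assumes the convention WOE = ln(share of non-events in the bin / share of events in the bin), with each bin contributing (non-event share - event share) x WOE to the information value. That convention is inferred from the numbers on the slide rather than stated there, and the function name `woe_iv` is hypothetical.

```python
import math

def woe_iv(bins):
    """bins: list of (non_event_count, event_count) pairs, one per bin.

    Returns a list of (woe, iv_contribution) pairs and the total IV.
    """
    total_non_events = sum(non_event for non_event, _ in bins)
    total_events = sum(event for _, event in bins)
    rows = []
    for non_event, event in bins:
        non_event_share = non_event / total_non_events
        event_share = event / total_events
        # Assumed convention: log of non-event share over event share.
        woe = math.log(non_event_share / event_share)
        rows.append((woe, (non_event_share - event_share) * woe))
    return rows, sum(iv for _, iv in rows)

# Bin counts for the first variable in the Weight of Evidence table.
rows, total_iv = woe_iv([(1907697, 2770), (117705, 564)])
for woe, iv in rows:
    print(f"WOE={woe:.4f}  IV contribution={iv:.4f}")
print(f"Information value={total_iv:.4f}")
```

For this variable the sketch gives WOE values of about 0.1255 and -1.0685 and a total information value of about 0.1326, in line with the sample output. The per-bin Information Value column is therefore a contribution, and the variable-level IV is the sum over its bins.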