Data Mining SEMMA

SEMMA

The SEMMA data mining process was developed by SAS. The steps in this process are as follows:

Sample
Explore
Modify
Model
Assess

The SAS product built around this approach is SAS Enterprise Miner. In the Sample phase, a subset of the data is drawn that is large enough for hidden relationships and patterns to be detectable, but small enough to be manageable. In the Explore phase, techniques such as clustering, classification, and regression are used to look for relationships worth studying later in the process; anomalies and outliers are also examined. The Modify phase selects and transforms variables for the phases that follow. The Model phase applies various analytical tools to find the model that best predicts the outcome of interest. Finally, the Assess phase evaluates the reliability and usefulness of the results. Modifications might be necessary, and some of the steps might need to be repeated.
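To make the five phases concrete, here is a minimal Python sketch of one pass through Sample, Explore, Modify, Model, and Assess. It is only an illustration and does not use or imitate SAS Enterprise Miner. The data is synthetic, every function name is hypothetical, and each phase is reduced to a simple stand-in: a 3-standard-deviation rule for outliers, standardization for the Modify step, and a one-variable least-squares line for the Model step.

```python
import random
import statistics


def make_records(n=1000, seed=0):
    """Build synthetic (x, y) pairs with y roughly 2x + 1, plus one planted outlier."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        records.append((x, 2 * x + 1 + rng.gauss(0, 0.5)))
    records.append((500.0, 0.0))  # planted anomaly (an assumption for the demo)
    return records


def sample(records, n, seed=1):
    """Sample: draw a random subset that is small enough to handle."""
    return random.Random(seed).sample(records, min(n, len(records)))


def explore(records):
    """Explore: summarize x and flag points more than 3 standard deviations from the mean."""
    xs = [x for x, _ in records]
    mean, sd = statistics.mean(xs), statistics.pstdev(xs)
    outliers = [r for r in records if sd > 0 and abs(r[0] - mean) > 3 * sd]
    return {"mean": mean, "sd": sd, "outliers": outliers}


def modify(records, outliers):
    """Modify: drop the flagged outliers, then standardize x."""
    flagged = set(outliers)
    kept = [r for r in records if r not in flagged]
    xs = [x for x, _ in kept]
    mean, sd = statistics.mean(xs), statistics.pstdev(xs)
    return [((x - mean) / sd, y) for x, y in kept]


def model(records):
    """Model: fit y = intercept + slope * x by ordinary least squares."""
    xs = [x for x, _ in records]
    ys = [y for _, y in records]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in records) / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope


def assess(records, intercept, slope):
    """Assess: compute R^2 on records that were not used for fitting."""
    ys = [y for _, y in records]
    my = statistics.mean(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in records)
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


sampled = sample(make_records(), 200)
summary = explore(sampled)
cleaned = modify(sampled, summary["outliers"])
half = len(cleaned) // 2
intercept, slope = model(cleaned[:half])             # fit on the first half
r_squared = assess(cleaned[half:], intercept, slope)  # check on the second half
print(f"outliers flagged: {len(summary['outliers'])}, R^2 on held-out half: {r_squared:.3f}")
```

If the Assess result were poor, the natural next move would be to return to an earlier phase, for example to draw a larger sample or transform the variables differently, which is the kind of repetition the process allows for.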

One university that has used SAS Enterprise Miner is the University of Central Florida, whose Division of Graduate Studies has used data mining as a tool in graduate admissions. After an initial analysis, the division selected 23 predictor variables (specific graduate program, academic level, gender, ethnic group, etc.) and one response variable (whether or not the student enrolled). It used a logistic regression model, which is appropriate for predicting a binary response (enroll/not enroll). Half of the data was used to build the model, and the remaining half was used to test its fit. Each predictor was given a weight reflecting the strength of its relationship to the response variable. The findings indicated a valid model, which was then used to predict enrollment for the fall 2007 semester.
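The general technique described here, fitting a logistic regression on half of the data and testing it on the other half, can be sketched in a few lines of Python. This is not the university's model or data. The two predictors, the "true" weights used to generate the synthetic records, and all function names are assumptions made for illustration, and plain batch gradient descent stands in for whatever fitting procedure SAS Enterprise Miner uses.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_data(n, seed=0):
    """Synthetic rows: two made-up predictors and a 0/1 'enrolled' response."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = (rng.gauss(0, 1), rng.gauss(0, 1))
        p = sigmoid(0.5 + 2.0 * x[0] - 1.5 * x[1])  # hypothetical true weights
        rows.append((x, 1 if rng.random() < p else 0))
    return rows


def fit_logistic(rows, lr=0.5, epochs=300):
    """Fit a bias and one weight per predictor by batch gradient descent on the log loss."""
    k = len(rows[0][0])
    bias, weights = 0.0, [0.0] * k
    for _ in range(epochs):
        g_bias, g_w = 0.0, [0.0] * k
        for x, y in rows:
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            g_bias += err
            for j in range(k):
                g_w[j] += err * x[j]
        bias -= lr * g_bias / len(rows)
        weights = [w - lr * g / len(rows) for w, g in zip(weights, g_w)]
    return bias, weights


def accuracy(rows, bias, weights):
    """Share of rows where the predicted class (probability >= 0.5) matches the response."""
    hits = 0
    for x, y in rows:
        p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
        hits += int((p >= 0.5) == (y == 1))
    return hits / len(rows)


data = make_data(1000)
half = len(data) // 2
train, test = data[:half], data[half:]   # half to build the model, half to test it
bias, weights = fit_logistic(train)
test_accuracy = accuracy(test, bias, weights)
print("weights:", [round(w, 2) for w in weights], "test accuracy:", round(test_accuracy, 3))
```

The sign and size of each fitted weight indicate the direction and strength of that predictor's relationship to the response, and the accuracy on the held-out half gives a rough check of how well the model generalizes.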