The second widely used data mining approach is CRISP-DM (CRoss-Industry Standard Process for Data Mining). The CRISP-DM process consists of six phases: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment.
In the Business Understanding phase, the basic questions to be answered are defined. In the Data Understanding phase, the available data is surveyed to determine what exists and how it can be mined. Chang examined demographic, academic, and communications activity data. Fifteen predictor variables (high school GPA, gender, ethnicity, etc.) and one outcome variable (enrollment status) were chosen. The Data Preparation phase includes cleansing, combining, and transforming the data.
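The three Data Preparation steps named above (cleansing, combining, transforming) can be illustrated with a minimal Python sketch. It is not Chang's actual procedure: the records, the field names (`hs_gpa`, `campus_visits`, `is_female`), and the rule of dropping rows with a missing GPA are all assumptions invented for illustration.

```python
# Hypothetical records; field names and values are invented for illustration.
demographic = [
    {"id": 1, "gender": "F", "hs_gpa": "3.6"},
    {"id": 2, "gender": "M", "hs_gpa": ""},      # missing GPA
    {"id": 3, "gender": "F", "hs_gpa": "2.9"},
]
contacts = [
    {"id": 1, "campus_visits": 2},
    {"id": 3, "campus_visits": 0},
]


def prepare(demographic, contacts):
    """Cleanse, combine, and transform two record sets keyed by id."""
    visits_by_id = {row["id"]: row["campus_visits"] for row in contacts}
    prepared = []
    for row in demographic:
        # Cleansing: skip records with a missing GPA (an assumed policy).
        if not row["hs_gpa"]:
            continue
        prepared.append({
            "id": row["id"],
            # Transforming: parse text to a number, encode the category as 0/1.
            "hs_gpa": float(row["hs_gpa"]),
            "is_female": 1 if row["gender"] == "F" else 0,
            # Combining: join on id; no contact record is treated as zero visits.
            "campus_visits": visits_by_id.get(row["id"], 0),
        })
    return prepared


rows = prepare(demographic, contacts)
```

Each output row is a single numeric record per student, which is the form most modeling techniques expect as input.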
In the Modeling phase, various models are built, and in the Evaluation phase the models are tested for validity. Chang used three modeling techniques: classification and regression tree (C&RT), neural networks, and logistic regression. As in the SEMMA approach, part of the data is used to build the model and the remainder is held out for the validity test. Results indicated that enrollment could be predicted to some degree. Finally, validated models are put into practice in the Deployment phase; Chang's models were then used to predict enrollment in future years.
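The build-then-validate split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data is synthetic, the 70/30 split ratio is arbitrary, and a single GPA cutoff rule stands in for the C&RT, neural network, and logistic regression models Chang actually used. The helper names (`holdout_split`, `fit_threshold`, `accuracy`) are hypothetical.

```python
import random


def holdout_split(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and split it into (train, test)."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_threshold(train):
    """Pick the GPA cutoff that classifies the training records best."""
    best_cutoff, best_correct = None, -1
    for cutoff in sorted({gpa for gpa, _ in train}):
        correct = sum((gpa >= cutoff) == enrolled for gpa, enrolled in train)
        if correct > best_correct:
            best_cutoff, best_correct = cutoff, correct
    return best_cutoff


def accuracy(cutoff, records):
    """Fraction of records whose enrollment the cutoff rule predicts correctly."""
    hits = sum((gpa >= cutoff) == enrolled for gpa, enrolled in records)
    return hits / len(records)


# Synthetic (hs_gpa, enrolled) pairs, invented for illustration only.
# The GPA-to-enrollment relationship below is arbitrary, not a finding.
rng = random.Random(42)
data = []
for _ in range(200):
    gpa = round(rng.uniform(2.0, 4.0), 2)
    enrolled = rng.random() < (gpa - 2.0) / 2.0
    data.append((gpa, enrolled))

train, test = holdout_split(data)          # build on 70%, hold out 30%
cutoff = fit_threshold(train)              # the model sees only the training part
test_accuracy = accuracy(cutoff, test)     # validity is judged on unseen records
```

Because the cutoff is chosen without ever looking at the held-out records, `test_accuracy` estimates how the rule would perform on new applicants rather than how well it memorized the training data.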