How Do You Mine Data?

This section describes a generally accepted sequence of steps for solving data mining problems.

Solving a problem with data mining consists of the following steps:

  1. Stating the Business Problem: Define the business problem that the data mining task will solve. This includes understanding what is required as a solution to the problem, that is, what constitutes success.

  2. Understanding the Data: Collect the data and understand how it can be used to solve the problem. Analyze data quality, and look for interesting subsets of the data.

  3. Preparing the Data: Create the table used to build the model. This includes selecting the rows and columns to use, cleaning the data, and transforming the data. Data preparation is usually performed throughout the mining process.

  4. Building Models: Build models to solve the problem. This often requires building different types of models and may require more data preparation.

  5. Evaluating and Tuning Models: At this stage, several good models usually exist. Compare them to discover the one that best solves the business problem. It may be possible to tune the best model to improve it further.

  6. Deploying the Results: Use the model to solve the business problem. Deployment can range from writing a report to creating an application that uses the model.

  7. Monitoring: Deployment is often not the final step. If a model is used in an ongoing process, you must monitor it to ensure that it does not become less accurate over time. You may need to rebuild the model using new data, or you may have to build a new model.
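Steps 3 through 5 can be sketched in miniature. The following Python example is a minimal illustration under stated assumptions, not Oracle Data Mining code: it generates synthetic customer records, treats dropping rows with missing values as the only cleaning step, holds out 30% of the rows for testing, builds three deliberately simple candidate models (a majority-class baseline and two single-feature threshold rules), and keeps the one with the highest test accuracy. All function names, fields, and the data itself are hypothetical.

```python
import random
from statistics import mean


def make_records(n, seed=0):
    """Generate hypothetical customer records (synthetic, for illustration only)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.randint(18, 80)
        # About 10% of income values are missing, so step 3 has something to clean.
        income = None if rng.random() < 0.1 else rng.randint(20, 150)
        bought = int(age > 40)      # assumed pattern: older customers buy
        if rng.random() < 0.1:      # plus 10% label noise
            bought = 1 - bought
        rows.append({"age": age, "income": income, "bought": bought})
    return rows


def prepare(rows):
    """Step 3: select the columns to use and drop rows with missing values."""
    return [(r["age"], r["income"], r["bought"]) for r in rows if r["income"] is not None]


def split(data, test_fraction=0.3, seed=1):
    """Hold out part of the prepared data so models are compared on unseen rows."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def build_majority(train):
    """Step 4, baseline model: always predict the most common target value."""
    label = int(2 * sum(y for _, _, y in train) >= len(train))
    return lambda age, income: label


def build_threshold(train, idx):
    """Step 4, candidate model: predict 1 when feature `idx` exceeds the best threshold."""
    best_t, best_acc = None, -1.0
    for t in sorted({row[idx] for row in train}):
        acc = mean(1.0 if int(row[idx] > t) == row[2] else 0.0 for row in train)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return lambda age, income: int((age, income)[idx] > best_t)


def accuracy(model, data):
    """Step 5: fraction of rows whose target the model predicts correctly."""
    return mean(1.0 if model(age, income) == y else 0.0 for age, income, y in data)


train, test = split(prepare(make_records(500)))
candidates = {
    "majority": build_majority(train),
    "age_threshold": build_threshold(train, 0),     # feature 0 = age
    "income_threshold": build_threshold(train, 1),  # feature 1 = income
}
scores = {name: accuracy(model, test) for name, model in candidates.items()}
best = max(scores, key=scores.get)
print(best, {name: round(s, 2) for name, s in scores.items()})
```

In a real project each candidate would typically be a different algorithm or parameter setting, and accuracy would be only one of several possible evaluation measures; the sketch shows only the shape of the loop: prepare the data, build several models, compare them on held-out data, and choose.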

The data mining process is experimental by nature. In practice, data mining consists of building models, sometimes on new or modified data, until you find the model that best solves the business problem. As conditions change, the model may need to change, too.
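One way to notice that conditions have changed is the monitoring check from step 7: score the deployed model on each new batch of records whose outcomes are now known, and flag it for rebuilding when accuracy falls too far below the accuracy measured at deployment. The sketch below assumes labeled outcomes arrive in batches and that a fixed tolerance of 0.05 is an acceptable rebuild trigger; the functions `monitor` and `needs_rebuild`, the deployed model, and the two "monthly" batches are all hypothetical.

```python
from statistics import mean


def batch_accuracy(predictions, actuals):
    """Fraction of predictions that match the observed outcomes."""
    return mean(1.0 if p == a else 0.0 for p, a in zip(predictions, actuals))


def needs_rebuild(baseline_accuracy, recent_accuracy, tolerance=0.05):
    """Flag the model when recent accuracy drops more than `tolerance` below baseline."""
    return recent_accuracy < baseline_accuracy - tolerance


def monitor(model, batches, baseline_accuracy, tolerance=0.05):
    """Score each new batch of (input, outcome) pairs; return (accuracy, flagged) per batch."""
    report = []
    for records in batches:
        preds = [model(x) for x, _ in records]
        acts = [y for _, y in records]
        acc = batch_accuracy(preds, acts)
        report.append((acc, needs_rebuild(baseline_accuracy, acc, tolerance)))
    return report


def deployed_model(x):
    """Hypothetical deployed model: predicts 1 when the input exceeds 40."""
    return int(x > 40)


# Month 1: outcomes still follow the old pattern. Month 2: the pattern has shifted to 60.
month1 = [(x, int(x > 40)) for x in range(18, 81)]
month2 = [(x, int(x > 60)) for x in range(18, 81)]

for acc, flagged in monitor(deployed_model, [month1, month2], baseline_accuracy=0.90):
    print(f"accuracy={acc:.2f} rebuild={flagged}")
# accuracy=1.00 rebuild=False
# accuracy=0.68 rebuild=True
```

A fixed tolerance is the simplest possible trigger; a production process might instead track several measures or use a statistical test, but the decision it feeds is the same one described in step 7: keep the model, rebuild it on new data, or build a new one.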

The Oracle Database and Oracle Data Miner help you perform all steps in the data mining process, except for the first one.