Data Miner Transformations

Data Miner provides transformations that allow you to explore and modify the data used to build models. For example, you can bin data, or you can identify columns with a large proportion of NULL values and filter them out.

To use transformations, create a workflow in an existing project. In the workflow, create a data source node containing the data that you want to explore or transform, and then add Transforms nodes to the workflow.

Launch the cue cards to learn how to use the transformations provided by Data Miner.

Why Transform Data?

Data gathering and preparation are an important part of any data mining problem. Together they often consume more than 50% of the time and effort of a data mining project.

Data Miner allows you to explore data; you can view histograms and summary statistics. This information helps you identify transformations that are required.

Data collected in real-life situations usually has imperfections; that is, there are problems with data quality. A column may include many NULL values, or 95% of the values in the column may be the same. In either case, it may be appropriate to exclude the column from mining. A column may also consist mostly of distinct values; such a column could be some kind of identifier, and it may not be appropriate to mine it. The Filter Columns transformation allows you to identify columns that fail data quality tests.

You can use the Filter Columns transform to identify important attributes.
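The data quality tests described above can be illustrated outside Data Miner with a minimal Python sketch. It is not the Filter Columns implementation. The function names, the in-memory table layout (a dictionary of column name to list of values, with None standing in for NULL), and the 0.95 thresholds are all assumptions chosen for illustration.

```python
from collections import Counter


def column_quality(values):
    """Return the fractions of NULL, most-common, and distinct values in a column."""
    total = len(values)
    if total == 0:
        return {"pct_null": 0.0, "pct_constant": 0.0, "pct_distinct": 0.0}
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    # Size of the largest group of identical non-NULL values.
    most_common = counts.most_common(1)[0][1] if counts else 0
    return {
        "pct_null": (total - len(non_null)) / total,
        "pct_constant": most_common / total,
        "pct_distinct": len(counts) / total,
    }


def failing_columns(table, max_null=0.95, max_constant=0.95, max_distinct=0.95):
    """Map each column that fails a quality test to the reasons it failed.

    The thresholds are hypothetical defaults, not Data Miner settings.
    """
    failed = {}
    for name, values in table.items():
        quality = column_quality(values)
        reasons = []
        if quality["pct_null"] >= max_null:
            reasons.append("mostly NULL")
        if quality["pct_constant"] >= max_constant:
            reasons.append("mostly constant")
        if quality["pct_distinct"] >= max_distinct:
            reasons.append("mostly distinct")
        if reasons:
            failed[name] = reasons
    return failed


# Hypothetical 20-row table used only to exercise the checks.
table = {
    "id": list(range(20)),               # every value distinct, likely an identifier
    "country": ["US"] * 19 + ["CA"],     # 95% of the values are the same
    "notes": [None] * 19 + ["x"],        # 95% NULL
    "age": [20, 30, 40, 50] * 5,         # passes all three tests
}

for column, reasons in failing_columns(table).items():
    print(column, reasons)
```

In this sketch a column is flagged as soon as any one test fails. Whether a flagged column should actually be excluded from mining remains a judgment call.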

It may be necessary to transform data in a variety of ways: bin it, treat missing values, handle outliers, normalize numerical data, or create custom transformations. The Transform node supports these kinds of transformations.
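To make three of these transformations concrete, here is a minimal Python sketch of missing value treatment, normalization, and binning on a single numeric column. It is not how the Transform node is implemented. The function names and the specific choices (replace missing values with the mean, min-max normalization, equal-width bins) are assumptions picked for illustration; other strategies are equally valid.

```python
def fill_missing(values):
    """Replace None with the mean of the non-missing values (one possible treatment)."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)  # nothing to compute a mean from
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]


def min_max_normalize(values):
    """Rescale values linearly so that the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins):
    """Assign each value a bin index from 0 to n_bins - 1 using equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical column with one missing value.
ages = [20, None, 40, 60]
filled = fill_missing(ages)
print(filled)
print(min_max_normalize(filled))
print(equal_width_bins(filled, 4))
```

Missing values are filled first because the normalization and binning functions in this sketch assume every value is numeric.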

Data Miner supports a variety of transformations, including sample, join, and filter rows. This cue card set illustrates how to use the Transform node for binning.

Launch the cue cards.