Explore Data Quality

Filter Columns nodes measure data quality. In this example, use Filter Columns to determine which columns satisfy the default quality measures.

  1. Create another Filter Columns node:

    1. Click Transforms and select Filter Columns.
    2. In the Transformations workflow, move the pointer to where you want to place the node, and click.
    3. Name the node FilterDQ.
    4. Right-click MINING_DATA_BUILD_V and select Connect, then connect the data source node to FilterDQ.

  2. Double-click the FilterDQ node to open the Edit Column Filter Node dialog, then click Settings.

  3. Leave the settings as they are. Click OK to close the settings dialog; click OK again to close the edit dialog.

  4. Right-click FilterDQ, and select Run.

  5. After execution completes (the node shows a green check mark), double-click the node again to view the hints. The column CUST_ID is more than 95% unique; that is, more than 95% of the values in the column are unique. OS_DOC_SET_KANJI and PRINTER_SUPPLIES are more than 95% constant. To see the exact percent constant and other details, click the edit selection icon.

You must decide which columns to ignore in subsequent nodes (that is, which columns are not passed on). To ignore a column, click the output indicator icon in the Output column; the entry for that column changes to the ignore field icon. You may wish to ignore OS_DOC_SET_KANJI and PRINTER_SUPPLIES.
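
To make the two hints concrete, here is a minimal Python sketch of how percent unique and percent constant could be computed for a small table. It is not how the Filter Columns node is implemented. The function name column_quality, the sample values, and the exact definitions (percent unique as distinct values divided by row count, percent constant as the share of the most frequent value) are assumptions made for illustration; only the 95% threshold and the column names CUST_ID and PRINTER_SUPPLIES come from the example above.

```python
from collections import Counter


def column_quality(rows, threshold=95.0):
    """Return per-column quality figures for a list of row dicts.

    Assumed definitions (the tool's own may differ, for example in
    how nulls are counted):
      pct_unique   = distinct values / row count * 100
      pct_constant = count of most frequent value / row count * 100
    """
    if not rows:
        return {}
    total = len(rows)
    report = {}
    for col in rows[0]:
        counts = Counter(row[col] for row in rows)
        pct_unique = 100.0 * len(counts) / total
        pct_constant = 100.0 * counts.most_common(1)[0][1] / total
        # Flag the column only when a measure exceeds the threshold.
        if pct_unique > threshold:
            hint = "unique"
        elif pct_constant > threshold:
            hint = "constant"
        else:
            hint = None
        report[col] = {
            "pct_unique": pct_unique,
            "pct_constant": pct_constant,
            "hint": hint,
        }
    return report


# Hypothetical sample data: 100 rows with an identifier column,
# a column that never varies, and an ordinary column.
rows = [
    {"CUST_ID": 100001 + i, "PRINTER_SUPPLIES": 1, "AGE": 18 + i % 40}
    for i in range(100)
]

report = column_quality(rows)
for name, stats in report.items():
    print(name, stats)
```

In this sample, CUST_ID is flagged as unique and PRINTER_SUPPLIES as constant, while AGE gets no hint. Columns flagged this way are the usual candidates for the ignore setting described above.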

The next step is to use a Transform node to bin data.