Filter Columns nodes measure data quality. In this example, use Filter Columns to determine which columns satisfy the default quality measures.
Create another Filter Columns node: FilterDQ.
Right-click the data source node MINING_DATA_BUILD_V, and select Connect. Connect the data source node to FilterDQ.
Double-click the node FilterDQ to open Edit Column Filter Node; click Settings.
Leave the settings as they are. Click OK to close the settings dialog; click OK again to close the edit dialog.
Right-click FilterDQ, and select Run.
After execution completes (the node has a green check mark), double-click the node again to view the hints. The column CUST_ID is more than 95% unique, that is, more than 95% of the values in the column are unique. OS_DOC_SET_KANJI and PRINTER_SUPPLIES are more than 95% constant. To see the exact percent constant and other details, click [icon].
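The hints above can be illustrated outside the GUI. The following is a minimal Python sketch, not Oracle Data Miner's implementation. It assumes that percent unique is the share of distinct values among the non-null values, and that percent constant is the share taken by the single most frequent value. The function names, the threshold parameter, and the sample values are hypothetical and chosen only to mirror the hints in this example.

```python
from collections import Counter


def quality_stats(values):
    """Return (percent_unique, percent_constant) for one column.

    Assumed definitions (not taken from Oracle documentation):
      percent_unique   = distinct values / non-null values * 100
      percent_constant = count of most frequent value / non-null values * 100
    """
    non_null = [v for v in values if v is not None]
    if not non_null:
        return 0.0, 0.0
    counts = Counter(non_null)
    percent_unique = 100.0 * len(counts) / len(non_null)
    percent_constant = 100.0 * counts.most_common(1)[0][1] / len(non_null)
    return percent_unique, percent_constant


def quality_hints(table, threshold=95.0):
    """Map each flagged column name to 'unique' or 'constant'."""
    hints = {}
    for name, values in table.items():
        percent_unique, percent_constant = quality_stats(values)
        if percent_unique > threshold:
            hints[name] = "unique"
        elif percent_constant > threshold:
            hints[name] = "constant"
    return hints


# Made-up sample data; the real MINING_DATA_BUILD_V values are not shown here.
sample_table = {
    "CUST_ID": list(range(100)),        # every value distinct
    "PRINTER_SUPPLIES": [1] * 100,      # a single repeated value
    "AGE": [20, 30, 40, 50] * 25,       # neither unique nor constant
}

print(quality_hints(sample_table))
```

With this sample data, only the first two columns exceed the threshold, which parallels the hints that FilterDQ reports for CUST_ID and PRINTER_SUPPLIES.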
You must decide which columns to ignore in subsequent nodes (that is, which columns are not passed on). To ignore a column, click in the Output column; this changes the entry in the Output column to [icon]. You may wish to ignore OS_DOC_SET_KANJI and PRINTER_SUPPLIES.
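Ignoring a column means that it is not passed on to later nodes. As a rough sketch of that idea only, a table held as a dictionary of columns could be filtered as follows. The function name and the sample data are hypothetical, and this is not how Oracle Data Miner is implemented.

```python
def pass_on_columns(table, ignored):
    """Return a new table that omits the ignored columns.

    The input table is left unchanged, much as the data source itself
    is not altered when a column is ignored in a filter node.
    """
    ignored = set(ignored)
    return {name: values for name, values in table.items() if name not in ignored}


# Made-up sample rows for illustration.
source = {
    "CUST_ID": [101, 102, 103],
    "OS_DOC_SET_KANJI": [0, 0, 0],
    "PRINTER_SUPPLIES": [1, 1, 1],
    "AGE": [34, 51, 27],
}

output = pass_on_columns(source, ["OS_DOC_SET_KANJI", "PRINTER_SUPPLIES"])
print(list(output))
```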
The next step is to use a Transform node to bin data.
Copyright © 2011, 2012, Oracle. All rights reserved.