Prepare Text Column

MINING_DATA_TEXT_BUILD_V contains the attribute COMMENTS, which consists of unstructured text. COMMENTS must be prepared before it can be used for model building.

  1. In the Component Palette, expand Text.

  2. Click Build Text. Move the mouse to the workflow; click again.

  3. Name the new node PrepareBuild.

  4. Right-click the data source node MINING_DATA_TEXT_BUILD_V, and select Connect. Drag the line to the new node and click again.

  5. COMMENTS is the only attribute that consists of text. Follow these steps to define how to transform COMMENTS:

    1. Right-click PrepareBuild; select Edit. The Edit Build Text Node dialog opens.
    2. Select COMMENTS and click Add Text Transform.
    3. The Add/Edit Text Transform dialog opens. Click OK to accept all of the defaults and to return to Edit Build Text Node.
    4. In Build Text Details, a new attribute COMMENTS_TOK is listed. COMMENTS is marked as ignored in the Output column, so it will not be passed on to subsequent nodes such as a model build node.
    5. Click OK.

  6. Right-click PrepareBuild and select Run.

  7. The Build Text node creates a new attribute COMMENTS_TOK, which consists of tokens. To view the tokens, double-click PrepareBuild and select COMMENTS_TOK in the upper pane. Note that COMMENTS_TOK has type DM_NESTED_NUMERICALS. The lower pane has two tabs. The Tokens tab lists all the tokens and their frequency. The Output tab shows the tokens in an individual comment; select a comment by number to see the tokens.

The text column is prepared so that you can use the data source to build models. The prepared attribute COMMENTS_TOK will be passed to the model build node, and COMMENTS will not be passed on.
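
To make the idea of a tokenized attribute more concrete, the following Python sketch imitates what the Tokens and Output tabs display. It is only an illustration, not how the Build Text node works internally: the sample comments, the simple word-splitting rule, and the use of raw counts as token values are all assumptions. The only detail carried over from the node output is the general shape of a nested numericals value, understood here as a collection of (name, number) pairs.

```python
import re
from collections import Counter

# Hypothetical sample rows standing in for the COMMENTS attribute.
comments = {
    1: "Great service and great prices",
    2: "Prices are too high",
    3: "Service was slow",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens.

    This is an assumed, simplified rule; the real node has its own
    tokenization settings.
    """
    return re.findall(r"[a-z]+", text.lower())


# Per-comment token counts, similar in spirit to the Output tab.
tokens_per_comment = {
    cid: Counter(tokenize(text)) for cid, text in comments.items()
}

# Token frequency across all comments, similar in spirit to the Tokens tab.
token_frequency = Counter()
for counts in tokens_per_comment.values():
    token_frequency.update(counts)

# A nested-numericals-style value per comment: (token, number) pairs,
# sorted by token name. Raw counts are used here as an assumption.
comments_tok = {
    cid: sorted(counts.items()) for cid, counts in tokens_per_comment.items()
}

if __name__ == "__main__":
    print(token_frequency.most_common(3))
    print(comments_tok[1])
```

Each entry in comments_tok plays the role of one row of COMMENTS_TOK: the original text is gone, and only the token names and their numeric values remain for the model build node to use.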

The next step is to build models.