Before text data can be mined, it must undergo a special preprocessing step known as term extraction or feature extraction. The intent of the transformation is to extract meaning from the text. This process breaks the text down into units (terms) that can be mined. Text terms may be keywords or other document-derived features. Text preparation in Data Miner uses a Build Text node to transform text columns. Build Text does not support HTML or XML documents, and it does not support binary data types. Data Miner uses the facilities of Oracle Text to preprocess text columns.
MINING_DATA_TEXT_BUILD_V contains one unstructured column, COMMENTS; use a Build Text node to prepare COMMENTS, creating a new column, COMMENTS_TOK.
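To make the idea of term extraction concrete, the following Python sketch breaks a free-text comment into terms and counts them. It is only an illustration of the general technique: it does not call Oracle Text and does not reproduce what the Build Text node does internally. The function name extract_terms, the stop-word list, and the sample comment are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration only; it is not Oracle Text's stoplist.
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "of", "to", "was", "but"}


def extract_terms(text, stop_words=STOP_WORDS):
    """Split free text into lowercase word tokens and count each term
    that is not in the stop-word list."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(token for token in tokens if token not in stop_words)


# Sample value standing in for one row of an unstructured column such as COMMENTS.
comment = "The delivery was late, but the support team resolved it. Great support!"
terms = extract_terms(comment)

# Each (term, count) pair is a unit that a mining algorithm could use as a feature.
for term, count in terms.items():
    print(term, count)
```

The resulting term counts play a role similar to the tokens that a prepared text column carries: the unstructured string is replaced by discrete, countable features.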
Copyright © 2011, 2012, Oracle. All rights reserved.