Data Preparation for Text

Before text can be mined, it must undergo a special preprocessing step known as term extraction or feature extraction. This process breaks the text down into units (terms) that can be mined. Text terms may be keywords or other document-derived features.
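As a rough illustration of what term extraction means in general, the sketch below splits a string into lowercase word tokens, drops a few common words, and counts what remains. It is not how Oracle Text or the Build Text node works internally. The function names `extract_terms` and `term_counts` and the stop-word list are hypothetical, chosen only for this example.

```python
import re
from collections import Counter

# Hypothetical stop-word list, chosen only for this illustration.
STOP_WORDS = {"a", "an", "and", "be", "can", "it", "must", "of", "the", "to"}


def extract_terms(text):
    """Split text into lowercase alphanumeric tokens and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


def term_counts(text):
    """Count how often each extracted term occurs in the text."""
    return Counter(extract_terms(text))


if __name__ == "__main__":
    sample = "Before text can be mined, the text must be prepared."
    for term, count in term_counts(sample).most_common():
        print(term, count)
```

The resulting term counts are the kind of document-derived features a mining algorithm can consume in place of the raw text.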

In Oracle Data Miner, text is prepared with a Build Text node, which transforms text columns. Build Text does not support HTML or XML documents, and it does not support binary data types.

Oracle Data Miner uses the facilities of Oracle Text to preprocess text columns.