Text Mining

Much of today's enterprise information includes both structured and unstructured content. Customer account data may include text fields that describe support calls and other interactions with the customer. Insurance claim data may include a claim status description, supporting documents, email correspondence, and other information. Patient data may include comments by doctors and nurses. Analytic applications often must evaluate the structured information together with the related text.

Oracle Data Mining lets you mine data sets that contain regular relational information (numeric and character columns) as well as one or more text columns.

This cue card set shows you how to include customer comments collected during a survey to better identify good customers.

Text must undergo a transformation process before it can be mined. The text transformation extracts meaning from the text. Once the data has been properly transformed, the data can be used for building, testing, or scoring data mining models.
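To make the idea of text transformation concrete, the following is a minimal sketch that turns a free-text comment into term counts that a mining algorithm could consume as numeric features. It is an illustration only and is not how Oracle Text processes text; the stop-word list and the function names tokenize and term_counts are assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "is", "was", "to", "of", "with", "very"}

def tokenize(text):
    """Lowercase the text, split it into alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def term_counts(text):
    """Return a mapping from each remaining token to its count in the text."""
    return Counter(tokenize(text))

comment = "The agent was helpful and the refund was quick"
features = term_counts(comment)
print(dict(features))
```

Each distinct term becomes a candidate feature, which is why a raw text column cannot be passed to a model until a step of this kind has run.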

Most Oracle Data Mining algorithms support text. However, O-Cluster and Decision Tree don't support text. If you use Data Miner to build a model that contains a text column, Data Miner will ignore the column (not use it for input) if the algorithm doesn't support text.
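The column-skipping behavior described above can be pictured with a short sketch. This is not Data Miner code; the function model_inputs and the dictionary that maps column names to kinds are hypothetical, and only the two algorithm names come from the text above.

```python
# Algorithms that, per the text above, do not support text input.
NO_TEXT_SUPPORT = {"O-Cluster", "Decision Tree"}

def model_inputs(algorithm, columns):
    """Return the column names used as model input for the given algorithm.

    `columns` maps column name -> kind, where kind is "text" or anything else
    (a hypothetical representation chosen for this sketch).
    """
    if algorithm in NO_TEXT_SUPPORT:
        # Text columns are silently left out of the input.
        return [name for name, kind in columns.items() if kind != "text"]
    return list(columns)

columns = {"AGE": "numeric", "REGION": "character", "COMMENTS": "text"}
print(model_inputs("Decision Tree", columns))
print(model_inputs("Support Vector Machine", columns))
```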

Data Miner provides Text nodes that transform text data so that it can be used to build and apply models using text as input. Data Miner uses Oracle Text to transform text data.

To build models that use text columns, create a workflow in an existing project. In the workflow, create a data source node using data that has one or more columns of text data. Use a Build Text node to prepare the text columns, and then create models as usual. When you apply a model, use an Apply Text node to prepare the text columns in the same way as they were prepared for building.
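The reason the apply step must prepare text in the same way as the build step can be sketched as follows. This is a conceptual illustration, not the implementation of the Build Text or Apply Text nodes; the functions build_text and apply_text, the max_terms parameter, and the sample comments are all assumptions made for this example.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_text(documents, max_terms=5):
    """Learn a vocabulary (the most frequent terms) from the build data."""
    counts = Counter(t for doc in documents for t in tokenize(doc))
    # Sort by descending frequency, then alphabetically, so the order is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:max_terms]]

def apply_text(vocabulary, document):
    """Encode a document as counts over the vocabulary learned at build time."""
    counts = Counter(tokenize(document))
    return [counts[term] for term in vocabulary]

build_comments = ["great service great price", "poor service", "great support"]
vocabulary = build_text(build_comments, max_terms=3)

# New data is encoded against the build-time vocabulary; unseen terms are ignored.
new_row = apply_text(vocabulary, "great price but slow delivery")
print(vocabulary, new_row)
```

Because the model was trained on features in a fixed order, re-deriving the vocabulary from the apply data would produce columns that no longer line up with what the model expects.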

Launch the cue cards to learn how to use the Text nodes.