Abstract |
Techniques are provided for automatically discovering one or more features in free-form heterogeneous data. In one aspect of the invention, the techniques include: obtaining free-form heterogeneous data, wherein the data comprises one or more data items; applying a label to each data item; using the labeled data to build a language model from which a word distribution associated with each label can be derived; and using the word distribution associated with each label to discover one or more features in the data, wherein discovering the one or more features facilitates one or more operations that use at least a portion of the labeled data. |
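
The steps summarized above (label the data items, build a language model, derive a word distribution per label, discover features) can be illustrated with a minimal sketch. The abstract does not specify the type of language model or the scoring rule, so everything concrete here is an assumption made for illustration: a unigram model with add-alpha smoothing, a feature score equal to each word's contribution to the KL divergence between the label distribution and a pooled background distribution, and the hypothetical names `build_label_counts`, `word_distribution`, and `discover_features`. The sample items and labels are likewise invented.

```python
from collections import Counter, defaultdict
import math


def build_label_counts(labeled_items):
    """Count word occurrences per label: label -> Counter of words."""
    counts = defaultdict(Counter)
    for text, label in labeled_items:
        counts[label].update(text.lower().split())
    return counts


def word_distribution(counts, vocab, alpha=1.0):
    """Add-alpha smoothed unigram estimate of P(word | label) over a shared vocabulary."""
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def discover_features(labeled_items, top_k=3):
    """Return, for each label, the top_k words that best distinguish that label.

    Assumed scoring rule: p(w|label) * log(p(w|label) / p(w)), i.e. the word's
    contribution to the KL divergence from the pooled background distribution.
    """
    label_counts = build_label_counts(labeled_items)
    vocab = sorted({w for c in label_counts.values() for w in c})

    # Background distribution: all labels pooled together.
    background = Counter()
    for c in label_counts.values():
        background.update(c)
    p_bg = word_distribution(background, vocab)

    features = {}
    for label, c in label_counts.items():
        p_label = word_distribution(c, vocab)
        ranked = sorted(
            vocab,
            key=lambda w: p_label[w] * math.log(p_label[w] / p_bg[w]),
            reverse=True,
        )
        features[label] = ranked[:top_k]
    return features


# Invented free-form items, each already carrying a label.
items = [
    ("disk full error on server", "storage"),
    ("replace failed disk drive", "storage"),
    ("password reset for user", "access"),
    ("user cannot login password expired", "access"),
]
features = discover_features(items, top_k=2)
```

In this sketch the discovered features are simply the highest-scoring words per label; a downstream operation such as routing or search over the labeled data could then use those words as indexing terms. Other language models or scoring rules could be substituted without changing the overall flow.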