Abstract |
PROBLEM TO BE SOLVED: To structure the categories of information as a binary tree whose nodes contain information relevant to the search, for hierarchical classification of documents. SOLUTION: The binary tree is trained, or formed, by examining a training set of documents and separating those documents into two child nodes. Each of the resulting sets of documents is then split further into two nodes, creating a binary tree data structure. The nodes are generated to maximize the likelihood that all of the training documents are in either or both of the two child nodes. In one example, each node of the binary tree may be associated with a list of terms, and each term in each list is associated with a probability of that term appearing in a document given that node. New documents may be categorized by the nodes of the tree. For example, a new document may be assigned to a particular node based upon the statistical similarity between that document and the associated node. COPYRIGHT: (C)2006, JPO&NCIPI
|
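
The categorization step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the abstract only speaks of "statistical similarity", so the sketch assumes that similarity is the log-likelihood of the document's terms under each node's term probabilities, and that a new document descends from the root to whichever child scores higher. The names Node, estimate_term_probs, log_likelihood, categorize, and the FLOOR value for unseen terms are all hypothetical, and the likelihood-maximizing split of the training set is not shown; the two child nodes are built by hand from toy documents.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional

# Assumed probability for a term that a node's list does not contain.
FLOOR = 1e-6


@dataclass
class Node:
    """One node of the binary tree: a term list with P(term | node)."""
    name: str
    term_probs: Dict[str, float]
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def estimate_term_probs(docs: List[List[str]]) -> Dict[str, float]:
    """Relative frequency of each term over a set of tokenized documents."""
    counts = Counter(term for doc in docs for term in doc)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def log_likelihood(doc: List[str], node: Node) -> float:
    """Log-probability of the document's terms under the node's term list."""
    return sum(math.log(node.term_probs.get(term, FLOOR)) for term in doc)


def categorize(doc: List[str], root: Node) -> Node:
    """Walk down from the root, picking the higher-scoring child each time."""
    node = root
    while node.left is not None and node.right is not None:
        node = max((node.left, node.right),
                   key=lambda child: log_likelihood(doc, child))
    return node


# Toy training sets, already separated into two child nodes by hand.
sports = Node("sports", estimate_term_probs(
    [["goal", "match", "team"], ["team", "coach", "match"]]))
finance = Node("finance", estimate_term_probs(
    [["stock", "market", "bank"], ["bank", "loan", "market"]]))
root = Node("root", {}, left=sports, right=finance)

print(categorize(["team", "goal"], root).name)
print(categorize(["bank", "loan"], root).name)
```

In this sketch a document always ends up at a leaf. The abstract also allows a training document to belong to both child nodes, which a soft (probabilistic) assignment would capture better than the hard choice made here.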