Abstract |
One embodiment of the present invention provides a system that characterizes a document with respect to clusters of conceptually related words. Upon receiving a document containing a set of words, the system selects "candidate clusters" of conceptually related words that are related to the set of words (Figure 22, 2202). These candidate clusters are selected using a model that explains how sets of words are generated from clusters of conceptually related words (Figure 22, 2204). Next, the system constructs a set of components to characterize the document, wherein the set of components includes components for the candidate clusters (Figure 22, 2206). Each component in the set of components indicates a degree to which a corresponding candidate cluster is related to the set of words (Figure 22, 2208).
|
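The flow described in the abstract (select candidate clusters, then build one component per candidate) can be illustrated with a minimal Python sketch. It is not the claimed method: the abstract does not specify the generative model, so a simple weighted word-overlap score stands in for it here. The names `CLUSTERS`, `select_candidate_clusters`, and `characterize`, along with the toy cluster contents and weights, are all hypothetical and introduced only for illustration.

```python
from collections import Counter

# Hypothetical toy "model": each cluster maps its words to a weight.
# A real system would use a learned generative model instead (assumption).
CLUSTERS = {
    "cooking": {"recipe": 0.5, "oven": 0.3, "bake": 0.2},
    "travel": {"flight": 0.6, "hotel": 0.4},
    "baking": {"bake": 0.5, "flour": 0.5},
}


def select_candidate_clusters(words, clusters):
    """Return names of clusters that share at least one word with the document."""
    word_set = set(words)
    return [
        name
        for name, weights in clusters.items()
        if any(word in weights for word in word_set)
    ]


def characterize(words, clusters):
    """Build one component per candidate cluster.

    Each component is a stand-in "degree of relatedness": the weighted
    count of document words found in the cluster, divided by the total
    number of words in the document.
    """
    counts = Counter(words)
    total = sum(counts.values())
    components = {}
    for name in select_candidate_clusters(words, clusters):
        weights = clusters[name]
        score = sum(counts[w] * weights[w] for w in counts if w in weights)
        components[name] = score / total
    return components


document_words = ["bake", "oven", "bake", "flour"]
components = characterize(document_words, CLUSTERS)
```

In this sketch, `components` holds an entry for each cluster that shares a word with the document, and unrelated clusters (here, `travel`) are absent. The overlap score is only a placeholder for whatever degree of relatedness the underlying model would assign.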