Abstract |
A method and apparatus calculate a score for word selection, which may be used to preprocess sets of words prior to a dimensionality reduction process. The score employs information about relationships between the words themselves (such as synonym relationships) or between items with which the words are associated (such as products in a catalog). In some embodiments, the relationships are also community based; i.e., they are established by a community of users. A relationship may be a reference to two or more word sets that share the word of interest. In one embodiment, the word sets are descriptions of products in an online catalog, the community is the group of people who view the catalog, and the relationships used to calculate the score for a particular word of interest are co-references (e.g., views or purchases) of pairs of products whose catalog descriptions both include that word.
|
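The catalog embodiment described above can be illustrated with a minimal sketch. The abstract does not give a scoring formula, so the raw count used here is an assumption, as are the function name `word_scores`, the whitespace tokenization, and the sample data. For each word, the sketch counts the co-reference events (for example, one user viewing both products) involving a pair of products whose descriptions both contain that word.

```python
from collections import Counter


def word_scores(descriptions, coreferences):
    """Count, per word, the co-referenced product pairs whose descriptions share it.

    descriptions: dict mapping a product id to its catalog description text.
    coreferences: iterable of (product_id_a, product_id_b) events, e.g. one
        user viewing or purchasing both products (hypothetical input format).
    """
    # Assumption: naive lowercase whitespace tokenization stands in for
    # whatever word extraction the actual method uses.
    word_sets = {pid: set(text.lower().split()) for pid, text in descriptions.items()}

    scores = Counter()
    for a, b in coreferences:
        # Skip self-pairs and products that have no description.
        if a == b or a not in word_sets or b not in word_sets:
            continue
        # Every word common to both descriptions gains one point per event.
        for word in word_sets[a] & word_sets[b]:
            scores[word] += 1
    return scores


# Hypothetical catalog and co-view events, for illustration only.
descriptions = {
    "p1": "red cotton shirt",
    "p2": "blue cotton shirt",
    "p3": "red leather wallet",
}
coreferences = [("p1", "p2"), ("p1", "p2"), ("p1", "p3")]
scores = word_scores(descriptions, coreferences)
```

Words with a score below some threshold could then be dropped before dimensionality reduction. Whether to use a raw count, a normalized ratio, or another statistic is a design choice the abstract leaves open.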