Title of Invention |
Method and apparatus for characterizing documents based on clusters of related words |
Abstract |
One embodiment of the present invention provides a system that characterizes a document with respect to clusters of conceptually related words. Upon receiving a document containing a set of words, the system selects "candidate clusters" of conceptually related words that are related to the set of words. These candidate clusters are selected using a model that explains how sets of words are generated from clusters of conceptually related words. Next, the system constructs a set of components to characterize the document, wherein the set of components includes components for the candidate clusters. Each component in the set indicates the degree to which a corresponding candidate cluster is related to the set of words.
|
Publication Number |
US2004068697(A1) |
Publication Date |
2004.04.08 |
Application Number |
US20030676571 |
Filing Date |
2003.09.30 |
Applicant |
HARIK GEORGES;SHAZEER NOAM M. |
Inventor |
HARIK GEORGES;SHAZEER NOAM M. |
Classification |
G06F17/30;(IPC1-7):G06F17/00;G06F7/00 |
Main Classification |
G06F17/30 |
Agency |
|
Agent |
|
Principal Claim |
|
Address |
|
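
The two steps described in the abstract (selecting candidate clusters, then building one component per candidate) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `CLUSTERS` table, its weights, and the function names `select_candidates` and `characterize` are hypothetical, and the simple weighted-overlap score is a stand-in heuristic, not the generative model the application refers to.

```python
from collections import Counter

# Hypothetical toy model (invented for illustration): each cluster maps words
# to the weight with which the cluster is assumed to generate that word.
CLUSTERS = {
    "cooking": {"recipe": 0.5, "oven": 0.3, "flour": 0.2},
    "travel": {"flight": 0.5, "hotel": 0.3, "passport": 0.2},
    "baking": {"flour": 0.4, "oven": 0.4, "yeast": 0.2},
}


def select_candidates(words, clusters):
    """Return the names of clusters sharing at least one word with the document."""
    vocab = set(words)
    return [name for name, weights in clusters.items() if weights.keys() & vocab]


def characterize(words, clusters):
    """Map each candidate cluster to a normalized relatedness component.

    Assumption: the raw score is the count-weighted sum of cluster weights over
    the document's words; components are normalized to sum to 1.
    """
    counts = Counter(words)
    raw = {}
    for name in select_candidates(words, clusters):
        weights = clusters[name]
        raw[name] = sum(counts[w] * weights[w] for w in counts if w in weights)
    total = sum(raw.values())
    return {name: score / total for name, score in raw.items()} if total else {}


doc = "preheat the oven and mix flour with yeast".split()
components = characterize(doc, CLUSTERS)
```

In this toy run only the `cooking` and `baking` clusters become candidates, and `baking` receives the larger component because it accounts for more of the document's words. A real system would replace the overlap score with inference over the learned model.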