Title: METHOD AND APPARATUS FOR CHARACTERIZING DOCUMENTS BASED ON CLUSTERS OF RELATED WORDS
Abstract: One embodiment of the present invention provides a system that characterizes a document with respect to clusters of conceptually related words. Upon receiving a document containing a set of words, the system selects "candidate clusters" of conceptually related words that are related to the set of words (2202). These candidate clusters are selected using a model that explains how sets of words are generated from clusters of conceptually related words (2204). Next, the system constructs a set of components to characterize the document, wherein the set of components includes a component for each candidate cluster (2206). Each component in the set indicates the degree to which its corresponding candidate cluster is related to the set of words (2208).
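The steps in the abstract (select candidate clusters, then build one component per candidate) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `CLUSTERS` table, its weights, and the functions `select_candidates` and `characterize` are hypothetical, and the simple weighted-overlap score stands in for the generative model the abstract mentions, which this record does not specify.

```python
from collections import Counter

# Hypothetical toy model (not from the patent): each cluster maps the
# words it can "generate" to an illustrative weight.
CLUSTERS = {
    "pets": {"dog": 0.5, "cat": 0.4, "leash": 0.1},
    "finance": {"bank": 0.6, "loan": 0.3, "rate": 0.1},
    "rivers": {"bank": 0.2, "water": 0.5, "fish": 0.3},
}


def select_candidates(words, clusters):
    """Return the names of clusters that share at least one word with the document."""
    vocab = set(words)
    return [name for name, weights in clusters.items() if weights.keys() & vocab]


def characterize(words, clusters):
    """Return one component per candidate cluster, normalized to sum to 1.

    Each component is a stand-in for "the degree to which a cluster is
    related to the set of words": here, the count-weighted sum of the
    cluster's word weights over the document.
    """
    counts = Counter(words)
    raw = {}
    for name in select_candidates(words, clusters):
        weights = clusters[name]
        raw[name] = sum(weights.get(w, 0.0) * n for w, n in counts.items())
    total = sum(raw.values())
    if total == 0:
        return {}
    return {name: score / total for name, score in raw.items()}


if __name__ == "__main__":
    components = characterize(["dog", "cat", "bank"], CLUSTERS)
    for name, value in sorted(components.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {value:.3f}")
```

In this sketch an ambiguous word such as "bank" contributes to both the "finance" and "rivers" components, so the resulting vector reflects several plausible concepts rather than a single label.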
Publication number: KR20050065578(A)  Publication date: 2005.06.29
Application number: KR20057005832  Filing date: 2003.10.03
Applicant: GOOGLE, INC.  Inventors: HARIK GEORGES; SHAZEER NOAM M.
Classification: G06F17/30; (IPC1-7): G06F17/30  Main classification: G06F17/30
Agency:  Agent:
Principal claim:
Address: