Title of Invention Inferring hierarchical descriptions of a set of documents
Abstract A method automatically determines groups of words or phrases that serve as descriptive names for a small set of documents, and infers concepts in that set that are more general and more specific than the descriptive names, without any prior knowledge of the hierarchy or the concepts and in a language-independent manner. The descriptive names and the concepts need not be explicitly contained in the documents. The primary application of the invention is searching the World Wide Web, but the invention is not limited to the World Wide Web and may be applied to any set of documents. Classes of features are identified in order to promote understanding of a set of documents. Preferably, there are three classes of features. "Self" features or terms describe the cluster as a whole. "Parent" features or terms describe more general concepts. "Child" features or terms describe specializations of the cluster. The self features can be used as a recommended name for a cluster, while parent and child features can be used to place the cluster in the space of a larger collection. Parent features suggest a more general concept, while child features suggest concepts that describe a specialization of the self feature(s). Automatic discovery of parent, self and child features is useful for several purposes, including automatic labeling of web directories and improving information retrieval.
Publication No. US2003167163(A1) Publication Date 2003.09.04
Application No. US20020209594 Filing Date 2002.07.31
Applicant NEC RESEARCH INSTITUTE, INC. Inventors GLOVER ERIC J.;LAWRENCE STEPHEN R.;PENNOCK DAVID M.
Classification G06Q10/00;G06F17/30;(IPC1-7):G06F17/27 Main Classification G06Q10/00
Agency Agent
Principal Claim
Address