Title of invention: System and method for computing a measure of similarity between documents
Abstract: A measure of similarity between two documents is computed. In computing the measure of similarity, a first list of rated keywords extracted from the first document and a second list of rated keywords extracted from the second document are received. The first and second lists of keywords are used to determine whether the first document forms part of the second document using a first computed percentage indicating what percentage of keyword ratings in the first list also exist in the second list. When the first percentage indicates that the first document is included in the second document, a second percentage is computed that indicates what percentage of keyword ratings, along with a set of their neighboring keyword ratings in the first list, also exist in the second list. The first percentage is used to specify the measure of similarity when the second percentage is greater than the first percentage.
Publication number: US7493322(B2) Publication date: 2009.02.17
Application number: US20030605631 Filing date: 2003.10.15
Applicant: XEROX CORPORATION Inventors: FRANCIOSA ALAIN; DANCE CHRISTOPHER R
Classification: G06F7/00; G06F17/30 Main classification: G06F7/00
Agency: Agent:
Main claim:
Address:
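
The two-percentage test described in the abstract can be illustrated with a minimal sketch. This is one possible reading, not the patented implementation. The abstract does not say how entries are matched, how large the neighborhood is, what value of the first percentage counts as "included", or what happens when the second percentage is not greater than the first. Accordingly, the names first_percentage, second_percentage and similarity, the parameters window and inclusion_threshold, exact matching on (keyword, rating) pairs, and computing the second percentage over the matched entries only are all assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple

# A rated keyword as (keyword, rating). The rating scale is an assumption.
RatedKeyword = Tuple[str, int]


def first_percentage(first: Sequence[RatedKeyword],
                     second: Sequence[RatedKeyword]) -> float:
    """Percentage of entries in `first` that also occur in `second`."""
    if not first:
        return 0.0
    in_second = set(second)
    hits = sum(1 for entry in first if entry in in_second)
    return 100.0 * hits / len(first)


def second_percentage(first: Sequence[RatedKeyword],
                      second: Sequence[RatedKeyword],
                      window: int = 1) -> float:
    """Among entries of `first` found in `second`, the percentage whose
    neighbors (within `window` positions in `first`) are also all found.

    Assumption: the denominator is the matched entries, not the whole list.
    """
    in_second = set(second)
    matched = [i for i, entry in enumerate(first) if entry in in_second]
    if not matched:
        return 0.0

    def neighborhood_present(i: int) -> bool:
        lo = max(0, i - window)
        hi = min(len(first), i + window + 1)
        return all(first[j] in in_second for j in range(lo, hi))

    hits = sum(1 for i in matched if neighborhood_present(i))
    return 100.0 * hits / len(matched)


def similarity(first: Sequence[RatedKeyword],
               second: Sequence[RatedKeyword],
               inclusion_threshold: float = 50.0,  # hypothetical cut-off
               window: int = 1) -> Optional[float]:
    """Return the first percentage when the abstract's condition holds.

    Returns None for cases the abstract does not describe.
    """
    p1 = first_percentage(first, second)
    if p1 < inclusion_threshold:
        return None  # first document not treated as included
    p2 = second_percentage(first, second, window)
    if p2 > p1:
        return p1  # first percentage specifies the measure of similarity
    return None  # behavior not stated in the abstract


if __name__ == "__main__":
    doc_a = [("a", 1), ("b", 1), ("c", 1), ("d", 1), ("y", 1), ("z", 1)]
    doc_b = [("a", 1), ("b", 1), ("c", 1), ("d", 1)]
    print(first_percentage(doc_a, doc_b))
    print(second_percentage(doc_a, doc_b))
    print(similarity(doc_a, doc_b))
```

In this sketch, four of the six entries in doc_a occur in doc_b, and three of those four also have all of their immediate neighbors present, so the second percentage exceeds the first and the first is returned as the similarity. Returning None elsewhere is a deliberate choice to avoid guessing at behavior the abstract leaves open.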