Title: Techniques for computing similarity measurements between segments representative of documents
Abstract: Keyword frequency data for a plurality of document-derived segments is provided in matrix form, in which each segment is represented as a vector of dimensionality equal to the number of keywords. The matrix may be subdivided into a plurality of sub-matrices, each preferably corresponding to a non-overlapping portion of the plurality of keywords. When determining a similarity measurement between any pair of segments, at least a portion of the keyword frequency data for each sub-matrix's non-overlapping keywords is used to determine a sub-matrix dot product for the pair of segments. The resulting plurality of sub-matrix dot products is then summed to provide the similarity measurement.
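The decomposition described in the abstract can be illustrated with a minimal sketch. It assumes the keywords are partitioned into contiguous column ranges and that the similarity measurement is the plain dot product; the record does not specify either. The names `split_columns` and `block_similarity` and the toy frequency matrix are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence


def split_columns(num_keywords: int, num_blocks: int) -> List[range]:
    """Partition keyword indices into contiguous, non-overlapping ranges.

    Assumption: contiguous, near-equal blocks. The source only requires
    that the keyword portions do not overlap.
    """
    base, extra = divmod(num_keywords, num_blocks)
    blocks, start = [], 0
    for b in range(num_blocks):
        size = base + (1 if b < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def block_similarity(
    matrix: Sequence[Sequence[float]], i: int, j: int, blocks: List[range]
) -> float:
    """Sum the per-sub-matrix dot products for segments i and j."""
    partials = [
        sum(matrix[i][k] * matrix[j][k] for k in block) for block in blocks
    ]
    return sum(partials)


# Toy data (made up): rows are segments, columns are keyword frequencies.
freq = [
    [2, 0, 1, 3, 0],
    [1, 1, 0, 2, 4],
    [0, 5, 0, 0, 1],
]

blocks = split_columns(num_keywords=5, num_blocks=2)
similarity = block_similarity(freq, 0, 1, blocks)

# Reference value: the dot product over all keywords at once.
full = sum(a * b for a, b in zip(freq[0], freq[1]))
```

Because the blocks cover every keyword exactly once, the summed partial products equal the full dot product. Each partial product touches only one sub-matrix, so in principle the blocks could be processed separately.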
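Publication number: EP2128774(A1)   Publication date: 2009.12.02
Application number: EP20090161391   Filing date: 2009.05.28
Applicant: ACCENTURE GLOBAL SERVICES GMBH   Inventors: BOSE RANTHAM PRABHAKARA, JAGADEESH CHANDRA; NAYAK, ASHWIN; CHANDRAN, ANITHA
Classification: G06F17/30   Main classification: G06F17/30
Agency:   Agent:
Principal claim:
Address: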