Title of invention |
Techniques for computing similarity measurements between segments representative of documents |
Abstract |
Keyword frequency data for a plurality of document-derived segments is provided in matrix form, in which each segment is represented as a vector whose dimensionality equals the number of keywords. The matrix may be subdivided into a plurality of sub-matrices, each preferably corresponding to a non-overlapping portion of the plurality of keywords. When determining a similarity measurement between any pair of segments, at least a portion of the keyword frequency data for each sub-matrix's non-overlapping keywords is used to determine a sub-matrix dot product for the pair of segments. The resulting plurality of sub-matrix dot products is then summed to provide the similarity measurement. |
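The summation described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function names (`split_columns`, `block_dot`, `similarity`), the toy frequency values, and the choice of contiguous, equal-width keyword blocks are all assumptions made for illustration. The abstract only requires that the sub-matrices cover non-overlapping portions of the keywords.

```python
from typing import List

Matrix = List[List[int]]  # rows = segments, columns = keyword frequencies


def split_columns(matrix: Matrix, num_blocks: int) -> List[Matrix]:
    """Split the keyword columns into contiguous, non-overlapping blocks.

    Contiguous equal-width blocks are an assumption of this sketch.
    """
    n_keywords = len(matrix[0])
    width = -(-n_keywords // num_blocks)  # ceiling division
    return [
        [row[start:start + width] for row in matrix]
        for start in range(0, n_keywords, width)
    ]


def block_dot(u: List[int], v: List[int]) -> int:
    """Dot product of two segment vectors restricted to one keyword block."""
    return sum(a * b for a, b in zip(u, v))


def similarity(sub_matrices: List[Matrix], i: int, j: int) -> int:
    """Sum the per-block dot products for segments i and j."""
    return sum(block_dot(sub[i], sub[j]) for sub in sub_matrices)


# Hypothetical data: 3 segments, 4 keywords.
segments = [
    [2, 0, 1, 3],
    [1, 1, 0, 2],
    [0, 4, 0, 0],
]
subs = split_columns(segments, 2)  # two sub-matrices of 2 keywords each
print(similarity(subs, 0, 1))      # (2*1 + 0*1) + (1*0 + 3*2) = 8
```

Because the keyword blocks do not overlap and together cover every column, the summed block products equal the dot product over the full keyword vectors. Each block can therefore be processed separately, for example when the full matrix is too large to handle at once.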
Publication number |
EP2128774(A1) |
Publication date |
2009.12.02 |
Application number |
EP20090161391 |
Filing date |
2009.05.28 |
Applicant |
ACCENTURE GLOBAL SERVICES GMBH |
Inventors |
BOSE RANTHAM PRABHAKARA, JAGADEESH CHANDRA; NAYAK, ASHWIN; CHANDRAN, ANITHA |
Classification number |
G06F17/30 |
Main classification number |
G06F17/30 |
Patent agency |
|
Patent agent |
|
Principal claim |
|
Address |
|