Title of invention |
METHOD, APPARATUS, SYSTEM AND STORAGE MEDIUM HAVING COMPUTER EXECUTABLE INSTRUCTIONS FOR DETERMINATION OF A MEASURE OF SIMILARITY AND PROCESSING OF DOCUMENTS |
Abstract |
A method determines a measure of similarity between a first document and a second document, in which a vector space model which takes into account word frequencies and coordinates is determined for the first document and for the second document. A measure of the similarity between the first document and the second document is determined using the vector space model. An apparatus, a computer program product and a storage medium are configured to execute the method. |
Application publication number |
US2014181124(A1) |
Application publication date |
2014.06.26 |
Application number |
US201314138407 |
Filing date |
2013.12.23 |
Applicant |
DOCUWARE GMBH |
Inventors |
HOFMEIER Andreas;WEIDLING Christoph;BERGER Michael |
Classification |
G06F17/30 |
Main classification |
G06F17/30 |
Agency |
|
Agent |
|
Main claim |
1. A method for determining a measure of similarity between a first document and a second document, which comprises the steps of:
determining a vector space model which takes into account word frequencies and coordinates for the first document and for the second document; determining the measure of similarity between the first document and the second document using the vector space model; determining a respective word vector for the first document and for the second document, elements of word vectors indicating whether or not a word occurs in a respective document; determining a respective coordinate vector for the first document and for the second document, elements of coordinate vectors indicating coordinates for words which occur together in the first and second documents; and comparing the words which repeatedly occur in both the first and second documents with one another in the respective coordinate vector. |
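The steps of the claim can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the claim does not give formulas, so the input format (a list of `(word, x, y)` tuples with coordinates normalized to [0, 1]), the use of cosine similarity for the word-frequency part, the nearest-occurrence distance for the coordinate part, and the `weight` parameter that blends the two are all assumptions made for illustration. A nonzero frequency here plays the role of the word-vector element that indicates whether a word occurs.

```python
import math
from collections import Counter, defaultdict

# Assumed input format (not from the claim): a document is a list of
# (word, x, y) tuples, with x and y normalized to the range [0, 1].

def word_frequencies(doc):
    """Sparse word vector: word -> number of occurrences in the document."""
    return Counter(word for word, _, _ in doc)

def cosine_similarity(freq_a, freq_b):
    """Cosine similarity of two sparse word-frequency vectors."""
    dot = sum(freq_a[w] * freq_b[w] for w in freq_a.keys() & freq_b.keys())
    norm_a = math.sqrt(sum(c * c for c in freq_a.values()))
    norm_b = math.sqrt(sum(c * c for c in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def coordinates_by_word(doc):
    """Coordinate vector: word -> list of (x, y) positions where it occurs."""
    positions = defaultdict(list)
    for word, x, y in doc:
        positions[word].append((x, y))
    return positions

def positional_agreement(doc_a, doc_b):
    """Average closeness of the words that occur in both documents.

    1.0 means every shared word sits at the same place in both documents;
    values near 0.0 mean shared words sit at opposite corners of the page.
    Note that this measure is computed from doc_a towards doc_b.
    """
    pos_a, pos_b = coordinates_by_word(doc_a), coordinates_by_word(doc_b)
    shared = pos_a.keys() & pos_b.keys()
    if not shared:
        return 0.0
    max_dist = math.sqrt(2)  # diagonal of the unit square
    total = 0.0
    for word in shared:
        # Match each occurrence in doc_a with its nearest occurrence in doc_b.
        dists = [min(math.dist(p, q) for q in pos_b[word]) for p in pos_a[word]]
        total += 1.0 - (sum(dists) / len(dists)) / max_dist
    return total / len(shared)

def similarity(doc_a, doc_b, weight=0.5):
    """Blend word-frequency and coordinate similarity (weight is an assumption)."""
    freq_sim = cosine_similarity(word_frequencies(doc_a), word_frequencies(doc_b))
    pos_sim = positional_agreement(doc_a, doc_b)
    return weight * freq_sim + (1 - weight) * pos_sim
```

Under these assumptions, two documents that contain the same words at the same positions score close to 1, documents with no words in common score 0, and documents with the same words at different positions score somewhere in between.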
Address |
Germering DE |