Title of invention |
Method of text similarity measurement |
Abstract |
In one aspect, the present invention provides a method for estimating the similarity between at least two portions of text, including the steps of: forming a set of syntactic tuples, each tuple including at least two terms and a relation between the two terms; classifying the relation between the terms in the tuples according to a predefined set of relations; establishing the relative agreement between syntactic tuples from the portions of text under comparison according to predefined classes of agreement; calculating a value representative of the similarity between the portions of text for each of the classes of agreement; and establishing a value for the similarity between the portions of text by calculating a weighted sum of the values representative of the similarity between the portions of text for each of the classes of agreement. Preferably, the step of calculating a value representative of the similarity between the portions of text for each of the classes of agreement includes a weighting based upon the number of matched terms occurring in particular parts of speech in which the text occurs. It is also preferred that the step of calculating a value representative of the similarity between the portions of text for each of the classes of agreement includes the application of a weighting factor to the estimate of similarity for each of the classes of agreement and the parts of speech in which matched terms occur.
|
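The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the tuple layout (`SynTuple`), the three agreement classes (`full`, `terms_only`, `one_term`), the class weights, the part-of-speech weights, and the normalisation are all assumptions chosen for illustration, since the abstract does not specify them.

```python
from collections import namedtuple

# A syntactic tuple: two terms, the relation between them, and the part of
# speech of the head term. This field layout is an assumption.
SynTuple = namedtuple("SynTuple", ["head", "relation", "dependent", "pos"])

# Hypothetical agreement classes and weights; the abstract does not name them.
CLASS_WEIGHTS = {"full": 1.0, "terms_only": 0.6, "one_term": 0.3}
# Hypothetical part-of-speech weights applied to matched terms.
POS_WEIGHTS = {"verb": 1.0, "noun": 0.8}
DEFAULT_POS_WEIGHT = 0.5


def agreement_class(a, b):
    """Return the agreement class of two tuples, or None if they do not agree."""
    same_head = a.head == b.head
    same_dep = a.dependent == b.dependent
    same_rel = a.relation == b.relation
    if same_head and same_dep and same_rel:
        return "full"
    if same_head and same_dep:
        return "terms_only"
    if same_rel and (same_head or same_dep):
        return "one_term"
    return None


def similarity(tuples_a, tuples_b):
    """Weighted sum over agreement classes of POS-weighted match scores."""
    if not tuples_a or not tuples_b:
        return 0.0
    per_class = {name: 0.0 for name in CLASS_WEIGHTS}
    for a in tuples_a:
        # Keep only the strongest agreement that this tuple finds in B.
        best = None
        for b in tuples_b:
            cls = agreement_class(a, b)
            if cls and (best is None or CLASS_WEIGHTS[cls] > CLASS_WEIGHTS[best]):
                best = cls
        if best:
            per_class[best] += POS_WEIGHTS.get(a.pos, DEFAULT_POS_WEIGHT)
    # Normalise each class score by the size of the larger tuple set.
    norm = max(len(tuples_a), len(tuples_b))
    return sum(CLASS_WEIGHTS[c] * per_class[c] / norm for c in per_class)


text_a = [
    SynTuple("eat", "obj", "apple", "verb"),
    SynTuple("apple", "mod", "red", "noun"),
]
text_b = [
    SynTuple("eat", "obj", "apple", "verb"),
    SynTuple("apple", "attr", "red", "noun"),
]
score = similarity(text_a, text_b)
```

In this example the first tuple agrees fully and the second agrees on terms only, so `score` combines one contribution from each class. In practice the tuples would come from a syntactic parser, and the weights would be tuned rather than fixed by hand.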
Publication number |
US7346491(B2) |
Publication date |
2008.03.18 |
Application number |
US20030250746 |
Application date |
2003.11.26 |
Applicant |
AGENCY FOR SCIENCE, TECHNOLOGY AND RESEARCH |
Inventors |
KANAGASABAI RAJARAMAN;PAN HONG |
Classification number |
G06F17/27 |
Main classification number |
G06F17/27 |
Agency |
|
Agent |
|
Main claim |
|
Address |
|