Title of Invention: APPARATUS, SYSTEM AND METHOD FOR APPLICATION-SPECIFIC AND CUSTOMIZABLE SEMANTIC SIMILARITY MEASUREMENT
Abstract: The present invention relates to an apparatus, system and method for creating a customizable and application-specific semantic similarity utility that uses a single similarity measuring algorithm with data from broad-coverage structured lexical knowledge bases (dictionaries and thesauri) and corpora (document collections). More specifically, the invention includes the use of data from custom or application-specific structured lexical knowledge bases and corpora, and semantic mappings from variant expressions to their canonical forms. The invention uses a combination of technologies to simplify the development of a generic semantic similarity utility and to minimize the effort and complexity of customizing the generic utility for a domain- or topic-dependent application. The invention makes customization modular and data-driven, allowing developers to create implementations at varying degrees of customization (e.g., generic, domain-level, company-level, application-level) and to update them as changes occur over time (e.g., when product and service mixes change).
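The layered, data-driven customization described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that each customization level is a plain dictionary mapping variant expressions to canonical forms, and that more specific levels override more general ones. The names `GENERIC`, `DOMAIN`, `APPLICATION` and `canonical_form`, and all sample entries, are hypothetical and do not come from the patent.

```python
from collections import ChainMap

# Hypothetical mapping layers, from most general to most specific.
GENERIC = {"car": "automobile", "tv": "television"}
DOMAIN = {"policy": "insurance policy"}
APPLICATION = {"car": "insured vehicle"}

# ChainMap searches its maps left to right, so the application layer
# overrides the domain layer, which overrides the generic layer.
MAPPINGS = ChainMap(APPLICATION, DOMAIN, GENERIC)


def canonical_form(term: str) -> str:
    """Return the canonical form of a term, or the term itself if unmapped."""
    return MAPPINGS.get(term.lower(), term)
```

Under these assumptions, swapping or editing one dictionary changes the behavior at that level without touching the lookup code, which is one way to read "modular and data-driven".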
Publication No.: US2016350283(A1)  Publication Date: 2016.12.01
Application No.: US201514727451  Filing Date: 2015.06.01
Applicant: Carus Alwin B.; DePlonty Thomas J.  Inventor: Carus Alwin B.; DePlonty Thomas J.
Classification: G06F17/27; G06F17/30; G06F17/21; G06F17/22  Main Classification: G06F17/27
Agency:  Agent:
Independent Claim: 1. A method for application-specific and customizable text similarity measurement, the method comprising the steps of: determining a string similarity score of at least two texts based upon a string similarity database, said at least two texts comprising at least one input text and at least one target text; determining a semantic similarity score of the at least two texts based upon a semantic similarity database, the semantic similarity score being determined as the sum of a distance between at least one term of each said at least two texts; mapping said at least one target text and its respective canonical representations in a mappings database; and combining the string similarity score and the semantic similarity score of the at least two texts where the combined score is a weighted sum of the string similarity score and the semantic similarity score and where said at least two texts are ranked for similarity by sorting by their respective combined string and semantic similarity scores and where said texts that are included in the mappings database are also scored by similarity of their canonical forms.
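The steps of the claim (string score, semantic score, weighted sum, ranking, and scoring of canonical forms) can be sketched as follows. This is an illustrative reading under stated assumptions, not the patented implementation: `difflib.SequenceMatcher` stands in for the unspecified string similarity measure, the semantic score sums per-term similarities from a toy table and averages them over the input terms, and a mapped target keeps the better of its own score and its canonical form's score. The tables `TERM_SIMILARITY` and `CANONICAL_FORMS`, the function names, and the default weights are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical toy data standing in for the semantic similarity database
# and the mappings database named in the claim.
TERM_SIMILARITY = {
    frozenset(("car", "automobile")): 0.9,
    frozenset(("buy", "purchase")): 0.8,
}
CANONICAL_FORMS = {"purchase an automobile": "buy car"}


def string_score(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; difflib is a stand-in metric."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def term_score(s: str, t: str) -> float:
    """Similarity of two terms: 1.0 if identical, else a table lookup."""
    if s == t:
        return 1.0
    return TERM_SIMILARITY.get(frozenset((s, t)), 0.0)


def semantic_score(a: str, b: str) -> float:
    """Sum each input term's best match in the target, averaged over input terms."""
    a_terms, b_terms = a.lower().split(), b.lower().split()
    if not a_terms or not b_terms:
        return 0.0
    total = sum(max(term_score(s, t) for t in b_terms) for s in a_terms)
    return total / len(a_terms)


def combined_score(a: str, b: str, w_string: float = 0.5, w_semantic: float = 0.5) -> float:
    """Weighted sum of the string and semantic similarity scores."""
    return w_string * string_score(a, b) + w_semantic * semantic_score(a, b)


def rank_targets(input_text, targets, w_string=0.5, w_semantic=0.5):
    """Return (target, score) pairs sorted from most to least similar."""
    scored = []
    for target in targets:
        score = combined_score(input_text, target, w_string, w_semantic)
        canonical = CANONICAL_FORMS.get(target)
        if canonical is not None:
            # Assumption: a mapped target keeps the better of its two scores.
            score = max(score, combined_score(input_text, canonical, w_string, w_semantic))
        scored.append((target, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With this toy data, `rank_targets("buy car", ["sell a boat", "purchase an automobile"])` places "purchase an automobile" first, because its canonical form "buy car" matches the input exactly. The weights `w_string` and `w_semantic` are the tuning knobs the claim's weighted sum implies; how the patent sets them is not stated in this record.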
Address: Waban, MA, US