Title: System and method of merging text analysis results
Abstract: A system and method of merging text analysis results. The system uses a set of three corrected, weakened Jaccard factors to determine whether the respective results of multiple text analysis operations are equal, subtypes of each other or associated with each other, in order to merge the results.
Publication number: US9047347(B2)  Publication date: 2015.06.02
Application number: US201313913847  Filing date: 2013.06.10
Applicant: SAP SE  Inventors: Pfeifer Katja; Peukert Eric
Classification: G06F17/30  Main classification: G06F17/30
Agency: Fountainhead Law Group PC  Agent: Fountainhead Law Group PC
Principal claim: 1. A computer-implemented method of merging text analysis results, comprising:
processing, by a computer system, a plurality of text information according to a first text processing service to generate a first plurality of instances annotated according to a first taxonomy having a first set of elements;
processing, by the computer system, the plurality of text information according to a second text processing service to generate a second plurality of instances annotated according to a second taxonomy having a second set of elements;
calculating, by the computer system, a first coefficient between a first set of instances and a second set of instances according to a first corrected, weakened Jaccard factor, wherein the first set of instances corresponds to a first element of the first set of elements and wherein the second set of instances corresponds to a second element of the second set of elements;
calculating, by the computer system, a second coefficient between the first set of instances and the second set of instances according to a second corrected, weakened Jaccard factor;
calculating, by the computer system, a third coefficient between the first set of instances and the second set of instances according to a third corrected, weakened Jaccard factor;
determining, by the computer system, that the first element is a subtype of the second element, that the second element is a subtype of the first element, or that the first element is associated with the second element, according to the first coefficient, the second coefficient and the third coefficient; and
merging, by the computer system, the first taxonomy and the second taxonomy according to the first element and the second element being associated, the first element being the subtype of the second element, or the second element being the subtype of the first element,
wherein the first corrected, weakened Jaccard factor corresponds to a Jaccard factor having a numerator and a denominator, wherein the Jaccard factor is corrected in the numerator with a first correction factor and weakened in the denominator with a first weakening factor,
wherein the second corrected, weakened Jaccard factor corresponds to a ratio between a first corrected intersection size and a first corrected instance set size, wherein the first corrected intersection size is a size of an intersection of the first set of instances and the second set of instances corrected with a second correction factor, and wherein the first corrected instance set size is a size of the first set of instances deducted by a number of instances only found by the first text processing service multiplied with a second weakening factor, and
wherein the third corrected, weakened Jaccard factor corresponds to a ratio between a second corrected intersection size and a second corrected instance set size, wherein the second corrected intersection size is a size of an intersection of the first set of instances and the second set of instances corrected with a third correction factor, and wherein the second corrected instance set size is a size of the second set of instances deducted by a number of instances only found by the second text processing service multiplied with a third weakening factor.
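The three coefficients recited in the claim can be illustrated with a minimal Python sketch. The claim does not state how the correction is applied, how the first denominator is weakened, or which thresholds drive the determination, so everything below is an assumption made for illustration: the correction is subtracted from the intersection size, the first denominator is the union size reduced by the weakened count of unshared instances, "instances only found by" one service is read as the set difference, and the names `Factors`, `coefficients`, `relate` and the threshold `0.8` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Factors:
    """Hypothetical parameter pair; the claim names the factors but not their values."""
    correction: float = 0.0  # assumption: subtracted from the intersection size
    weakening: float = 0.5   # assumption: weight of the instances found by one service only


def _ratio(numerator, denominator):
    # Guard against empty sets and over-correction.
    if denominator <= 0:
        return 0.0
    return max(numerator, 0.0) / denominator


def coefficients(a, b, f1=Factors(), f2=Factors(), f3=Factors()):
    """Return the three coefficients for instance sets a and b (sets of hashable instances)."""
    inter = len(a & b)
    only_a = len(a - b)  # assumption: "only found by the first service"
    only_b = len(b - a)  # assumption: "only found by the second service"
    union = len(a | b)
    # First factor: Jaccard, corrected in the numerator, weakened in the denominator.
    first = _ratio(inter - f1.correction, union - f1.weakening * (only_a + only_b))
    # Second factor: corrected intersection size over corrected size of the first set.
    second = _ratio(inter - f2.correction, len(a) - f2.weakening * only_a)
    # Third factor: corrected intersection size over corrected size of the second set.
    third = _ratio(inter - f3.correction, len(b) - f3.weakening * only_b)
    return first, second, third


def relate(first, second, third, threshold=0.8):
    """Hypothetical decision rule; the claim only says the three coefficients are used."""
    if first >= threshold:
        return "associated"
    if second >= threshold:
        return "first is subtype of second"
    if third >= threshold:
        return "second is subtype of first"
    return "unrelated"
```

Under these assumptions, an element whose three instances all appear among the nine instances of another element yields a high second coefficient and lower first and third coefficients, so `relate(*coefficients({1, 2, 3}, set(range(1, 10))))` reports the first element as a subtype of the second.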
Address: Walldorf DE