Main claim
1. A computer-implemented method of merging text analysis results, comprising:
processing, by a computer system, a plurality of text information items according to a first text processing service to generate a first plurality of instances annotated according to a first taxonomy having a first set of elements;
processing, by the computer system, the plurality of text information items according to a second text processing service to generate a second plurality of instances annotated according to a second taxonomy having a second set of elements;
calculating, by the computer system, a first coefficient between a first set of instances and a second set of instances according to a first corrected, weakened Jaccard factor, wherein the first set of instances corresponds to a first element of the first set of elements and the second set of instances corresponds to a second element of the second set of elements;
calculating, by the computer system, a second coefficient between the first set of instances and the second set of instances according to a second corrected, weakened Jaccard factor;
calculating, by the computer system, a third coefficient between the first set of instances and the second set of instances according to a third corrected, weakened Jaccard factor;
determining, by the computer system, according to the first coefficient, the second coefficient, and the third coefficient, that the first element is a subtype of the second element, that the second element is a subtype of the first element, or that the first element is associated with the second element; and
merging, by the computer system, the first taxonomy and the second taxonomy according to the first element and the second element being associated, the first element being the subtype of the second element, or the second element being the subtype of the first element,
wherein the first corrected, weakened Jaccard factor corresponds to a Jaccard factor having a numerator and a denominator, the Jaccard factor being corrected in the numerator with a first correction factor and weakened in the denominator with a first weakening factor;
wherein the second corrected, weakened Jaccard factor corresponds to a ratio between a first corrected intersection size and a first corrected instance set size, the first corrected intersection size being the size of the intersection of the first set of instances and the second set of instances corrected with a second correction factor, and the first corrected instance set size being the size of the first set of instances reduced by the product of a second weakening factor and the number of instances found only by the first text processing service; and
wherein the third corrected, weakened Jaccard factor corresponds to a ratio between a second corrected intersection size and a second corrected instance set size, the second corrected intersection size being the size of the intersection of the first set of instances and the second set of instances corrected with a third correction factor, and the second corrected instance set size being the size of the second set of instances reduced by the product of a third weakening factor and the number of instances found only by the second text processing service.
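The three coefficients recited in the claim can be illustrated with a short Python sketch. The claim does not state how a correction factor is applied or exactly how the first weakening factor enters the denominator, so the following are assumptions: each correction factor multiplies the intersection size; the first denominator is the union size minus the first weakening factor times the count of instances found by only one service; and an instance is "found only by" a service when it belongs to that service's element set but the other service never reported it at all. The function names, the default factor values of 1.0, the toy instance IDs, and the 0.9 threshold in the decision rule are hypothetical; the claim says only that the relationship is determined according to the three coefficients.

```python
"""Hypothetical sketch of the three corrected, weakened Jaccard factors."""
from typing import Hashable, Optional, Set, Tuple


def _ratio(numerator: float, denominator: float) -> float:
    # Guard against an empty or fully weakened denominator (assumption).
    return numerator / denominator if denominator > 0 else 0.0


def corrected_weakened_factors(
    first_set: Set[Hashable],        # instances of the first element (first service)
    second_set: Set[Hashable],       # instances of the second element (second service)
    found_by_first: Set[Hashable],   # every instance the first service reported
    found_by_second: Set[Hashable],  # every instance the second service reported
    correction: Tuple[float, float, float] = (1.0, 1.0, 1.0),  # hypothetical c1..c3
    weakening: Tuple[float, float, float] = (1.0, 1.0, 1.0),   # hypothetical w1..w3
) -> Tuple[float, float, float]:
    c1, c2, c3 = correction
    w1, w2, w3 = weakening
    intersection = len(first_set & second_set)
    # Instances in each set that the *other* service never reported (assumption).
    only_first = len(first_set - found_by_second)
    only_second = len(second_set - found_by_first)
    # Factor 1: Jaccard factor with a corrected numerator and a weakened
    # denominator (assumed: union size minus weighted one-service-only counts).
    k1 = _ratio(c1 * intersection,
                len(first_set | second_set) - w1 * (only_first + only_second))
    # Factor 2: corrected intersection over the weakened size of the first set.
    k2 = _ratio(c2 * intersection, len(first_set) - w2 * only_first)
    # Factor 3: corrected intersection over the weakened size of the second set.
    k3 = _ratio(c3 * intersection, len(second_set) - w3 * only_second)
    return k1, k2, k3


def relate(k1: float, k2: float, k3: float, threshold: float = 0.9) -> Optional[str]:
    """Hypothetical decision rule; the claim specifies no thresholds."""
    if k1 >= threshold:
        return "associated"
    if k2 >= threshold:
        return "first_subtype_of_second"
    if k3 >= threshold:
        return "second_subtype_of_first"
    return None


if __name__ == "__main__":
    found_by_first = {"s1", "s2", "s3", "s4", "s5"}
    found_by_second = {"s1", "s2", "s3", "s5", "s8"}
    person = {"s1", "s2", "s3", "s4"}  # first element, first service
    agent = {"s1", "s2", "s3", "s5"}   # second element, second service
    k = corrected_weakened_factors(person, agent, found_by_first, found_by_second)
    print(k)            # (0.75, 1.0, 0.75)
    print(relate(*k))   # first_subtype_of_second
```

With these toy sets, every "Person" instance that the second service reported at all is also an "Agent", so the second coefficient reaches 1.0 while the other two stay at 0.75; the hypothetical rule reads this as the first element being a subtype of the second. The weakening terms keep instances that one service never detected from counting as disagreement between the two taxonomies.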