Abstract |
A method for detecting the similarity of patent documents based on a new kernel function, the Luke kernel, comprises: dividing each patent document into five elements (patent name, abstract, claims, description, and main classification); constructing the Luke kernel; using the Luke kernel to calculate the similarity of each of the first four elements of two patent documents; calculating the similarity between the main classifications of the two documents by character string matching; and computing a weighted sum of the five element similarities to obtain the overall similarity of the two patent documents. The method further improves accuracy and recall and can be applied to patent document similarity detection. |
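
The weighted-summation step can be illustrated with a minimal Python sketch. The abstract does not define the Luke kernel, so a simple token-overlap (Jaccard) score stands in for it here. The field names, the exact-match rule for the main classification, and the weight values are also assumptions made for illustration only, not details taken from the method.

```python
from typing import Callable, Dict

# The four text elements compared with a kernel (field names are hypothetical).
TEXT_FIELDS = ("name", "abstract", "claims", "description")


def jaccard_similarity(a: str, b: str) -> float:
    """Stand-in for the Luke kernel, which the abstract does not define.

    Returns the Jaccard overlap of the lowercase whitespace tokens.
    """
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty texts are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def classification_similarity(a: str, b: str) -> float:
    """String matching on the main classification (assumed: exact match)."""
    return 1.0 if a.strip() == b.strip() else 0.0


def overall_similarity(
    doc_a: Dict[str, str],
    doc_b: Dict[str, str],
    weights: Dict[str, float],
    text_sim: Callable[[str, str], float] = jaccard_similarity,
) -> float:
    """Weighted sum of the five per-element similarities."""
    total = 0.0
    for field in TEXT_FIELDS:
        total += weights[field] * text_sim(doc_a[field], doc_b[field])
    total += weights["main_classification"] * classification_similarity(
        doc_a["main_classification"], doc_b["main_classification"]
    )
    return total


# Hypothetical weights that sum to 1; the abstract does not specify values.
EXAMPLE_WEIGHTS = {
    "name": 0.1,
    "abstract": 0.2,
    "claims": 0.3,
    "description": 0.3,
    "main_classification": 0.1,
}
```

Because `text_sim` is a parameter, the placeholder can be replaced by an actual kernel implementation without changing the weighted-sum logic.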