Main claim |
1. A machine-implemented method to execute on a processor, comprising:
extracting, by the processor, claims from a patent document;
normalizing, by the processor, the claims into an extracted normalized claim format and retaining structure defined in the claims within the extracted normalized claim format to identify particular limitations of the claims and retaining within the extracted normalized format a frequency of occurrence of each particular word identified in the claims used to identify rankings for each particular word occurring within the claims, and wherein noise words are removed from the normalized format, the noise words provided by a user as a set of known noise words based on a technology associated with the claims, and thesauri words are added to the normalized format, and wherein normalizing further includes translating the claims to a target spoken language, and wherein normalizing further includes stemming remaining words to their morphological roots after the noise words are removed, and wherein normalizing further includes associating synonyms with the stemmed and remaining words and acquiring the synonyms from a domain specific lexicon relevant to the patent document and the synonyms represented in morphological formats;
producing, by the processor, an abstract representing the normalized claim format, the abstract is a combination of particular terms and particular metadata;
receiving, by the processor, a user-defined ranking selected by the user from a range of 0-10;
comparing, by the processor, the abstract against a repository of additional abstracts, via a score assigned to the abstract and additional scores assigned to the additional abstracts and evaluating a result against a predefined range to determine related ones of the additional abstracts to the abstract based on the user-defined ranking, and wherein a degree to which concepts are expanded within the abstract is determined by the user-defined ranking allowing the user to control a precision and a recall associated with expanding the concepts; and
returning, by the processor, the related ones of the additional abstracts. |
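The steps recited above could be sketched roughly as follows. This is an illustrative toy, not the patented implementation: the noise-word set, the `LEXICON` synonym table, the crude `stem` function, the Jaccard-overlap score, and the `low`/`high` range are all assumptions introduced here, and the translation step is omitted entirely.

```python
import re
from collections import Counter

# Hypothetical inputs; a real system would take these from the user and a domain lexicon.
NOISE_WORDS = {"a", "an", "the", "of", "by", "to", "and", "from", "said", "wherein"}
LEXICON = {"processor": {"cpu"}, "document": {"file"}}  # synonyms keyed by stemmed term


def stem(word):
    """Crude suffix stripper standing in for a real morphological stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and not word.endswith("ss") and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def normalize(claim_text, noise_words=NOISE_WORDS):
    """Drop noise words, stem what remains, and keep each term's frequency."""
    words = re.findall(r"[a-z]+", claim_text.lower())
    return Counter(stem(w) for w in words if w not in noise_words)


def make_abstract(freqs, ranking, lexicon=LEXICON):
    """Combine terms and metadata; ranking (0-10) sets how many top terms get synonyms."""
    if not 0 <= ranking <= 10:
        raise ValueError("ranking must be in the range 0-10")
    ranked = [term for term, _ in freqs.most_common()]
    expand_count = len(ranked) * ranking // 10  # 0 = no expansion, 10 = expand all terms
    terms = set(ranked)
    for term in ranked[:expand_count]:
        terms |= lexicon.get(term, set())
    return {"terms": terms, "metadata": {"ranking": ranking, "term_count": len(ranked)}}


def similarity(a, b):
    """Jaccard overlap of term sets (an assumed scoring scheme, not taken from the claim)."""
    union = a["terms"] | b["terms"]
    return len(a["terms"] & b["terms"]) / len(union) if union else 0.0


def related(abstract, repository, low=0.5, high=1.0):
    """Return names of repository abstracts whose score falls within [low, high]."""
    return [name for name, other in repository.items()
            if low <= similarity(abstract, other) <= high]


freqs = normalize("A processor extracting claims from a document")
repo = {
    "A": make_abstract(normalize("The cpu extracts claims from a file"), 0),
    "B": make_abstract(normalize("A bicycle with two wheels"), 0),
}
broad = related(make_abstract(freqs, 10), repo)   # full synonym expansion
narrow = related(make_abstract(freqs, 0), repo)   # no synonym expansion
```

In this toy setup, a ranking of 10 adds the synonyms `cpu` and `file` to the query abstract, so abstract "A" falls within the range and is returned; a ranking of 0 adds no synonyms and nothing is returned. That illustrates, in a simplified way, how the user-defined ranking trades precision against recall.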