Title of invention: TEXT PROCESSING METHOD, SYSTEM AND COMPUTER PROGRAM
Abstract: A method includes hierarchically identifying occurrences of some of the words in the set of sentences; creating a first index for each of some of the words based on the upper hierarchy of occurrences identified for each word; receiving input of a queried word; hierarchically identifying occurrences of the queried word in the set of sentences; creating a second index based on the upper hierarchy of occurrences identified for the queried word; comparing the first index and the second index to calculate an estimated value for the number of occurrences of a word in the neighborhood of the queried word; and calculating the actual value of the number of occurrences of a word in the neighborhood of the queried word based on an upper hierarchy and lower hierarchy of the occurrences on condition that the estimated value is equal to or greater than a predetermined number.
Publication number: US2016357852(A1)
Publication date: 2016.12.08
Application number: US201615243299
Filing date: 2016.08.22
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Takuma Daisuke; Yanagisawa Hiroki
Classification: G06F17/30; G06F17/27
Main classification: G06F17/30
Agency:
Agent:
Principal claim: 1. A method of processing by computer a set of a plurality of sentences including a plurality of words, the method comprising the steps of: creating a first index for each of at least some of the words based on an upper hierarchy of occurrences identified for each word, wherein occurrences of at least some of the words in the set of sentences are hierarchically identified; creating a second index based on an upper hierarchy of occurrences identified for a queried word, wherein occurrences of the queried word in the set of sentences are hierarchically identified; comparing the first index and the second index to calculate an estimated value for the number of occurrences of a word in a neighborhood of the queried word; and calculating the actual value of the number of occurrences of a word in the neighborhood of the queried word based on an upper hierarchy and lower hierarchy of the occurrences on condition that the estimated value is equal to or greater than a predetermined number, wherein the first index and the second index have an upper hierarchy bit set compressed by 1/N where N is a natural number, and a compressed bit is true on condition that one or more uncompressed bits is true.
Address: Armonk, NY, US
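Illustrative sketch (not part of the patent record): the estimate-then-verify flow in the claim can be pictured with plain Python integers used as bit sets. Everything here is an assumption made for illustration and is not taken from the patent text: the "neighborhood" is taken to be the same sentence, the uncompressed bit set has one bit per sentence, the estimate is N times the number of shared compressed bits (an upper bound on the true count), and the function names `sentence_bits`, `compress`, and `neighbor_counts` are hypothetical.

```python
from collections import defaultdict


def sentence_bits(sentences):
    """Map each word to an int bit set: bit s is set if the word occurs in sentence s."""
    bits = defaultdict(int)
    for s, sentence in enumerate(sentences):
        for word in set(sentence.lower().split()):
            bits[word] |= 1 << s
    return dict(bits)


def compress(bits, n):
    """Compress a bit set by 1/n.

    Output bit i is true if at least one of input bits i*n .. i*n + n - 1 is true.
    """
    out, i = 0, 0
    mask = (1 << n) - 1
    while bits:
        if bits & mask:
            out |= 1 << i
        bits >>= n
        i += 1
    return out


def popcount(x):
    """Number of set bits in a non-negative int."""
    return bin(x).count("1")


def neighbor_counts(sentences, query, n=4, threshold=2):
    """Return exact co-occurrence counts for words whose estimate reaches the threshold.

    Assumption: a word is "in the neighborhood" of the query when both
    appear in the same sentence.
    """
    full = sentence_bits(sentences)
    # "First index": one compressed bit set per word, built ahead of the query.
    first_index = {w: compress(b, n) for w, b in full.items()}
    # "Second index": compressed bit set of the queried word.
    q_full = full.get(query, 0)
    q_index = compress(q_full, n)

    results = {}
    for word, w_index in first_index.items():
        if word == query:
            continue
        # Each shared compressed bit covers at most n sentences,
        # so this estimate never undercounts the exact value.
        estimate = n * popcount(w_index & q_index)
        if estimate >= threshold:
            # Only now consult the uncompressed (lower-level) bits.
            results[word] = popcount(full[word] & q_full)
    return results
```

For example, with the six sentences "the cat sat", "the dog ran", "a cat and a dog", "birds fly", "the cat saw the dog", "fish swim", calling `neighbor_counts(sentences, "cat", n=2, threshold=3)` skips every word that shares only one compressed block with "cat" and computes exact counts only for "dog" and "the" (2 each). Because the estimate is an upper bound, a caller who wants only words that truly reach the threshold would still filter the returned counts.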