Title: Information search method, apparatus, program and computer readable recording medium
Abstract: An information search apparatus is provided. The information search apparatus includes: a character string input unit configured to obtain a character string from a client; a character string information search unit configured to obtain information that includes the character string from an index DB; a similarity calculation unit configured to calculate a degree of similarity between the character string and the searched information; and an output unit configured to output the searched information in descending order of the degree of similarity. In the information search apparatus, the character string information search unit includes a unit configured to, when the input character string contains a plurality of words, search, for each word, an index DB that stores words and the occurrence position information of those words, in order to obtain the distance between the occurrence positions of the words; and the similarity calculation unit includes a unit configured to calculate the degree of similarity based on that distance.
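The pipeline in the abstract (look up each word in a positional index, keep documents that contain the words, score them by how close the words occur, output in descending order) can be sketched in Python as follows. This is a minimal illustration, not the patented method: it assumes whitespace tokenization, word offsets as occurrence positions, and a hypothetical similarity equal to the sum of 1/(1 + minimum distance) over word pairs. The names `build_index`, `min_distance` and `search` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_index(docs):
    """Hypothetical positional index DB: word -> {doc_id: [positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word][doc_id].append(pos)
    return index


def min_distance(pos_a, pos_b):
    """Smallest gap between any occurrence of one word and any of another."""
    return min(abs(a - b) for a in pos_a for b in pos_b)


def search(index, query):
    words = query.lower().split()
    postings = [index.get(w, {}) for w in words]
    if not postings:
        return []
    # Assumption: a candidate document must contain every query word.
    candidates = set(postings[0])
    for p in postings[1:]:
        candidates &= set(p)
    results = []
    for doc_id in candidates:
        if len(words) < 2:
            sim = 1.0
        else:
            # Hypothetical similarity: closer occurrences give a higher score.
            sim = sum(1.0 / (1 + min_distance(index[a][doc_id], index[b][doc_id]))
                      for a, b in combinations(words, 2))
        results.append((doc_id, sim))
    # Output in descending order of similarity (ties broken by doc_id).
    results.sort(key=lambda r: (-r[1], r[0]))
    return results


docs = {
    "d1": "fast information search engine",
    "d2": "information about the fast car and search",
}
ranked = search(build_index(docs), "information search")
print(ranked)
```

Here "d1" ranks first because "information" and "search" are adjacent in it, while they are six words apart in "d2". Requiring every query word to occur is only one simple reading of "information that includes the character string".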
Publication number: US8909654 (B2)  Publication date: 2014.12.09
Application number: US200812742442  Filing date: 2008.09.10
Applicant: Nippon Telegraph and Telephone Corporation  Inventors: Uematsu Yukio; Fujioka Kengo; Konagai Syunsuke; Kataoka Ryoji
Classification: G06F17/30  Main classification: G06F17/30
Agency: Oblon, Spivak, McClelland, Maier & Neustadt, L.L.P.  Agent: Oblon, Spivak, McClelland, Maier & Neustadt, L.L.P.
Main claim: 1. An information search apparatus comprising: a processor; a memory that stores an index database (DB); a character string input unit configured to obtain a character string from a client; a character string information search unit configured to obtain information that includes the character string from the index DB; a similarity calculation unit, implemented by the processor, configured to calculate a degree of similarity between the character string and the searched information; and an output unit configured to output the searched information in descending order of the degree of similarity, wherein the index DB stores each word with sentence-based occurrence position information, for each document in which the word occurs, that indicates each position of the sentences in which the word occurs, and when the input character string contains a plurality of words, the character string information search unit searches the index DB, based on each word, to obtain a document d including each word and the occurrence positions of each word in the document d, and the similarity calculation unit calculates a degree of agreement score(Q,d,k) between the occurrence positions of the words by
$$\mathrm{score}(Q,d,k) = \sum_{q_i \in Q} \sum_{q_j \in (Q - q_i)} \frac{1}{\alpha k + 1}\,\mathrm{Count}\big(\mathrm{Pos}_d(q_i),\, \mathrm{Pos}_d^k(q_j)\big)$$
so as to calculate the degree of similarity based on the degree of agreement score(Q,d,k), wherein $Q$ indicates a set of words obtained by dividing the character string, $\mathrm{Pos}_d(q_i)$ indicates an occurrence position of a word $q_i$ in the document $d$, $\mathrm{Pos}_d^k(q_i)$ indicates a value obtained by subtracting $k$ from an occurrence position of a word $q_i$ in the document $d$, $k$ indicates a counter value, $\alpha$ indicates a coefficient, and $\mathrm{Count}(\mathrm{Pos}, \mathrm{Pos})$ indicates a function that receives two pieces of position data and returns a degree of agreement.
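The claimed degree of agreement score(Q,d,k) can be sketched in Python as below. The sketch rests on assumptions the claim does not state: a document is a list of sentences, each a list of words; Count returns the number of positions the two sets share; the weight in the claim's formula is read as 1/(αk + 1); and the overall similarity is a hypothetical sum of score(Q,d,k) over k = 0..max_k. The helper names (`pos_d`, `pos_dk`, `count`, `score`, `similarity`) and the default α = 1.0 are hypothetical.

```python
def pos_d(doc, word):
    """Pos_d(q): sentence indices in document d where word q occurs."""
    return {i for i, sentence in enumerate(doc) if word in sentence}


def pos_dk(doc, word, k):
    """Pos_d^k(q): each occurrence position of q minus the counter value k."""
    return {p - k for p in pos_d(doc, word)}


def count(pos_a, pos_b):
    """Hypothetical Count: the number of positions the two sets share."""
    return len(pos_a & pos_b)


def score(Q, doc, k, alpha=1.0):
    """score(Q,d,k), reading the weight as 1 / (alpha * k + 1)."""
    words = set(Q)
    weight = 1.0 / (alpha * k + 1)
    total = 0.0
    for qi in words:
        for qj in words - {qi}:  # q_j ranges over Q - q_i
            total += weight * count(pos_d(doc, qi), pos_dk(doc, qj, k))
    return total


def similarity(Q, doc, max_k=2, alpha=1.0):
    """Hypothetical aggregation: sum of score(Q,d,k) for k = 0..max_k."""
    return sum(score(Q, doc, k, alpha) for k in range(max_k + 1))


# Usage: a document d made of three sentences.
doc = [["information", "search"], ["search", "engine"], ["index", "information"]]
Q = ["information", "search"]
print(score(Q, doc, 0), score(Q, doc, 1), similarity(Q, doc, max_k=1))
# prints: 2.0 1.0 3.0
```

Under this reading of Count, a position p is counted when q_i occurs in sentence p and q_j occurs in sentence p + k, so score(Q,d,k) measures how often the words co-occur exactly k sentences apart, and a larger α down-weights larger gaps more strongly.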
Address: Tokyo JP