Title of invention |
Information search method, apparatus, program and computer readable recording medium |
Abstract |
An information search apparatus is provided. The information search apparatus includes: a character string input unit configured to obtain a character string from a client; a character string information search unit configured to obtain information that includes the character string from an index DB; a similarity calculation unit configured to calculate degree of similarity between the character string and searched information; and an output unit configured to output the searched information in descending order of the degree of similarity. In the information search apparatus, the character string information search unit includes a unit configured to, when the input character string contains a plurality of words, search an index DB, based on each word, that stores words and occurrence position information of the words to obtain a distance between occurrence positions of the words, and the similarity calculation unit includes a unit configured to calculate the degree of similarity based on the distance between occurrence positions of the words. |
Publication number |
US8909654(B2) |
Publication date |
2014.12.09 |
Application number |
US200812742442 |
Filing date |
2008.09.10 |
Applicant |
Nippon Telegraph and Telephone Corporation |
Inventors |
Uematsu Yukio;Fujioka Kengo;Konagai Syunsuke;Kataoka Ryoji |
Classification |
G06F17/30 |
Main classification |
G06F17/30 |
Agency |
Oblon, Spivak, McClelland, Maier & Neustadt, L.L.P. |
Agent |
Oblon, Spivak, McClelland, Maier & Neustadt, L.L.P. |
Independent claim |
1. An information search apparatus comprising:
a processor; a memory that stores an index database (DB); a character string input unit configured to obtain a character string from a client; a character string information search unit configured to obtain information that includes the character string from the index DB; a similarity calculation unit, implemented by the processor, configured to calculate degree of similarity between the character string and searched information; and an output unit configured to output the searched information in descending order of the degree of similarity, wherein the index DB stores each word with sentence-based occurrence position information, of each document where the word occurs, that indicates each position of sentences where the word occurs, and when the input character string contains a plurality of words, the character string information search unit searches the index DB, based on each word, to obtain a document d including each word and occurrence positions of each word in the document d, and the similarity calculation unit calculates a degree of agreement score(Q,d,k) between occurrence positions of the words by $\mathrm{score}(Q,d,k) = \sum_{q_i \in Q} \sum_{q_j \in (Q - q_i)} \frac{1}{\alpha k + 1}\, \mathrm{count}\left(Pos_d(q_i),\, Pos_d^k(q_j)\right)$ so as to calculate the degree of similarity based on the degree of agreement score(Q,d,k),
wherein $Q$ indicates a set of words obtained by dividing the character string, $Pos_d(q_i)$ indicates an occurrence position of a word $q_i$ in the document d, $Pos_d^k(q_i)$ indicates a value obtained by subtracting k from an occurrence position of a word $q_i$ in the document d, k indicates a counter value, $\alpha$ indicates a coefficient, and count(Pos, Pos) indicates a function for receiving two pieces of position data and returning a degree of agreement. |
Address |
Tokyo JP |
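
The scoring step recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `build_index`, `count`, and `score` are hypothetical, the sentence splitter is a naive regular expression, and the claim does not define how `count` measures agreement, so the sketch assumes it returns the number of positions the two sets share. The weight is read as 1/(αk+1), which is one interpretation of the formula in the claim.

```python
import re
from collections import defaultdict
from itertools import permutations


def build_index(docs):
    """Build word -> doc id -> set of sentence numbers (sentence-based positions)."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, text in docs.items():
        # Naive sentence split; a real system would use a proper segmenter.
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        for pos, sentence in enumerate(sentences):
            for word in sentence.lower().split():
                index[word][doc_id].add(pos)
    return index


def count(pos_a, pos_b):
    """Assumed agreement measure: how many positions the two sets share."""
    return len(pos_a & pos_b)


def score(index, query_words, doc_id, k, alpha=1.0):
    """Sum over ordered pairs (qi, qj), qi != qj, of count(Pos_d(qi), Pos_d^k(qj)) / (alpha*k + 1)."""
    total = 0.0
    for qi, qj in permutations(set(query_words), 2):
        pos_i = index.get(qi, {}).get(doc_id, set())
        pos_j = index.get(qj, {}).get(doc_id, set())
        shifted_j = {p - k for p in pos_j}  # Pos_d^k(qj): positions minus k
        total += count(pos_i, shifted_j) / (alpha * k + 1)  # assumed weight 1/(alpha*k + 1)
    return total


docs = {"d1": "Tokyo has a tower. The tower is tall. Osaka has a castle."}
index = build_index(docs)
print(score(index, ["tokyo", "tower"], "d1", k=0))  # 2.0: both words share sentence 0
print(score(index, ["tokyo", "tower"], "d1", k=1))  # 0.5: "tower" also occurs one sentence later
```

With k = 0 the score rewards query words that occur in the same sentence. Larger k matches words that are k sentences apart, and the 1/(αk+1) factor discounts those matches. How scores for different k are combined into the final degree of similarity is not stated in this record, so the sketch leaves that step out.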