Title of invention |
Document search method wherein stored documents and search queries comprise segmented text data of spaced, nonconsecutive text elements and words segmented by predetermined symbols |
Abstract |
A neighboring plural-character occurrence bitmap of practical capacity, capable of eliminating noise by hashing, is realized, and an effectively high-speed full text search is achieved by greatly reducing the number of documents to be searched, even when the search term is a combination of English characters and words. Text data is segmented into words, and n-character strings at every (m+1)-th character position are extracted from each word. A neighboring plural-character occurrence bitmap is created that stores, at a given entry, data representing the presence of each neighboring plural-character string. N-character strings at every (m+1)-th character position are likewise extracted from a search term, and the neighboring plural-character occurrence bitmap is searched using a search control program. Because the neighboring plural-character occurrence bitmap is searched before the condensed texts, documents not relevant to the search term can be discarded and a high-speed full text search can be realized.
|
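The pre-filtering idea in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: it assumes that "n-character strings at every (m+1)-th character position" means n characters taken m positions apart within a word, and the names `spaced_ngrams`, `build_bitmap`, `candidate_documents`, the bitmap size `BITMAP_BITS`, and the CRC32 hash are all hypothetical choices made for illustration. The later scan of condensed texts is not shown.

```python
import re
import zlib

BITMAP_BITS = 1024  # hypothetical number of bitmap entries per document


def spaced_ngrams(word, n=2, m=1):
    """Return n-character strings whose characters lie m positions apart."""
    step = m + 1
    span = (n - 1) * step  # offset from the first to the last character used
    return [word[i:i + span + 1:step] for i in range(len(word) - span)]


def bit_index(gram):
    """Hash a string to a bitmap entry (CRC32 is an assumed hash function)."""
    return zlib.crc32(gram.encode("utf-8")) % BITMAP_BITS


def build_bitmap(text, n=2, m=1):
    """Segment text into words and set one bit per extracted string."""
    bitmap = 0
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        for gram in spaced_ngrams(word, n, m):
            bitmap |= 1 << bit_index(gram)
    return bitmap


def candidate_documents(term, bitmaps, n=2, m=1):
    """Keep only documents whose bitmap has every bit required by the term."""
    needed = build_bitmap(term, n, m)
    return [doc_id for doc_id, bm in bitmaps.items() if (bm & needed) == needed]


if __name__ == "__main__":
    docs = {1: "Full text search engine", 2: "Bitmap hashing"}
    bitmaps = {doc_id: build_bitmap(text) for doc_id, text in docs.items()}
    print(spaced_ngrams("search"))
    print(candidate_documents("search", bitmaps))
```

Documents that survive this filter are only candidates: hash collisions can let irrelevant documents through, so a full scan of the remaining texts is still needed. A term too short to yield any string filters nothing in this sketch.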
Publication number |
US5748953(A) |
Publication date |
1998.05.05 |
Application number |
US19950444842 |
Filing date |
1995.05.18 |
Applicant |
HITACHI, LTD. |
Inventors |
MIZUTANI, NATSUKO;HATAKEYAMA, ATSUSHI;KAWAGUCHI, HISAMITSU;TADA, KATSUMI;KATO, KANJI;ASAKAWA, SATOSHI |
Classification |
G06F17/30;G06K9/62;G06K9/72;(IPC1-7):G06F17/30 |
Main classification |
G06F17/30 |
Agency |
|
Agent |
|
Main claim |
|
Address |
|