Title of invention Document search method wherein stored documents and search queries comprise segmented text data of spaced, nonconsecutive text elements and words segmented by predetermined symbols
Abstract A neighboring plural-character occurrence bitmap of a practical capacity capable of eliminating noises by hashing is realized, and a high speed full text search is realized equivalently, by greatly reducing the number of documents to be searched even if a search term constituted by a combination of English characters and words is used. Text data is segmented into words, and n-character strings at every (m+1)-th character positions are extracted from each word. A neighboring plural-character occurrence bitmap is created which stores data representing a presence of each neighboring plural-character string at a certain entry thereof. N-character strings at every (m+1)-th character positions are extracted from a search term and the neighboring plural-character occurrence bitmap is searched by using a search control program. Since the neighboring plural-character occurrence bitmap is searched prior to searching condensed texts, documents not relevant to the search term can be discarded and a high speed full text search can be realized.
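The pre-filtering idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes that "n-character strings at every (m+1)-th character positions" means n characters sampled from a word with m characters skipped between samples, that words are runs of ASCII letters and digits, and that CRC-32 modulo a fixed table size is an acceptable stand-in for the hash. The names `spaced_ngrams` and `OccurrenceBitmap`, and the defaults n=2, m=1 and 1024 entries, are hypothetical. The abstract's condensed-text search is replaced here by a plain substring check on the stored text.

```python
import re
import zlib


def spaced_ngrams(word, n=2, m=1):
    """Return every string of n characters sampled at (m+1)-character steps."""
    step = m + 1
    span = (n - 1) * step  # distance from first to last sampled character
    return [word[i:i + span + 1:step] for i in range(len(word) - span)]


class OccurrenceBitmap:
    """Hashed bitmap: one row per hash entry, one bit per document."""

    def __init__(self, num_entries=1024, n=2, m=1):
        self.num_entries = num_entries  # assumed table size
        self.n = n
        self.m = m
        self.rows = [0] * num_entries   # bit d of a row is set if document d hits it
        self.docs = []

    def _entry(self, gram):
        # Assumed hash: CRC-32 folded into the table size.
        return zlib.crc32(gram.encode("utf-8")) % self.num_entries

    def _grams(self, text):
        # Assumed segmentation: lowercase runs of letters and digits.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            yield from spaced_ngrams(word, self.n, self.m)

    def add(self, text):
        """Register a document and set its bit in every entry it hashes to."""
        doc_id = len(self.docs)
        self.docs.append(text)
        for gram in self._grams(text):
            self.rows[self._entry(gram)] |= 1 << doc_id
        return doc_id

    def candidates(self, term):
        """Documents whose bits are set for every string extracted from term."""
        mask = (1 << len(self.docs)) - 1
        for gram in self._grams(term):
            mask &= self.rows[self._entry(gram)]
        return [d for d in range(len(self.docs)) if mask >> d & 1]

    def search(self, term):
        """Filter with the bitmap, then confirm with a substring check."""
        needle = term.lower()
        return [d for d in self.candidates(term) if needle in self.docs[d].lower()]
```

Because different strings can hash to the same entry, `candidates` may return documents that do not contain the term, but it never drops one that does; the final check in `search` removes those false hits. A term too short to yield any string leaves every document as a candidate.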
Publication number US5748953(A) Publication date 1998.05.05
Application number US19950444842 Filing date 1995.05.18
Applicant HITACHI, LTD. Inventors MIZUTANI, NATSUKO;HATAKEYAMA, ATSUSHI;KAWAGUCHI, HISAMITSU;TADA, KATSUMI;KATO, KANJI;ASAKAWA, SATOSHI
Classification G06F17/30;G06K9/62;G06K9/72;(IPC1-7):G06F17/30 Main classification G06F17/30
Agency Agent
Main claim
Address