Title Document retrieval using index of reduced size
Abstract A document retrieval apparatus for retrieving a document that includes a query character string from among a plurality of registered documents includes: a text separating unit which separates the registered documents and the query character string into n-grams and words; an n-gram index which stores information about occurrences of n-grams appearing in the registered documents on an n-gram-specific basis; a word-boundary-position index which stores, in a compressed form, information about occurrences of word boundaries appearing in the registered documents; a character-string-based search unit which identifies one or more registered documents including the query character string by looking up one or more n-grams of the query character string in the n-gram index; and a word-based search unit which checks whether the query character string appears as a word in the one or more identified registered documents by looking up one or more words of the query character string in the word-boundary-position index, thereby identifying a registered document that includes the query character string as a word.
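The two-stage lookup described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names (`build_indexes`, `string_search`, `word_search`), the use of bigrams, the in-memory dictionaries, and the sample documents are all hypothetical. Documents are assumed to arrive already segmented into words and are joined without separators, as in an unspaced language. The word-boundary index is kept as a plain set of offsets, so the compression mentioned in the abstract is not modeled, and the word check only tests the start and end of a match against the boundary set.

```python
from collections import defaultdict


def build_indexes(docs, n=2):
    """Build an n-gram index and a word-boundary index (illustrative only).

    docs maps a document id to its list of already segmented words.
    """
    # n-gram -> document id -> set of start offsets
    ngram_index = defaultdict(lambda: defaultdict(set))
    # document id -> set of character offsets that are word boundaries
    boundary_index = {}
    for doc_id, words in docs.items():
        text = "".join(words)
        for i in range(len(text) - n + 1):
            ngram_index[text[i:i + n]][doc_id].add(i)
        boundaries, offset = {0}, 0
        for word in words:
            offset += len(word)
            boundaries.add(offset)
        boundary_index[doc_id] = boundaries
    return ngram_index, boundary_index


def string_search(query, ngram_index, n=2):
    """Return {document id: start offsets} where query occurs as a substring."""
    if len(query) < n:
        raise ValueError("query must contain at least n characters")
    grams = [query[i:i + n] for i in range(len(query) - n + 1)]
    hits = {}
    for doc_id, starts in ngram_index.get(grams[0], {}).items():
        # The k-th n-gram of the query must occur at offset start + k.
        matched = [
            s for s in sorted(starts)
            if all(
                s + k in ngram_index.get(g, {}).get(doc_id, ())
                for k, g in enumerate(grams)
            )
        ]
        if matched:
            hits[doc_id] = matched
    return hits


def word_search(query, ngram_index, boundary_index, n=2):
    """Keep only documents where a match starts and ends on word boundaries."""
    result = []
    for doc_id, starts in string_search(query, ngram_index, n).items():
        boundaries = boundary_index[doc_id]
        if any(s in boundaries and s + len(query) in boundaries for s in starts):
            result.append(doc_id)
    return sorted(result)


# Hypothetical sample data: "cat" is a word in document 1 but only a
# substring of "concatenate" in document 2.
docs = {1: ["the", "cat", "sat"], 2: ["concatenate", "it"]}
ngram_index, boundary_index = build_indexes(docs)
substring_hits = string_search("cat", ngram_index)
word_hits = word_search("cat", ngram_index, boundary_index)
```

In this sketch the character-string-based stage finds the query in both sample documents, and the word-based stage narrows the result to the document in which the match coincides with word boundaries. Storing only boundary offsets, rather than a second full word index, is what keeps the additional index small.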
Publication No. US7072889(B2) Publication date 2006.07.04
Application No. US20020207816 Filing date 2002.07.31
Applicant RICOH COMPANY, LTD. Inventor OGAWA YASUSHI
Classification G06F7/00;G06F17/30 Main classification G06F7/00
Agency Agent
Principal claim
Address