Title Apparatus and method for storing, searching for and retrieving text of a structured document provided with tags
Abstract An apparatus and a method for efficiently searching through a tagged document for the location of a desired word in text, using tags as reference units for search and retrieval, whereby any word in the text can be searched for and retrieved quickly. The apparatus comprises: a document inputting part for inputting a structured document including reference units delimited by tags; a dividing part for dividing into reference units the structured document input by the document inputting part; a word extracting part for extracting words from the reference units divided by the dividing part; a tuple generating part for generating tuples comprising the locations of the reference units divided by the dividing part and the words extracted by the word extracting part from the reference units; a search index generating part which, given the tuples generated by the tuple generating part out of the locations of the reference units and the words from the reference units, generates a search index comprising the words and the locations of the reference units including the words; and a storing part for storing the search index, generated by the search index generating part, in conjunction with the structured document input by the document inputting part.
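The parts listed in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: it assumes flat, non-nested tags of the form `<name>...</name>`, represents a reference unit's location as a hypothetical `name#ordinal` string, and treats words as case-insensitive runs of word characters. All function names are invented for this example.

```python
import re
from collections import defaultdict

# Assumption: reference units are flat, non-nested <name>...</name> elements.
TAG_RE = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)
WORD_RE = re.compile(r"\w+")


def divide(document):
    """Dividing part: split the document into (location, text) reference units."""
    units = []
    for ordinal, match in enumerate(TAG_RE.finditer(document)):
        location = f"{match.group(1)}#{ordinal}"  # hypothetical location format
        units.append((location, match.group(2)))
    return units


def extract_words(text):
    """Word extracting part: lower-cased words found in one reference unit."""
    return [word.lower() for word in WORD_RE.findall(text)]


def generate_tuples(units):
    """Tuple generating part: one (location, word) tuple per word occurrence."""
    return [(loc, word) for loc, text in units for word in extract_words(text)]


def build_index(tuples):
    """Search index generating part: map each word to the units containing it."""
    index = defaultdict(list)
    for loc, word in tuples:
        if loc not in index[word]:  # record each unit once per word
            index[word].append(loc)
    return dict(index)


def store(document, index):
    """Storing part: keep the index together with the source document."""
    return {"document": document, "index": index}


def search(stored, word):
    """Look up the locations of the reference units that contain `word`."""
    return stored["index"].get(word.lower(), [])


doc = "<title>Tagged search</title><p>Search the text</p><p>Plain text</p>"
stored = store(doc, build_index(generate_tuples(divide(doc))))
print(search(stored, "search"))  # ['title#0', 'p#1']
print(search(stored, "text"))    # ['p#1', 'p#2']
```

Because the index maps words to reference-unit locations rather than character offsets, a lookup returns the tagged units to retrieve without rescanning the document text.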
Publication No. US5778400(A) Publication date 1998.07.07
Application No. US19960605795 Filing date 1996.02.22
Applicant FUJI XEROX CO., LTD. Inventor TATENO, MASAKAZU
Classification G06F17/22;G06F17/30;(IPC1-7):G06F17/30 Main classification G06F17/22
Agency Agent
Main claim
Address