Title of invention: Apparatus and Method for Conditioning Semi-Structured Text for use as a Structured Data Source
Abstract: In one embodiment, the present invention includes a method for conditioning semi-structured text to enhance its use as a data source for an analytical processing tool. In general, the method involves analyzing the semi-structured text to identify portions of text (referred to herein as sub-documents) that exhibit a repetitive characteristic. Next, for each sub-document identified, the semi-structured text is integrated, for example, by filtering the text for relevant words, removing stop words, stemming certain words, adding or replacing certain words with synonyms, modifying the spelling of certain words, and/or resolving certain homonyms based on a document class assigned to the semi-structured text, and so on. Once integrated, the sub-documents are mapped to existing structures defined for the document class and/or sub-document type. Finally, the mapped textual elements are used to generate an index, or alternatively, the textual elements are inserted directly into a structured data repository, such as a database.
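The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the blank-line sub-document splitter, the stop-word list, the synonym table, the crude suffix stemmer, and every function name below are assumptions chosen for illustration. Homonym resolution and mapping to document-class structures are omitted, and the sketch takes the "generate an index" branch rather than inserting into a database.

```python
import re
from collections import defaultdict

# Hypothetical vocabularies; a real system would load these per document class.
STOP_WORDS = {"the", "a", "an", "of", "was", "and", "to", "is", "with"}
SYNONYMS = {"physician": "doctor"}


def split_sub_documents(text):
    """Assumption: blank lines separate the repeating sub-documents."""
    return [block.strip() for block in re.split(r"\n\s*\n", text) if block.strip()]


def naive_stem(word):
    """Crude suffix stripping, standing in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def integrate(sub_document):
    """Lowercase, drop stop words, stem, then replace known synonyms."""
    words = re.findall(r"[a-z]+", sub_document.lower())
    stems = [naive_stem(w) for w in words if w not in STOP_WORDS]
    return [SYNONYMS.get(stem, stem) for stem in stems]


def build_index(text):
    """Map each integrated term to the positions of the sub-documents containing it."""
    index = defaultdict(set)
    for position, sub_document in enumerate(split_sub_documents(text)):
        for term in integrate(sub_document):
            index[term].add(position)
    return {term: sorted(positions) for term, positions in index.items()}


if __name__ == "__main__":
    sample = "The physician examined the patient.\n\nPatients visited the doctor."
    for term, positions in sorted(build_index(sample).items()):
        print(term, positions)
```

In this sample, "physician" and "doctor" resolve to the same index term, so a query on "doctor" reaches both sub-documents. That is the kind of benefit the integration step is meant to provide before the text is handed to an analytical tool.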
Publication number: US2009259670(A1) Publication date: 2009.10.15
Application number: US20080102577 Filing date: 2008.04.14
Applicant: INMON WILLIAM H Inventor: INMON WILLIAM H.
Classification: G06F17/30 Main classification: G06F17/30
Agency: Agent:
Principal claim:
Address: