Title of invention: Method for content mining of semi-structured documents
Abstract: Embodiments of the present invention are directed to a method for content mining of semi-structured documents. In one embodiment, a semi-structured document is first converted from a document-type-specific format, such as HTML or PDF, to a document-type-independent format, such as XML. The document formatting, which contains basic information about the document's structure, is then analyzed by a series of modules to develop a higher-level understanding of that structure. These modules append information to the document describing the features that collectively make up the higher-level document structure. The appended information facilitates finding specified information within the document when content mining is performed.
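The abstract does not specify the modules, the XML schema, or how the appended information is queried. As a minimal sketch, assuming a converter has already produced XML in which each text line carries only low-level formatting (font size, bold), the Python below illustrates the general pipeline idea: modules run in sequence, each appending attributes that describe higher-level structure (here, headings and section membership), and a mining step then uses those attributes to locate content. The element and attribute names (`line`, `size`, `role`, `section`), the 1.2x size threshold, and the functions `heading_module`, `section_module`, `annotate`, and `mine` are all hypothetical illustrations, not the patented method.

```python
import xml.etree.ElementTree as ET
from statistics import median

# Hypothetical document-type-independent XML, standing in for the output of an
# HTML/PDF-to-XML converter. Only low-level formatting (font size, bold) is kept.
SAMPLE = """<doc>
  <line size="18" bold="1">Specifications</line>
  <line size="11">Weight: 2 kg</line>
  <line size="11">Color: red</line>
  <line size="18" bold="1">Warranty</line>
  <line size="11">Two years, parts only</line>
</doc>"""


def heading_module(root):
    """Hypothetical module: tag lines noticeably larger than body text as headings."""
    sizes = [float(el.get("size")) for el in root.iter("line")]
    body = median(sizes)  # assume the median font size is the body-text size
    for el in root.iter("line"):
        if float(el.get("size")) > body * 1.2:  # assumed threshold
            el.set("role", "heading")


def section_module(root):
    """Hypothetical module: attach each non-heading line to the nearest heading above it."""
    current = None
    for el in root.iter("line"):
        if el.get("role") == "heading":
            current = el.text.strip()
        elif current is not None:
            el.set("section", current)


# Modules run in order; each appends attributes that later modules can rely on.
PIPELINE = [heading_module, section_module]


def annotate(xml_text):
    """Parse the converted XML and run every structure-analysis module over it."""
    root = ET.fromstring(xml_text)
    for module in PIPELINE:
        module(root)
    return root


def mine(root, section):
    """Content mining: return the text of every line in the named section."""
    return [el.text.strip() for el in root.iter("line") if el.get("section") == section]


root = annotate(SAMPLE)
print(mine(root, "Warranty"))  # ['Two years, parts only']
```

Because each module only appends attributes and never removes the original formatting, later modules (and the final query) can combine low-level cues with the structure inferred by earlier ones.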
Publication number: US2003140311(A1)  Publication date: 2003.07.24
Application number: US20020053987  Filing date: 2002.01.18
Applicants: LEMON MICHAEL J.; CASTELLANOS MARIA; STINGER JAMES R.  Inventors: LEMON MICHAEL J.; CASTELLANOS MARIA; STINGER JAMES R.
Classification: G06F15/16; (IPC1-7): G06F15/16  Main classification: G06F15/16