Title of invention: EXTRACTING STRUCTURED DATA FROM WEB FORUMS
Abstract: The web forum data extraction technique extracts structured data from web forums using both page-level information and site-level knowledge. To do this, the technique determines what kinds of page objects a forum site has, which object each page belongs to, and how the different page objects are connected with each other. This information can be obtained by reconstructing the sitemap of the target forum, which is based on a Data Object Model of the target forum. The technique collects three kinds of evidence for data extraction: 1) inner-page features, which cover both semantic and layout information on an individual page; 2) inter-vertex features, which describe linkage-related observations; and 3) inner-vertex features, which characterize interrelationships among pages in one vertex. The technique employs Markov Logic Networks to combine these types of evidence statistically for inference, and thereby extracts the desired structures.
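The statistical combination of evidence mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard log-linear form of a Markov Logic Network, in which the probability of a labeling is proportional to exp(sum of weight times number of satisfied formulas). The weights, the candidate labels, and the formula counts below are all hypothetical placeholders and do not come from the patent; real Markov Logic Network inference grounds first-order formulas rather than reading counts from a table.

```python
import math

# Hypothetical weights for the three evidence types named in the abstract.
# The numeric values are illustrative assumptions, not taken from the patent.
WEIGHTS = {"inner_page": 1.5, "inter_vertex": 1.0, "inner_vertex": 0.5}


def score(counts):
    """Unnormalized log-linear score: exp(sum_i w_i * n_i)."""
    return math.exp(sum(WEIGHTS[kind] * n for kind, n in counts.items()))


def posterior(candidates):
    """Normalize the scores over all candidate labelings (divide by Z)."""
    scores = {label: score(counts) for label, counts in candidates.items()}
    z = sum(scores.values())
    return {label: s / z for label, s in scores.items()}


# Number of satisfied formulas per evidence type for two hypothetical
# labelings of the same page element (made-up counts for illustration).
candidates = {
    "post_record": {"inner_page": 2, "inter_vertex": 1, "inner_vertex": 1},
    "navigation_bar": {"inner_page": 0, "inter_vertex": 1, "inner_vertex": 0},
}

probs = posterior(candidates)
best = max(probs, key=probs.get)
print(best, probs)
```

In this toy setting the labeling supported by more, and more heavily weighted, evidence receives the higher probability. In practice the weights would be learned from labeled pages rather than set by hand.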
Publication number: US2010211533(A1)  Publication date: 2010.08.19
Application number: US20090388517  Filing date: 2009.02.18
Applicant: MICROSOFT CORPORATION  Inventors: YANG JIANGMING;CAI RUI;ZHANG LEI;MA WEI-YING
Classification: G06F15/18;G06N5/02  Main classification: G06F15/18
Agency:  Agent:
Main claim:
Address: