Title of invention METHOD AND SYSTEM RELATING TO SALIENT CONTENT EXTRACTION FOR ELECTRONIC CONTENT
Abstract <p>Automatic approaches to scraping salient content from content sources are provided that allow the salient content to be presented to the user or subjected to further processing such as clustering or sentiment analysis. Embodiments of the invention provide for: automated scraper induction based on document and/or contextual semantic cues and document structure analysis; identifying salient text and removing boilerplate text, off-topic content and other non-salient content; deriving reusable descriptive extraction patterns for subsequent documents; applying descriptive extraction patterns to extract from subsequent documents from the same source; intelligent determination of an extraction success confidence score using historical success scores; and employing confidence scores to automatically trigger identification of a new extraction pattern if the extraction confidence is below an acceptable confidence threshold.</p>
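The confidence-triggered re-induction step named in the abstract can be illustrated with a minimal sketch. This is not the patented method: the function names (`confidence`, `extract_with_fallback`), the length-based scoring heuristic, the `history` list of past extraction lengths, and the default threshold of 0.5 are all assumptions made for illustration only.

```python
from statistics import mean
from typing import Callable, List, Tuple

# A "pattern" here is any callable that maps a raw document to extracted text.
Pattern = Callable[[str], str]


def confidence(extracted: str, history: List[int]) -> float:
    """Toy score (assumption): extracted length relative to the mean
    length of past successful extractions, capped at 1.0."""
    baseline = mean(history) if history else 0
    if baseline == 0:
        # No usable history: any non-empty extraction counts as a success.
        return 1.0 if extracted else 0.0
    return min(1.0, len(extracted) / baseline)


def extract_with_fallback(
    document: str,
    pattern: Pattern,
    induce: Callable[[str], Pattern],
    history: List[int],
    threshold: float = 0.5,  # hypothetical acceptable-confidence threshold
) -> Tuple[str, Pattern, float]:
    """Apply the stored pattern; if confidence falls below the threshold,
    induce a new pattern from the document and extract again."""
    text = pattern(document)
    score = confidence(text, history)
    if score < threshold:
        pattern = induce(document)  # trigger new pattern identification
        text = pattern(document)
        score = confidence(text, history)
    history.append(len(text))  # record this extraction for future scoring
    return text, pattern, score
```

In this sketch the caller keeps the returned pattern for the next document from the same source, so a pattern is re-derived only when the score drops, for example after a site redesign.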
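Application publication number WO2013170343(A1) Application publication date 2013.11.21
Application number WO2013CA00075 Filing date 2013.01.30
Applicant WHYZ TECHNOLOGIES LIMITED Inventor KHAN, SHAHZAD
Classification number G06F17/27;G06F17/00;G06F17/30 Main classification number G06F17/27
Agency Agent
Main claim
Address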