Title: METHOD AND SYSTEM FOR REMOVING CHROME FROM A WEB PAGE
Abstract: A method and system for removing chrome from a web page is provided. An example system includes a parsing module, a text density analyzer, a content node selector, and a text extractor. The parsing module may be configured to parse a web page into a tree structure. The text density analyzer may be configured to determine a text density score value for each node in the tree structure. The content node selector may be configured to identify one or more nodes in the tree structure as content nodes based on their respective text density score values. The text extractor may be configured to extract text from the content nodes only.
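The abstract names four components but gives no formula for the text density score. Below is a minimal Python sketch of that pipeline under stated assumptions: it uses the standard library's `html.parser.HTMLParser` as the parsing module, and it defines density (hypothetically) as the number of characters of text directly inside a node divided by one plus its number of child elements. The 20-character threshold and the names `TreeBuilder`, `text_density`, `select_content_nodes`, and `remove_chrome` are illustrative and do not come from the patent.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they never become a parent.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Elements whose text is not visible page content.
SKIP_TAGS = {"script", "style"}


class Node:
    """One element in the parsed tree; `text` holds text directly inside it."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = []


class TreeBuilder(HTMLParser):
    """Parsing step: turn an HTML string into a tree of Node objects."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Walk up to the matching open element; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if self.current.tag not in SKIP_TAGS and data.strip():
            self.current.text.append(data.strip())


def text_density(node):
    """Hypothetical score (not from the patent): direct text chars per (1 + child elements)."""
    chars = len(" ".join(node.text))
    return chars / (1 + len(node.children))


def select_content_nodes(root, threshold=20.0):
    """Return nodes scoring at or above the (hypothetical) threshold, in document order."""
    selected = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node is not root and text_density(node) >= threshold:
            selected.append(node)
        # Push children reversed so they are popped in document order.
        stack.extend(reversed(node.children))
    return selected


def remove_chrome(html, threshold=20.0):
    """Parse, score, select, then extract text from the content nodes only."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    nodes = select_content_nodes(builder.root, threshold)
    return [" ".join(node.text) for node in nodes]
```

With this definition, navigation and footer links score low because each carries only a few words, while paragraphs of body text score high, so `remove_chrome(html)` returns the paragraph strings and drops the menus. One trade-off of scoring only direct text: words inside inline tags such as `<b>` or `<a>` within a paragraph are scored as separate nodes and usually dropped, so a fuller implementation might fold inline children into their parent before scoring.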
Publication number: US2011258528(A1)  Publication date: 2011.10.20
Application number: US20100761272  Filing date: 2010.04.15
Applicants: ROPER JOHN; GLASGOW DANE  Inventors: ROPER JOHN; GLASGOW DANE
Classification: G06F17/00  Main classification: G06F17/00