Title of invention Citation record extraction system and method
Abstract A citation record extraction system is provided for extracting citation records from publication list pages that differ in layout and content. An HTML rendering engine receives a publication list web page and parses it to obtain layout information of the web page. A web page sequence builder generates a web page characteristic sequence for the web page according to the layout information. A web page repeated pattern analyzer analyzes the repeated patterns present in the web page characteristic sequence, screens out non-citation records therefrom, and obtains a citation record of the publication list web page.
Publication number US8429520(B2) Publication date 2013.04.23
Application number US20100834757 Filing date 2010.07.12
Applicant LEE HAHN-MING;HO JAN-MING;CHEN SHUI-SHI;YANG KAI-HSIANG;WANG RUEI-YUAN;YEH JEROME;NATIONAL TAIWAN UNIVERSITY OF SCIENCE AND TECHNOLOGY Inventor LEE HAHN-MING;HO JAN-MING;CHEN SHUI-SHI;YANG KAI-HSIANG;WANG RUEI-YUAN;YEH JEROME
Classification G06F17/20;G06F17/21;G06F17/22;G06F17/24 Main classification G06F17/20
Agency Agent
Principal claim
Address
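
The three stages named in the abstract (layout parsing, characteristic sequence building, repeated pattern analysis) can be illustrated with a minimal sketch. It is not the patented method: Python's standard `html.parser` stands in for the HTML rendering engine, the characteristic sequence is assumed to be just the list of block tag names, the repeated pattern is assumed to be the longest run of identical symbols, and the screening rule (a record must contain a four-digit year) is a hypothetical heuristic. All class and function names are invented for illustration.

```python
import re
from html.parser import HTMLParser


class BlockCollector(HTMLParser):
    """Stand-in for the rendering engine: collects [tag, text] per block element."""

    BLOCK_TAGS = {"li", "p", "h1", "h2", "h3", "div"}  # assumed block-level tags

    def __init__(self):
        super().__init__()
        self.blocks = []   # one [tag, text] entry per block, in page order
        self._open = None  # block currently receiving text, if any

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self._open = [tag, ""]
            self.blocks.append(self._open)

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open[0]:
            self._open = None

    def handle_data(self, data):
        if self._open is not None:
            self._open[1] += data


def characteristic_sequence(blocks):
    """Assumed encoding: one symbol (the tag name) per layout block."""
    return [tag for tag, _ in blocks]


def longest_repeated_run(seq):
    """Return (start, end) of the longest run of identical consecutive symbols."""
    best = (0, 0)
    start = 0
    for i in range(1, len(seq) + 1):
        if i == len(seq) or seq[i] != seq[start]:
            if i - start > best[1] - best[0]:
                best = (start, i)
            start = i
    return best


# Hypothetical screening rule: a citation mentions a publication year.
YEAR = re.compile(r"\b(19|20)\d{2}\b")


def extract_citations(html):
    collector = BlockCollector()
    collector.feed(html)
    seq = characteristic_sequence(collector.blocks)
    start, end = longest_repeated_run(seq)
    candidates = [text.strip() for _, text in collector.blocks[start:end]]
    # Screen out repeated blocks that do not look like citation records.
    return [c for c in candidates if YEAR.search(c)]
```

Given a page with a heading, three list items (two citations and a "Back to top" link), and a footer paragraph, `extract_citations` keeps the three-item run as the repeated pattern and then drops the link because it contains no year. A real system would need a richer layout encoding than tag names to handle pages whose records span several elements.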