Title of invention |
Citation record extraction system and method |
Abstract |
A citation record extraction system is provided for extracting citation records from publication list pages having different layouts and contents. An HTML rendering engine receives a publication list web page and parses it to obtain layout information of the web page. A web page sequence builder generates a web page characteristic sequence for the web page according to the layout information. A web page repeated pattern analyzer analyzes the repeated patterns present in the web page characteristic sequence, screens out those that are not citation records, and obtains the citation records of the publication list web page. |
Publication number |
US8429520(B2) |
Publication date |
2013.04.23 |
Application number |
US20100834757 |
Filing date |
2010.07.12 |
Applicant |
LEE HAHN-MING;HO JAN-MING;CHEN SHUI-SHI;YANG KAI-HSIANG;WANG RUEI-YUAN;YEH JEROME;NATIONAL TAIWAN UNIVERSITY OF SCIENCE AND TECHNOLOGY |
Inventor |
LEE HAHN-MING;HO JAN-MING;CHEN SHUI-SHI;YANG KAI-HSIANG;WANG RUEI-YUAN;YEH JEROME |
Classification |
G06F17/20;G06F17/21;G06F17/22;G06F17/24 |
Main classification |
G06F17/20 |
Agency |
|
Agent |
|
Principal claim |
|
Address |
|