Title: SYSTEM FOR AUTOMATICALLY EXTRACTING BY-LINE INFORMATION
Abstract: A by-line extraction system detects a set of potential headlines from a title meta-tag of a crawled document, selects a candidate headline from the set of potential headlines, and extracts the by-line information from the document using the location of the selected candidate headline. The system constructs the set of potential headlines based on the title meta-tag. The system selects a candidate headline by evaluating the set of potential headlines in order of the lengths of the potential headlines. The system extracts the by-line information from the document by using the location of the selected candidate headline to extract a string representing a date, a name, or a source located within a minimum distance from the location of the potential headline.
Publication No.: US2008306941(A1)  Publication date: 2008.12.11
Application No.: US20080192917  Filing date: 2008.08.15
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION  Inventors: DILL STEPHEN; KORUPOLU MADHUKAR R.; TOMKINS ANDREW S.
Classification: G06F17/30  Main classification: G06F17/30
Agency:  Agent:
Main claim:
Address:
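
The three steps named in the abstract (build potential headlines from the title meta-tag, evaluate them in order of length, then pull the by-line string closest to the chosen headline) can be illustrated with a minimal Python sketch. It is not the patented implementation. Everything concrete in it is an assumption made for illustration: the function names, the title separators (" - " and "|"), the longest-first evaluation order, the "By Firstname Lastname" and "Month D, YYYY" patterns, and measuring distance in characters. Source extraction is omitted.

```python
import re

# Hypothetical patterns; the record does not say how dates or names are recognized.
MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
DATE_RE = re.compile(r"\b(?:" + MONTHS + r") \d{1,2}, \d{4}\b")
NAME_RE = re.compile(r"\bBy ([A-Z][a-z]+(?: [A-Z][a-z]+)+)")


def potential_headlines(title):
    """Build candidates from the title: the full title plus its separator-split parts."""
    # Assumed separators: " - " and "|" (e.g. "Headline - Site Name").
    parts = [p.strip() for p in re.split(r"\s+-\s+|\s*\|\s*", title)]
    candidates = {title.strip(), *parts}
    candidates.discard("")
    # Assumed order: longest first, ties broken alphabetically for determinism.
    return sorted(candidates, key=lambda s: (-len(s), s))


def select_headline(candidates, body):
    """Return (headline, position) for the first candidate found in the body."""
    for candidate in candidates:
        pos = body.find(candidate)
        if pos != -1:
            return candidate, pos
    return None


def nearest_match(pattern, text, start, end):
    """Return the match with the smallest character distance to text[start:end]."""
    best = None
    for m in pattern.finditer(text):
        if m.end() <= start:
            dist = start - m.end()
        elif m.start() >= end:
            dist = m.start() - end
        else:
            continue  # overlaps the headline itself
        if best is None or dist < best[0]:
            best = (dist, m)
    return best[1] if best else None


def extract_byline(title, body):
    """Locate the headline in the body, then take the nearest name and date."""
    selected = select_headline(potential_headlines(title), body)
    if selected is None:
        return None
    headline, pos = selected
    end = pos + len(headline)
    name = nearest_match(NAME_RE, body, pos, end)
    date = nearest_match(DATE_RE, body, pos, end)
    return {
        "headline": headline,
        "name": name.group(1) if name else None,
        "date": date.group(0) if date else None,
    }


title = "Storm Hits Coast - Example News"
body = ("Example News\nStorm Hits Coast\nBy Jane Doe\nMarch 3, 2008\n"
        "Heavy rain fell overnight.\nCopyright January 1, 2001")
result = extract_byline(title, body)
```

In this example the full title does not occur in the body, so the next-longest candidate, "Storm Hits Coast", is selected, and the date adjacent to it is preferred over the more distant copyright date.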