Title of invention: Systems and methods for information extraction
Abstract: Methods and systems for information extraction are disclosed. In one such method and system, a sample of related articles is obtained, and one article is selected as a seed article. The distances between sample articles are calculated to determine a set of one or more articles closest to the seed article. The set of closest articles is used to identify information fields containing variable data within the seed article. This identification may be performed by a variety of techniques, one of which uses dynamic programming alignment to compute alignments between articles. The information fields are labeled, and a template is generated using the labeled fields. The template is used to extract data from a source article by comparing the source article with the template and associating the variable data of the source article with the labeled fields.
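The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the distance measure (token insertions and deletions derived from a longest-common-subsequence table), the use of a single nearest neighbour, the automatic field names `field_1`, `field_2`, and every function name below are assumptions made for illustration only.

```python
from typing import Dict, List, Optional, Tuple

Tokens = List[str]
Template = List[Tuple[str, str]]  # ("text", token) or ("field", label)


def lcs_table(a: Tokens, b: Tokens) -> List[List[int]]:
    """Dynamic programming table: t[i][j] = LCS length of a[i:] and b[j:]."""
    t = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                t[i][j] = t[i + 1][j + 1] + 1
            else:
                t[i][j] = max(t[i + 1][j], t[i][j + 1])
    return t


def distance(a: Tokens, b: Tokens) -> int:
    """Assumed distance: tokens that must be inserted or deleted to align a and b."""
    return len(a) + len(b) - 2 * lcs_table(a, b)[0][0]


def closest(seed: Tokens, sample: List[Tokens], k: int = 1) -> List[Tokens]:
    """Return the k sample articles nearest to the seed article."""
    return sorted(sample, key=lambda art: distance(seed, art))[:k]


def build_template(seed: Tokens, neighbour: Tokens) -> Template:
    """Align seed and neighbour; shared tokens stay fixed, gaps become labeled fields."""
    t = lcs_table(seed, neighbour)
    template: Template = []
    i = j = n_fields = 0
    in_gap = False
    while i < len(seed) and j < len(neighbour):
        if seed[i] == neighbour[j]:
            template.append(("text", seed[i]))
            i, j, in_gap = i + 1, j + 1, False
        else:
            if not in_gap:  # open one field per run of differing tokens
                n_fields += 1
                template.append(("field", f"field_{n_fields}"))
                in_gap = True
            if t[i + 1][j] >= t[i][j + 1]:
                i += 1
            else:
                j += 1
    if (i < len(seed) or j < len(neighbour)) and not in_gap:
        template.append(("field", f"field_{n_fields + 1}"))
    return template


def extract(template: Template, source: Tokens) -> Dict[str, str]:
    """Match fixed tokens in order; each field captures tokens up to the next fixed token."""
    fields: Dict[str, str] = {}
    pos = 0
    for k, (kind, value) in enumerate(template):
        if kind == "text":
            if pos >= len(source) or source[pos] != value:
                raise ValueError(f"source does not match template at token {pos}")
            pos += 1
        else:
            # Next fixed token after this field, or None if the field is last.
            stop: Optional[str] = next(
                (v for kd, v in template[k + 1:] if kd == "text"), None
            )
            start = pos
            while pos < len(source) and source[pos] != stop:
                pos += 1
            fields[value] = " ".join(source[start:pos])
    return fields
```

For example, calling `build_template` on the token lists of "Name : Alice Smith Age : 30" and "Name : Bob Age : 41" keeps "Name", ":" and "Age" as fixed text and turns the differing runs into two fields, which `extract` can then fill from a third article of the same shape. The greedy capture in `extract` is a simplification: it would stop early if a variable value happened to contain the next fixed token.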
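Publication number: US7836012(B1) Publication date: 2010.11.16
Application number: US20070697677 Filing date: 2007.04.06
Applicant: GOOGLE INC. Inventors: NEVILL-MANNING CRAIG; WITTEN IAN
Classification: G06F17/00; G06F17/30 Main classification: G06F17/00
Agency: Agent:
Principal claim:
Address: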