Title: Systems and methods of semantically annotating documents of different structures
Abstract: A computer retrieves a document from a data source, wherein the document has a structure type. The computer generates a customized data model for the document in accordance with its structure type. The computer identifies one or more candidate chunks within the customized data model in accordance with a set of heuristic rules associated with the structure type.
Publication number: US8924374(B2) Publication date: 2014.12.30
Application number: US200812035597 Filing date: 2008.02.22
Applicant: Tigerlogic Corporation Inventor: Dexter Jeffrey Matthew
Classification: G06F7/00; G06F17/30; G06F17/24 Main classification: G06F7/00
Agency: Morgan, Lewis & Bockius LLP Agent: Morgan, Lewis & Bockius LLP
Main claim: 1. A computer-implemented method, comprising: at a computer having memory and one or more processors: receiving one or more search keywords from a user; selecting a plurality of candidate document identifiers in accordance with the one or more search keywords, each candidate document identifier corresponding to a respective document at a respective data source; for a respective candidate document identifier of the plurality of candidate document identifiers: retrieving a document corresponding to the respective candidate document identifier from a data source, wherein the document has a structure type; converting the document into a node stream, wherein the document conversion is initiated immediately after retrieving a portion of the document; generating a customized data model for the document using the node stream in accordance with the structure type of the document; identifying one or more candidate chunks within the customized data model in accordance with a set of heuristic rules associated with the structure type; and selecting one or more chunks of the candidate chunks that satisfy the one or more search keywords; and providing at least one of the selected one or more chunks for display to the user.
Address: Irvine, CA, US
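
The per-document steps of the main claim (node stream, data model, candidate chunks, keyword selection) can be illustrated with a minimal Python sketch. It is not the patented implementation; everything below is an assumption made for illustration. The record does not disclose the heuristic rules, so `HEURISTIC_RULES` is hypothetical, as are all function names. Only an "html" structure type is covered, "satisfy the search keywords" is read as "contains every keyword", and keyword intake, candidate identifier selection, and network retrieval are left out: the document arrives as an iterable of already-retrieved text portions. Parsing uses the standard-library `html.parser`, whose `feed()` accepts partial input, which mirrors starting the conversion as soon as a portion is available.

```python
from html.parser import HTMLParser

# Hypothetical rules: per structure type, the tags whose text counts as a
# candidate chunk. The patent record itself does not list its rules.
HEURISTIC_RULES = {"html": {"p", "li", "td"}}


class NodeStream(HTMLParser):
    """Turns HTML text into a flat stream of (kind, value) nodes."""

    def __init__(self):
        super().__init__()
        self.nodes = []

    def handle_starttag(self, tag, attrs):
        self.nodes.append(("start", tag))

    def handle_endtag(self, tag):
        self.nodes.append(("end", tag))

    def handle_data(self, data):
        # Text may arrive in pieces when a portion ends mid-sentence.
        self.nodes.append(("text", data))


def to_node_stream(portions):
    """Feed each portion to the parser as soon as it is available."""
    parser = NodeStream()
    for portion in portions:
        parser.feed(portion)
    parser.close()
    return parser.nodes


def build_data_model(nodes):
    """Group the node stream into (tag, text) elements using a tag stack.

    Simplification: text is attributed to its innermost open element only.
    """
    model, stack = [], []
    for kind, value in nodes:
        if kind == "start":
            stack.append((value, []))
        elif kind == "text" and stack:
            stack[-1][1].append(value)
        elif kind == "end" and any(tag == value for tag, _ in stack):
            # Pop until the matching start tag, discarding unclosed children.
            while True:
                tag, parts = stack.pop()
                if tag == value:
                    model.append((tag, " ".join("".join(parts).split())))
                    break
    return model


def identify_candidate_chunks(model, structure_type):
    """Apply the (hypothetical) rules for the given structure type."""
    tags = HEURISTIC_RULES[structure_type]
    return [text for tag, text in model if tag in tags and text]


def select_chunks(candidates, keywords):
    """Keep candidates containing every keyword, case-insensitively."""
    wanted = [k.lower() for k in keywords]
    return [c for c in candidates if all(k in c.lower() for k in wanted)]


def annotate(portions, structure_type, keywords):
    """Run the per-document steps in the order the claim lists them."""
    nodes = to_node_stream(portions)
    model = build_data_model(nodes)
    candidates = identify_candidate_chunks(model, structure_type)
    return select_chunks(candidates, keywords)


if __name__ == "__main__":
    portions = [
        "<html><body><p>Semantic annota",
        "tion of documents</p>",
        "<ul><li>Unrelated item</li></ul></body></html>",
    ]
    print(annotate(portions, "html", ["semantic", "documents"]))
```

Because text fragments are concatenated per element before whitespace is normalized, a paragraph split across two portions is still treated as a single chunk. A fuller version would pick a different model builder and rule set for each structure type rather than a single tag table.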