Title of Invention |
Systems and methods of semantically annotating documents of different structures |
Abstract |
A computer retrieves a document from a data source, wherein the document has a structure type. The computer generates a customized data model for the document in accordance with its structure type. The computer identifies one or more candidate chunks within the customized data model in accordance with a set of heuristic rules associated with the structure type. |
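The abstract's step of identifying candidate chunks with heuristic rules tied to a structure type might be sketched as follows. This is a minimal illustration, not the patented implementation: the `Node` class, the `RULES` registry, and the two rule functions are hypothetical names, and the specific heuristics (block-level HTML elements, non-empty paragraphs) are assumptions chosen only to show the dispatch on structure type.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    """One element of a hypothetical customized data model."""
    tag: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def walk(node: Node):
    """Yield the node and all of its descendants in document order."""
    yield node
    for child in node.children:
        yield from walk(child)


def html_rule(node: Node) -> bool:
    # Assumed heuristic: block-level HTML elements with text are candidates.
    return node.tag in {"p", "li", "td"} and bool(node.text.strip())


def plain_text_rule(node: Node) -> bool:
    # Assumed heuristic: every non-empty paragraph node is a candidate.
    return node.tag == "paragraph" and bool(node.text.strip())


# Hypothetical registry mapping a structure type to its heuristic rules.
RULES: Dict[str, List[Callable[[Node], bool]]] = {
    "html": [html_rule],
    "plain_text": [plain_text_rule],
}


def candidate_chunks(model: Node, structure_type: str) -> List[Node]:
    """Return nodes accepted by any rule registered for the structure type."""
    rules = RULES[structure_type]
    return [n for n in walk(model) if any(rule(n) for rule in rules)]


model = Node("body", children=[
    Node("p", "Semantic annotation of documents."),
    Node("div", children=[Node("li", "Heuristic rules per structure type.")]),
    Node("p", "   "),
])
chunks = candidate_chunks(model, "html")
```

Keeping the rules in a per-type registry means a new structure type can be supported by adding an entry rather than changing the traversal.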
Publication Number |
US8924374(B2) |
Publication Date |
2014.12.30 |
Application Number |
US200812035597 |
Filing Date |
2008.02.22 |
Applicant |
Tigerlogic Corporation |
Inventor |
Dexter Jeffrey Matthew |
Classification Codes |
G06F7/00; G06F17/30; G06F17/24 |
Main Classification Code |
G06F7/00 |
Agency |
Morgan, Lewis & Bockius LLP |
Agent |
Morgan, Lewis & Bockius LLP |
Principal Claim |
1. A computer-implemented method, comprising:
at a computer having memory and one or more processors:
receiving one or more search keywords from a user;
selecting a plurality of candidate document identifiers in accordance with the one or more search keywords, each candidate document identifier corresponding to a respective document at a respective data source;
for a respective candidate document identifier of the plurality of candidate document identifiers:
retrieving a document corresponding to the respective candidate document identifier from a data source, wherein the document has a structure type;
converting the document into a node stream, wherein the document conversion is initiated immediately after retrieving a portion of the document;
generating a customized data model for the document using the node stream in accordance with the structure type of the document;
identifying one or more candidate chunks within the customized data model in accordance with a set of heuristic rules associated with the structure type; and
selecting one or more chunks of the candidate chunks that satisfy the one or more search keywords; and
providing at least one of the selected one or more chunks for display to the user. |
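The flow of the claimed method, from keywords to displayed chunks, could look roughly like the sketch below. It is an illustration under several assumptions rather than the claimed implementation: `DATA_SOURCE`, `NodeStream`, `to_node_stream`, `paragraph_chunks`, and `search` are hypothetical names; the data source is an in-memory dictionary whose documents arrive as lists of portions; HTML is the only structure type handled; and a chunk is assumed to "satisfy" the keywords when it contains all of them. Python's standard `html.parser.HTMLParser` accepts input incrementally through `feed()`, which stands in here for starting conversion as soon as a portion of the document is available.

```python
from html.parser import HTMLParser
from typing import Dict, Iterable, List, Tuple

# Hypothetical data source: each document arrives as a sequence of portions.
DATA_SOURCE: Dict[str, List[str]] = {
    "doc-1": ["<html><body><p>Semantic annot", "ation of documents.</p>",
              "<p>Unrelated text.</p></body></html>"],
    "doc-2": ["<p>Cooking recipes.</p>"],
}


class NodeStream(HTMLParser):
    """Turns fed portions into a flat stream of (kind, value) nodes."""

    def __init__(self) -> None:
        super().__init__()
        self.nodes: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        self.nodes.append(("start", tag))

    def handle_endtag(self, tag):
        self.nodes.append(("end", tag))

    def handle_data(self, data):
        self.nodes.append(("text", data))


def to_node_stream(portions: Iterable[str]) -> List[Tuple[str, str]]:
    parser = NodeStream()
    for portion in portions:
        parser.feed(portion)  # conversion begins with the first portion
    parser.close()
    return parser.nodes


def paragraph_chunks(nodes: List[Tuple[str, str]]) -> List[str]:
    """Assumed HTML heuristic: the text of each <p> element is a candidate."""
    chunks, buffer, inside = [], [], False
    for kind, value in nodes:
        if kind == "start" and value == "p":
            inside, buffer = True, []
        elif kind == "end" and value == "p" and inside:
            chunks.append("".join(buffer).strip())
            inside = False
        elif kind == "text" and inside:
            buffer.append(value)  # text may arrive split across portions
    return [c for c in chunks if c]


def search(keywords: List[str]) -> List[Tuple[str, str]]:
    wanted = [k.lower() for k in keywords]
    # Stand-in for an index lookup: keep documents mentioning any keyword.
    candidates = [doc_id for doc_id, portions in DATA_SOURCE.items()
                  if any(k in "".join(portions).lower() for k in wanted)]
    results = []
    for doc_id in candidates:
        nodes = to_node_stream(DATA_SOURCE[doc_id])
        for chunk in paragraph_chunks(nodes):
            # Assumed meaning of "satisfy": the chunk contains every keyword.
            if all(k in chunk.lower() for k in wanted):
                results.append((doc_id, chunk))
    return results
```

Calling `search(["semantic", "documents"])` returns only the matching paragraph of `doc-1`, not the whole document, which is the point of selecting chunks before display.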
Address |
Irvine CA US |