Title of invention |
Extraction of information from structured documents |
Abstract |
A method of extracting information from a structured document includes the steps of: assigning to a selected partial tree a partial tree identifier that includes a tag identifier, wherein the tag identifier includes the name of the tag corresponding to the root of the selected partial tree, the name of at least one format attribute of the tag, and the value of that format attribute; arranging the names of the format attributes in a predetermined order in the tag identifier if the tag has two or more format attributes; and identifying, from a list of partial tree identifiers of the partial trees that exist in the structured document after it has been updated, a partial tree whose partial tree identifier is identical to that of the selected partial tree.
|
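The steps in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the set `FORMAT_ATTRS`, the `tag[name=value]` identifier syntax, alphabetical ordering as the "predetermined order", and the helper names are all hypothetical, and the partial tree identifier is assumed here to consist of the tag identifier alone.

```python
import xml.etree.ElementTree as ET

# Hypothetical choice of which attributes count as "format attributes".
FORMAT_ATTRS = {"align", "bgcolor", "class", "width"}


def tag_identifier(elem):
    """Tag name followed by format attribute name/value pairs.

    Attribute names are sorted alphabetically (the assumed "predetermined
    order"), so the identifier does not depend on the order in which the
    attributes were written in the document.
    """
    pairs = sorted((k, v) for k, v in elem.attrib.items() if k in FORMAT_ATTRS)
    return elem.tag + "".join(f"[{k}={v}]" for k, v in pairs)


def partial_tree_identifier(elem):
    # Assumption: the partial tree identifier is just the root's tag identifier.
    return tag_identifier(elem)


def find_matching_partial_trees(selected_id, updated_root):
    """Roots of partial trees in the updated document whose identifier matches."""
    return [e for e in updated_root.iter()
            if partial_tree_identifier(e) == selected_id]


original = ET.fromstring(
    '<html><body><table width="90" class="news">'
    '<tr><td>old headline</td></tr></table></body></html>'
)
# Updated document: a new paragraph was added and the attribute order changed.
updated = ET.fromstring(
    '<html><body><p>banner</p><table class="news" width="90">'
    '<tr><td>new headline</td></tr></table></body></html>'
)

selected = original.find(".//table")
selected_id = partial_tree_identifier(selected)
matches = find_matching_partial_trees(selected_id, updated)

print(selected_id)
print([m.find(".//td").text for m in matches])
```

Because the attribute names are put in a fixed order before comparison, the table is still found in the updated document even though its attributes are written in a different order and its position in the document has shifted.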
Application publication number |
US2004044963(A1) |
Application publication date |
2004.03.04 |
Application number |
US20030463521 |
Filing date |
2003.06.18 |
Applicant |
NIPPON TELEGRAPH AND TELEPHONE CORPORATION |
Inventors |
UCHIYAMA TADASU;MIYAMOTO MASARU |
Classification numbers |
G06F17/21;G06F17/22;G06F17/30;(IPC1-7):G06F17/00 |
Main classification number |
G06F17/21 |
Patent agency |
|
Agent |
|
Main claim |
|
Address |
|