Title of invention ATTRIBUTE EXTRACTION DEVICE
Abstract PROBLEM TO BE SOLVED: To extract the necessary attributes from a structured document by simple specification, without the user having to be aware of the various ways in which the structured document may be expressed. SOLUTION: An attribute extraction part 1c reads a structured document 1a and collates it against a set of attribute schemas, each having a character string pattern and an attribute name, defined in a schema definition part 1b. Elements and texts of the structured document 1a that match are extracted under the corresponding attribute names. When a character string pattern matches an element of the document 1a, the contents of the matching element are extracted as the attribute value. When the character string pattern matches a text of the document 1a, an ancestor element is specified that is an ancestor of the first matching text and, at the same time, an ancestor of texts other than the first text. Among the elements having that ancestor element as an ancestor, those other than the first element (the element to which the first text directly belongs), and among the texts having that ancestor element as an ancestor, those other than the first text, are extracted as attribute values to generate an attribute list 1d.
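The two matching cases in the abstract can be illustrated with a minimal sketch. This is not the patented implementation. It assumes the structured document is XML, that a schema is a (regular expression, attribute name) pair, and that the ancestor element is the nearest one that also contains other text. The function name extract_attributes, the schema format, and the sample document are all hypothetical, and tail text is ignored for brevity.

```python
import re
import xml.etree.ElementTree as ET


def extract_attributes(xml_text, schemas):
    """Return a list of (attribute name, attribute value) pairs.

    schemas is a list of (regex pattern, attribute name) pairs; this
    format is an assumption made for the sketch, not taken from the patent.
    """
    root = ET.fromstring(xml_text)
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parent = {child: p for p in root.iter() for child in p}
    attributes = []
    for pattern, name in schemas:
        rx = re.compile(pattern)
        for elem in root.iter():
            text = (elem.text or "").strip()
            if rx.fullmatch(elem.tag):
                # Case 1: the pattern matches the element itself, so the
                # element's contents become the attribute value.
                attributes.append((name, "".join(elem.itertext()).strip()))
            elif text and rx.fullmatch(text):
                # Case 2: the pattern matches a text. Climb to the nearest
                # ancestor that also holds other texts, then take the texts
                # of every element except the one that matched.
                anc = parent.get(elem)
                while anc is not None:
                    others = [
                        (e.text or "").strip()
                        for e in anc.iter()
                        if e is not elem and (e.text or "").strip()
                    ]
                    if others:
                        attributes.append((name, " ".join(others)))
                        break
                    anc = parent.get(anc)
    return attributes


# Hypothetical document: the same kind of information is expressed once as
# an element name and once as a label text next to its value.
SAMPLE = (
    "<doc><price>100</price>"
    "<item><label>Color</label><value>red</value></item></doc>"
)
SCHEMAS = [(r"price", "price"), (r"Colou?r", "color")]
print(extract_attributes(SAMPLE, SCHEMAS))
```

In this sketch the schema author writes only a pattern and an attribute name. Whether the source document marks the attribute with an element name or with a label text is resolved by the two branches, which is the convenience the abstract claims.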
Publication number JP2000259660(A) Publication date 2000.09.22
Application number JP19990064504 Filing date 1999.03.11
Applicant FUJI XEROX CO LTD Inventor NUMATA KENICHI
Classification G06F17/21;G06F17/27;G06F17/30 Main classification G06F17/21
Agency Agent
Main claim
Address