Abstract |
PROBLEM TO BE SOLVED: To extract the necessary attributes from a structured document by a simple specification, without the user having to be aware of differences in how structured documents express the same information. SOLUTION: An attribute extraction part 1c reads a structured document 1a and collates it against a set of attribute schemas, each having a character string pattern and an attribute name, defined in a schema definition part 1b. The matching elements and texts of the structured document 1a are extracted as attribute names. When a character string pattern matches an element of the document 1a, the content of the matched element is extracted as an attribute value. When the character string pattern matches a text of the document 1a, an ancestor element is specified that is an ancestor of the 1st matched text and, at the same time, an ancestor of texts other than the 1st text. Among the elements that have this ancestor element as an ancestor, the elements other than the 1st element (the element to which the 1st text directly belongs), and among the texts that have this ancestor element as an ancestor, the texts other than the 1st text, are extracted as attribute values to generate an attribute list 1d. |
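
The two matching cases described above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the function names `extract_attributes` and `_other_texts`, the use of regular expressions as the character string patterns, the representation of a schema as a `(pattern, attribute name)` pair, and the choice of the nearest qualifying ancestor are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET


def _other_texts(node, first_elem):
    """Collect, in document order, every non-empty text under `node`
    except the text that belongs directly to `first_elem`."""
    texts = []
    if node is not first_elem and node.text and node.text.strip():
        texts.append(node.text.strip())
    for child in node:
        texts.extend(_other_texts(child, first_elem))
        # Text that follows a child element still belongs to `node`.
        if child.tail and child.tail.strip():
            texts.append(child.tail.strip())
    return texts


def extract_attributes(xml_text, schemas):
    """Return an attribute list of (attribute name, value) pairs.

    `schemas` is a list of (regex pattern, attribute name) pairs
    (hypothetical stand-in for the schema definition part 1b).
    """
    root = ET.fromstring(xml_text)
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    attribute_list = []
    for pattern, name in schemas:
        regex = re.compile(pattern)
        for elem in root.iter():
            if regex.fullmatch(elem.tag):
                # Case 1: the pattern matches an element name, so the
                # element's content becomes the attribute value.
                value = " ".join("".join(elem.itertext()).split())
                attribute_list.append((name, value))
            elif elem.text and regex.fullmatch(elem.text.strip()):
                # Case 2: the pattern matches a text. Climb to the nearest
                # ancestor that also contains other texts (assumption: the
                # nearest such ancestor is the one intended).
                ancestor = parent_of.get(elem)
                while ancestor is not None:
                    others = _other_texts(ancestor, elem)
                    if others:
                        attribute_list.extend((name, t) for t in others)
                        break
                    ancestor = parent_of.get(ancestor)
    return attribute_list
```

With a single schema such as `[(r"(?i)price", "price")]`, the documents `<item><price>100 yen</price></item>` and `<table><tr><td>Price</td><td>100 yen</td></tr></table>` both yield the same attribute list, which is the point of the approach: one specification covers two different expressions of the same information.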