Abstract |
An information processing apparatus for classifying documents. Given an input target document, the apparatus first detects documents related to it. It then converts each document component into element identifying information that indicates the type or role of that component. Next, an internal document sequence converting section converts each set of repeated element identifying information into element identifying information indicating a sequence of that element identifying information. An inter-document sequence converting section then identifies sets of element identifying information that appear in both the target document and a related document, and detects which of the identified sets appear repeatedly in the target document. Finally, those sets are converted into element identifying information indicating a sequence of the elements, thereby structuring the target document.
|
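The conversion stages summarized in the abstract can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the apparatus's actual implementation: element identifying information is modeled as plain strings, a "set" of such information is modeled as a fixed-length run of adjacent tokens, and the names `to_element_ids`, `collapse_repeats`, `collapse_shared` and the `seq(...)` token format are all hypothetical. Related-document detection is assumed to have already happened.

```python
from collections import Counter


def to_element_ids(components):
    """Map (kind, text) components to element IDs.

    Assumption: the ID is simply the component kind, e.g. "heading".
    """
    return [kind for kind, _text in components]


def _seq_token(pattern):
    # Hypothetical token format standing for "a sequence of this pattern".
    return "seq(" + " ".join(pattern) + ")"


def collapse_repeats(ids, max_len=3):
    """Intra-document step: replace back-to-back repetitions of a pattern
    (up to max_len tokens long) with a single sequence token."""
    out = []
    i = 0
    while i < len(ids):
        replaced = False
        for n in range(1, max_len + 1):
            pattern = ids[i:i + n]
            if len(pattern) < n:
                break  # not enough tokens left for a pattern of this length
            count = 1
            while ids[i + count * n:i + (count + 1) * n] == pattern:
                count += 1
            if count >= 2:
                out.append(_seq_token(pattern))
                i += count * n
                replaced = True
                break
        if not replaced:
            out.append(ids[i])
            i += 1
    return out


def collapse_shared(target, related, n=2):
    """Inter-document step: find n-token runs present in both documents,
    keep those occurring at least twice in the target, and replace each
    occurrence in the target with a sequence token."""
    def grams(ids):
        return [tuple(ids[i:i + n]) for i in range(len(ids) - n + 1)]

    counts = Counter(grams(target))
    shared = set(grams(target)) & set(grams(related))
    repeated = {g for g in shared if counts[g] >= 2}

    out = []
    i = 0
    while i < len(target):
        gram = tuple(target[i:i + n])
        if gram in repeated:
            out.append(_seq_token(gram))
            i += n
        else:
            out.append(target[i])
            i += 1
    return out


# Usage: a target document and one related document, both already tokenized.
target = ["heading", "para", "image", "heading", "para", "footer"]
related = ["title", "heading", "para", "footer"]
structured = collapse_shared(collapse_repeats(target), related)
```

In this sketch the pair `heading para` never repeats back to back, so the intra-document step leaves the target unchanged. The inter-document step then replaces both occurrences, because the pair also appears in the related document.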