Abstract |
PROBLEM TO BE SOLVED: To achieve highly reliable keyword extraction and document retrieval by extracting keywords to which layout information is attached. SOLUTION: The method comprises a step S201 of inputting a document image; a step S202 of extracting layout information from the document image; a step S203 of recognizing the characters in each character area extracted in step S202 to obtain a character code string; and a step S204 of extracting keywords from the character code string by language analysis and weighting each keyword based on multiple pieces of layout information.
|
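The weighting in step S204 can be illustrated with a minimal sketch. The abstract does not specify which layout attributes are used or how they are combined, so everything below is an assumption for illustration only: the `TextRegion` type, the region labels (`title`, `heading`, `body`), the numeric weights, the font-size scaling, and the regex tokenizer standing in for language analysis are all hypothetical. Steps S201 to S203 (image input, layout extraction, character recognition) are taken as already done, with their results supplied as `TextRegion` values.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TextRegion:
    """One character area after S202/S203 (hypothetical structure)."""
    text: str          # character code string obtained by recognition (S203)
    region_type: str   # assumed layout label from S202, e.g. "title" or "body"
    font_size: float   # assumed layout attribute from S202, in points


# Assumed weights per layout label; the abstract gives no concrete values.
REGION_WEIGHT = {"title": 3.0, "heading": 2.0, "body": 1.0}

# Tiny stop-word list; a stand-in for real language analysis.
STOPWORDS = {"a", "an", "the", "of", "and", "to", "in", "is", "for", "by", "on"}


def extract_terms(text):
    """Return candidate keywords: lowercase alphabetic tokens, minus stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS and len(t) > 2]


def weight_keywords(regions, base_font_size=10.0):
    """Score each keyword by combining two layout cues per occurrence (S204).

    Each occurrence contributes region weight * (font size / base font size),
    so a keyword in a large title counts for more than one in body text.
    """
    scores = defaultdict(float)
    for region in regions:
        layout_weight = REGION_WEIGHT.get(region.region_type, 1.0)
        layout_weight *= region.font_size / base_font_size
        for term in extract_terms(region.text):
            scores[term] += layout_weight
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    regions = [
        TextRegion("Document Retrieval", "title", 20.0),
        TextRegion("Retrieval of a document uses keyword weights", "body", 10.0),
    ]
    for term, score in weight_keywords(regions):
        print(f"{term}: {score}")
```

Multiplying the two cues is only one possible way to combine multiple pieces of layout information; an additive scheme or a learned weighting would fit the abstract's description equally well.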