Abstract |
<P>PROBLEM TO BE SOLVED: To automatically extract, from document data, pairs of character strings that stand in an appropriate semantic relation. <P>SOLUTION: A language analysis means 22 divides the document data into a sequence of consecutive segments. Using the divided segments, a segment area discrimination means 23 identifies a continuous segment area that contains a first segment string satisfying a predetermined condition and a second segment string, composed of one or more segments, that is enclosed by a pair of segments satisfying a predetermined condition. Specifically, the condition on the first segment string is that the independent word of every segment constituting it is an indeclinable word or belongs to a word class equivalent thereto, and the condition on the pair of segments enclosing the second segment string is that the independent words of the two segments are symbol characters forming a matched pair. An inter-segment string relation identification means 24 identifies the relation between the first segment string and the second segment string thus identified. <P>COPYRIGHT: (C)2006,JPO&NCIPI |
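The discrimination step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Everything named here is an assumption made for illustration: the `Segment` class, the `"noun"` and `"symbol"` word-class labels, the `PAIRS` table of matched symbols, and the requirement that the first segment string immediately precede the opening symbol (the abstract only says both strings lie in one continuous area). The language analysis (means 22) is assumed to have already produced the segments, and the relation identification (means 24) is not sketched because the abstract does not say how it works.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical table of matched symbol pairs (opening -> closing).
PAIRS = {"(": ")", "（": "）", "「": "」", "[": "]"}

# Hypothetical word-class labels counted as "indeclinable or equivalent".
NOMINAL = {"noun"}


@dataclass
class Segment:
    surface: str   # text of the segment
    head_pos: str  # word class of the segment's independent word


def find_segment_areas(segments: List[Segment]) -> List[Tuple[List[str], List[str]]]:
    """Return (first string, second string) pairs as lists of surface forms."""
    results = []
    i, n = 0, len(segments)
    while i < n:
        seg = segments[i]
        if seg.head_pos == "symbol" and seg.surface in PAIRS:
            closer = PAIRS[seg.surface]
            # Scan forward for the matching closing symbol segment.
            j = i + 1
            while j < n and not (
                segments[j].head_pos == "symbol" and segments[j].surface == closer
            ):
                j += 1
            # Require a closer and at least one enclosed segment.
            if j < n and j > i + 1:
                # Assumption: the first string is the maximal run of nominal
                # segments immediately before the opening symbol.
                k = i
                while k > 0 and segments[k - 1].head_pos in NOMINAL:
                    k -= 1
                if k < i:
                    first = [s.surface for s in segments[k:i]]
                    second = [s.surface for s in segments[i + 1:j]]
                    results.append((first, second))
                i = j + 1
                continue
        i += 1
    return results


if __name__ == "__main__":
    demo = [
        Segment("International", "noun"),
        Segment("Business", "noun"),
        Segment("Machines", "noun"),
        Segment("(", "symbol"),
        Segment("IBM", "noun"),
        Segment(")", "symbol"),
        Segment("announced", "verb"),
    ]
    print(find_segment_areas(demo))
```

In this sketch a candidate pair is emitted only when both conditions hold: every segment of the first string is nominal, and the second string is enclosed by a matched symbol pair. A later stage would then decide what relation, if any, holds between the two strings.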