Abstract |
<p>PROBLEM TO BE SOLVED: To extract a character string from an HTML (HyperText Markup Language) document properly and easily by using a tag that rarely appears in the document. SOLUTION: Given an HTML document that contains the start point and the end point of the character string to be extracted, the tag and the comment tag that precede the start point and lie closest to it are extracted from the plurality of tags in the document as a start point tag, and the tag and the comment tag that follow the end point and lie closest to it are extracted as an end point tag. The start point tag 101 and the end point tag 102 are then searched for in the HTML document 103, and the character string between the retrieved start point tag 101 and end point tag 102 is extracted. COPYRIGHT: (C)2011, JPO&INPIT</p> |
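
The two steps in the abstract (deriving boundary tags around a marked string, then reusing them on an HTML document) can be illustrated with a minimal Python sketch. This is not the patented implementation: the function names `find_boundary_tags` and `extract_between`, the regular expression, and the sample documents are all hypothetical. The sketch also simplifies by taking the single nearest tag or comment on each side, and it does not check how rarely that tag appears in the document.

```python
import re

# Assumption: a tag is either an HTML comment or anything between "<" and ">".
# Real-world HTML would need a proper parser; this is only a sketch.
TAG_RE = re.compile(r"<!--.*?-->|<[^>]+>", re.DOTALL)


def find_boundary_tags(html, start, end):
    """Return (start_tag, end_tag) for the string html[start:end].

    start_tag is the closest tag or comment that ends at or before `start`;
    end_tag is the closest tag or comment that begins at or after `end`.
    Either value is None when no such tag exists.
    """
    start_tag = None
    end_tag = None
    for match in TAG_RE.finditer(html):
        if match.end() <= start:
            # Keep overwriting so the last tag before the start point wins.
            start_tag = match.group(0)
        elif match.start() >= end:
            # The first tag after the end point is the closest one.
            end_tag = match.group(0)
            break
    return start_tag, end_tag


def extract_between(html, start_tag, end_tag):
    """Return the text between the first start_tag and the next end_tag,
    or None when either tag cannot be found."""
    begin = html.find(start_tag)
    if begin < 0:
        return None
    begin += len(start_tag)
    finish = html.find(end_tag, begin)
    if finish < 0:
        return None
    return html[begin:finish]


# Hypothetical sample document in which the target string has been marked.
sample = ('<html><body><h1>News</h1><!-- price -->'
          '<span class="p">1,200 yen</span><p>footer</p></body></html>')
target = "1,200 yen"
start = sample.index(target)
end = start + len(target)
start_tag, end_tag = find_boundary_tags(sample, start, end)

# Hypothetical second document with the same layout but different content.
other = ('<html><body><h1>Other</h1><!-- price -->'
         '<span class="p">980 yen</span><p>x</p></body></html>')
print(extract_between(other, start_tag, end_tag))
```

Because `extract_between` uses the first occurrence of the start tag, the result is only reliable when that tag is uncommon in the document, which is the motivation the abstract gives for choosing a rarely appearing tag.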