INFORMATION EXTRACTION METHOD, INFORMATION EXTRACTION DEVICE, AND INFORMATION EXTRACTION PROGRAM, Application No. JP20110166460

Title of invention	INFORMATION EXTRACTION METHOD, INFORMATION EXTRACTION DEVICE, AND INFORMATION EXTRACTION PROGRAM
Abstract	PROBLEM TO BE SOLVED: To extract the body text from a structured document without depending on text-extraction rules. SOLUTION: A document set recording part 2 records the HTML files of the documents to be processed in a document set DB 3. A link source information extraction part 4 extracts the hyperlinks embedded in an HTML file acquired from the document set DB 3, together with the link peripheral text information. A text extraction part 5 treats an HTML file acquired from the document set DB 3 as the HTML file of a link destination document, on condition that a hyperlink referring to it has been extracted by the link source information extraction part 4. The text extraction part 5 compares the character strings of the text information present in the HTML file of the specified link destination document with the character string of the link peripheral text information, and extracts a representative part of the link destination document as the body text. An output part 6 outputs the extracted body text. COPYRIGHT: (C)2013,JPO&INPIT
Publication No.	JP2013030041(A)	Publication date	2013.02.07
Application No.	JP20110166460	Filing date	2011.07.29
Applicant	NIPPON TELEGR & TELEPH CORP	Inventor
Classification	G06F17/30;G06F13/00	Main classification	G06F17/30
Agency		Agent
Main claim
Address
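
The pipeline described in the abstract (collect the links that point to a document, then compare their text with the text of the destination document) can be illustrated with a minimal Python sketch. This is not the patented implementation: the record does not say how link peripheral text is delimited or how strings are compared. The sketch assumes that the anchor text stands in for the link peripheral text, that the destination document is split into blocks at a few common block-level tags, and that similarity is measured by character-bigram overlap. All class and function names (`LinkCollector`, `BlockCollector`, `extract_body`) are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: these tags delimit candidate text blocks in the destination page.
BLOCK_TAGS = {"p", "div", "td", "li", "h1", "h2", "h3", "article", "section"}
SKIP_TAGS = {"script", "style"}


def _squeeze(parts):
    """Join text fragments and collapse runs of whitespace."""
    return " ".join("".join(parts).split())


class LinkCollector(HTMLParser):
    """Collects (absolute URL, anchor text) pairs from a link source document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = urljoin(self.base_url, href)
                self._buf = []

    def handle_data(self, data):
        if self._href is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, _squeeze(self._buf)))
            self._href = None


class BlockCollector(HTMLParser):
    """Splits a link destination document into plain-text blocks."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buf = []
        self._skip = 0

    def _flush(self):
        text = _squeeze(self._buf)
        if text:
            self.blocks.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip:
            self._skip -= 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if not self._skip:
            self._buf.append(data)

    def close(self):
        super().close()
        self._flush()


def _bigrams(text):
    text = text.lower()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def _overlap(link_text, block):
    """Fraction of the link text's character bigrams that occur in the block."""
    a, b = _bigrams(link_text), _bigrams(block)
    return len(a & b) / len(a) if a else 0.0


def extract_body(dest_url, dest_html, source_docs):
    """Return the destination block that best matches the inbound link text.

    source_docs maps each link source URL to its HTML. Returns None when no
    inbound link text or no text block is available.
    """
    link_texts = []
    for url, html in source_docs.items():
        collector = LinkCollector(url)
        collector.feed(html)
        collector.close()
        link_texts += [t for href, t in collector.links if href == dest_url and t]

    blocks = BlockCollector()
    blocks.feed(dest_html)
    blocks.close()
    if not link_texts or not blocks.blocks:
        return None
    return max(blocks.blocks,
               key=lambda blk: sum(_overlap(t, blk) for t in link_texts))
```

Calling `extract_body(dest_url, dest_html, {source_url: source_html})` returns a single block. A fuller version would presumably also use the text surrounding each link and could return several adjacent blocks, but those choices are not specified in this record.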