Title: DATA DETECTION METHOD, DATA DETECTION DEVICE, AND PROGRAM
Abstract: The present invention enables designated data to be extracted from a structured document even when the structured document differs from other structured documents in screen layout and document structure. A first structured document is read in and output to an output device; a first label to be extracted and first data to be extracted are acquired via an input device; an extraction pattern representing a relative relation in document structure between the first label to be extracted and the first data to be extracted is generated; and the extraction pattern is stored in a storage device. A second structured document is read in; a second label to be extracted is acquired; an extraction rule for extracting, from the second structured document and on the basis of the extraction pattern stored in the storage device and the second label to be extracted, second data to be extracted corresponding to the second label to be extracted is generated; and the second data to be extracted is extracted from the second structured document on the basis of the extraction rule.
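The abstract does not say how the "relative relation in document structure" is encoded. One plausible reading, sketched below in Python under stated assumptions, records how many levels to climb from the label element to a common ancestor and which child steps lead from there down to the data element. The matching rule (innermost element whose full text equals the given string), the names `learn_pattern` and `apply_pattern`, and the sample markup are hypothetical and are not taken from the application.

```python
import xml.etree.ElementTree as ET


def _parent_map(root):
    # ElementTree has no parent pointers, so build a child -> parent lookup.
    return {child: parent for parent in root.iter() for child in parent}


def _find_by_text(root, text):
    # Assumption: a label or datum is the innermost element whose full text equals `text`.
    matches = [e for e in root.iter() if "".join(e.itertext()).strip() == text]
    inner = [e for e in matches if not any(c in matches for c in e)]
    return inner[0] if inner else None


def learn_pattern(root, label, data):
    """Record the relative path from the label element to the data element."""
    parents = _parent_map(root)
    label_el, data_el = _find_by_text(root, label), _find_by_text(root, data)
    if label_el is None or data_el is None:
        raise ValueError("label or data not found in the first document")
    # Ancestors of the label element, nearest first (the element itself included).
    chain = [label_el]
    while chain[-1] in parents:
        chain.append(parents[chain[-1]])
    # Climb from the data element until reaching one of the label's ancestors,
    # noting (tag, 0-based index among same-tag siblings) at each step.
    down, node = [], data_el
    while node not in chain:
        parent = parents[node]
        same_tag = [c for c in parent if c.tag == node.tag]
        down.append((node.tag, same_tag.index(node)))
        node = parent
    return {"label_tag": label_el.tag, "up": chain.index(node), "down": down[::-1]}


def apply_pattern(root, label, pattern):
    """Follow a learned pattern from `label` in another document; None if it fails."""
    parents = _parent_map(root)
    node = _find_by_text(root, label)
    if node is None or node.tag != pattern["label_tag"]:
        return None
    for _ in range(pattern["up"]):
        node = parents.get(node)
        if node is None:
            return None
    for tag, index in pattern["down"]:
        same_tag = [c for c in node if c.tag == tag]
        if index >= len(same_tag):
            return None
        node = same_tag[index]
    return "".join(node.itertext()).strip()


doc1 = ET.fromstring(
    "<table>"
    "<tr><td>Name</td><td>Widget</td></tr>"
    "<tr><td>Price</td><td>1200</td></tr>"
    "</table>"
)
# Different layout: extra wrappers, reordered rows, an additional column.
doc2 = ET.fromstring(
    "<html><body><div><table>"
    "<tr><td>Price</td><td>980</td><td>tax incl.</td></tr>"
    "<tr><td>Name</td><td>Gadget</td></tr>"
    "</table></div></body></html>"
)

pattern = learn_pattern(doc1, "Price", "1200")  # first label and first data
print(pattern)                              # {'label_tag': 'td', 'up': 1, 'down': [('td', 1)]}
print(apply_pattern(doc2, "Price", pattern))  # 980
print(apply_pattern(doc2, "Name", pattern))   # Gadget
```

Because the pattern is anchored on the label rather than being an absolute path from the document root, it still resolves in the second document even though the table is nested three levels deeper, the rows are reordered, and a different label is supplied.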
Application publication number: US2016188744(A1)  Publication date: 2016.06.30
Application number: US201314891842  Filing date: 2013.05.17
Applicant: HITACHI, LTD.  Inventors: ITO Hideaki; DANNO Hirofumi; SASHINO Atsushi; HARAGUCHI Takuya
Classification: G06F17/30  Main classification: G06F17/30
Main claim: 1. A data extraction method in a data extraction device extracting data from a structured document, comprising: reading in a first structured document to output to an output device; acquiring a first label to be extracted and first data to be extracted via an input device; generating an extraction pattern representing a relative relationship in terms of document structure between the first label to be extracted and the first data to be extracted; storing the extraction pattern in a memory device; reading in a second structured document; acquiring a second label to be extracted; generating, on the basis of the extraction pattern stored in the memory device and the second label to be extracted, an extraction rule for extracting from the second structured document second data to be extracted corresponding to the second label to be extracted; and extracting on the basis of the extraction rule the second data to be extracted from the second structured document.
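Claim 1 separates storing the extraction pattern from generating an extraction rule for a second label. A minimal sketch of that split follows, assuming a pattern made of the label element's tag, a number of levels to climb, and 0-based same-tag child steps back down. The pattern is kept as a JSON string (standing in for the memory device) and is compiled, together with the second label, into a path expression in the XPath subset supported by Python's `xml.etree.ElementTree` (the `[.='text']` predicate needs Python 3.7 or later). `STORED_PATTERN`, `generate_rule`, and `extract` are hypothetical names; the claim does not specify a rule language.

```python
import json
import xml.etree.ElementTree as ET

# Stand-in for the memory device: an extraction pattern serialized as JSON.
# Hypothetical format: tag of the label element, levels to climb from it,
# then (tag, 0-based index among same-tag siblings) steps down to the data.
STORED_PATTERN = '{"label_tag": "td", "up": 1, "down": [["td", 1]]}'


def generate_rule(pattern, label):
    """Compile a stored pattern plus a second label into an ElementTree path."""
    if "'" in label:
        raise ValueError("labels containing ' are not handled in this sketch")
    steps = [f".//{pattern['label_tag']}[.='{label}']"]  # locate the label
    steps += [".."] * pattern["up"]                      # climb to the common ancestor
    # ElementTree positions are 1-based and count same-tag siblings only.
    steps += [f"{tag}[{index + 1}]" for tag, index in pattern["down"]]
    return "/".join(steps)


def extract(root, rule):
    """Apply the rule to a document; return the matched text, or None."""
    hit = root.find(rule)
    return None if hit is None else "".join(hit.itertext()).strip()


# Second structured document (illustrative markup).
doc2 = ET.fromstring(
    "<div><table>"
    "<tr><th>Item</th><th>Value</th></tr>"
    "<tr><td>Name</td><td>Gadget</td></tr>"
    "<tr><td>Weight</td><td>3 kg</td></tr>"
    "</table></div>"
)

pattern = json.loads(STORED_PATTERN)     # read the stored pattern back
rule = generate_rule(pattern, "Weight")  # second label to be extracted
print(rule)                              # .//td[.='Weight']/../td[2]
print(extract(doc2, rule))               # 3 kg
```

Emitting the rule as a path string keeps it inspectable and lets one stored pattern serve any label; labels containing a single quote would need escaping that this sketch does not attempt.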
Address: Tokyo JP