Title of invention: Automatic crowd sourcing for machine learning in information extraction
Abstract: <p>The present document relates to the field of machine learning and information extraction. In particular, it relates to a method and system for machine learning and information extraction using data to which the system does not have direct access, in order to extract named entities or other information from that data in a flexible and adaptive way. A method for enabling machine learning from unstructured documents is described. The method comprises analyzing, at an electronic device (401), one or more structured databases, thereby providing a mapping (101) between a plurality of referenced character strings and a corresponding plurality of type labels; providing (102), at the electronic device (401), a first unstructured document comprising a plurality of unstructured character strings; analyzing the first unstructured document to identify (103) a first character string of the plurality of unstructured character strings which is associated with a first referenced character string of the plurality of referenced character strings; associating (104), within the first unstructured document, the first character string with a first type label which is mapped to the first referenced character string; and determining a training set for machine learning from the first unstructured document comprising the association to the first type label.</p>
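The steps named in the abstract (101 to 104, followed by deriving a training set) can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names `build_mapping` and `annotate`, the sample records, the type labels `PERSON` and `CITY`, and the exact-match lookup are all assumptions made for illustration, and the abstract does not specify how a character string is judged to be "associated with" a referenced string.

```python
import re
from typing import Dict, List, Tuple

# (start, end, matched text, type label)
Span = Tuple[int, int, str, str]


def build_mapping(records: List[Dict[str, str]]) -> Dict[str, str]:
    """Step 101 (assumed form): map each referenced string to its type label.

    Each record stands in for a row of a structured database, keyed by the
    field name, which is used here as the type label.
    """
    mapping: Dict[str, str] = {}
    for record in records:
        for type_label, value in record.items():
            mapping[value] = type_label
    return mapping


def annotate(document: str, mapping: Dict[str, str]) -> List[Span]:
    """Steps 103-104 (assumed form): find referenced strings and label them.

    Uses exact, whole-word matching. Longer referenced strings are tried
    first so that a shorter one cannot claim part of a longer match.
    """
    spans: List[Span] = []
    for ref in sorted(mapping, key=len, reverse=True):
        pattern = r"(?<!\w)" + re.escape(ref) + r"(?!\w)"
        for match in re.finditer(pattern, document):
            overlaps = any(
                match.start() < end and match.end() > start
                for start, end, _, _ in spans
            )
            if not overlaps:
                spans.append(
                    (match.start(), match.end(), match.group(), mapping[ref])
                )
    return sorted(spans)


# Hypothetical structured data, e.g. one entry of an address book.
records = [{"PERSON": "Alice Example", "CITY": "Munich"}]
# Step 102: a hypothetical unstructured document, e.g. a message body.
document = "Alice Example will meet us in Munich on Friday."

mapping = build_mapping(records)
# The labeled spans serve as the training set derived from this document.
training_set = annotate(document, mapping)
```

In this sketch the training set is simply the list of labeled spans; a learner could then generalize from the surrounding context to strings that do not appear in any structured database.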
Publication number: EP2570974(A1) Publication date: 2013.03.20
Application number: EP20110181107 Filing date: 2011.09.13
Applicant: EXB ASSET MANAGEMENT GMBH Inventors: ASSADOLLAHI, RAMIN; BORDAG, STEFAN
Classification: G06N99/00; G06N5/00; H04L12/58 Main classification: G06N99/00
Agency: Agent:
Main claim:
Address: