Abstract |
<p>The present document relates to the field of machine learning and information extraction. In particular, the present document relates to a method and system for machine learning and information extraction using data to which the system does not have direct access, in order to extract named entities or other information from that data in a flexible and adaptive way. A method for enabling machine learning from unstructured documents is described. The method comprises analyzing, at an electronic device (401), one or more structured databases, thereby providing a mapping (101) between a plurality of referenced character strings and a corresponding plurality of type labels; providing (102), at the electronic device (401), a first unstructured document comprising a plurality of unstructured character strings; analyzing the first unstructured document to identify (103) a first character string of the plurality of unstructured character strings which is associated with a first referenced character string of the plurality of referenced character strings; associating (104), within the first unstructured document, the first character string with a first type label which is mapped to the first referenced character string; and determining a training set for machine learning from the first unstructured document comprising the association to the first type label.</p> |
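The steps (101) to (104) recited above can be illustrated with a minimal Python sketch. It is not the claimed implementation. It assumes that each structured database is given as a dictionary from a type label to a list of values, and that "associated with" means a case-insensitive whole-word match. The function names `build_mapping`, `annotate` and `build_training_set` and the sample data are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# (start offset, end offset, matched text, type label)
Annotation = Tuple[int, int, str, str]


def build_mapping(databases: Dict[str, List[str]]) -> Dict[str, str]:
    """Step 101: map each referenced character string to its type label.

    Assumption: a structured database is modeled as {type_label: [values]}.
    If the same value appears under two labels, the later label wins.
    """
    mapping: Dict[str, str] = {}
    for type_label, values in databases.items():
        for value in values:
            mapping[value.lower()] = type_label
    return mapping


def annotate(document: str, mapping: Dict[str, str]) -> List[Annotation]:
    """Steps 103 and 104: find referenced strings and attach their labels.

    Assumption: association is a case-insensitive whole-word match, which
    only behaves as intended for strings that start and end with a word
    character.
    """
    annotations: List[Annotation] = []
    for referenced, type_label in mapping.items():
        pattern = r"\b" + re.escape(referenced) + r"\b"
        for match in re.finditer(pattern, document, flags=re.IGNORECASE):
            annotations.append(
                (match.start(), match.end(), match.group(0), type_label)
            )
    return sorted(annotations)


def build_training_set(
    documents: List[str], mapping: Dict[str, str]
) -> List[dict]:
    """Final step: collect the annotated documents as training examples."""
    return [
        {"text": document, "entities": annotate(document, mapping)}
        for document in documents
    ]


if __name__ == "__main__":
    # Hypothetical sample data, used only to exercise the sketch.
    sample_mapping = build_mapping(
        {"PERSON": ["Alice Smith"], "CITY": ["Berlin"]}
    )
    # Step 102: provide an unstructured document.
    sample_docs = ["Alice Smith moved to Berlin last year."]
    for example in build_training_set(sample_docs, sample_mapping):
        print(example)
```

Exact string matching is the simplest possible notion of association. A practical system would likely need fuzzy matching and a rule for resolving overlapping or ambiguous matches, neither of which this sketch attempts.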