Abstract |
FIELD: information technology. SUBSTANCE: personal data are identified using linguistic techniques implemented by a data collection server, a linguistic processing server and an application server. The disclosed method includes creating a task based on open-source crawl parameters entered through an administrator's automated workstation. The method then includes: loading texts, either by crawling open sources or by receiving texts from an external system; selecting links from the loaded texts and adding them to the addresses for further crawling; extracting text and converting binary files to a text format; breaking down the text prepared for analysis and determining entities; selecting personal data entities in the text; identifying personal data; and identifying facts of personal data in the text (entities determined at the previous step that are associated with persons). EFFECT: high relevance of results when identifying personal data in open information sources and in text files of the most common formats. 7 cl, 3 dwg |
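
The text-processing steps of the method (selecting links, breaking the text down, determining entities and linking them to persons) can be illustrated with a minimal Python sketch. It is not the patented implementation: the regular expressions, the `Fact` record and the function names `select_links`, `split_sentences` and `identify_facts` are hypothetical, and the two-capitalised-words person pattern is a crude stand-in for the linguistic processing the abstract describes.

```python
import re
from dataclasses import dataclass

# All patterns below are illustrative assumptions, not taken from the patent.
LINK_RE = re.compile(r"https?://[^\s\"'<>]+")
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Naive person pattern: two adjacent capitalised words.
PERSON_RE = re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")


@dataclass(frozen=True)
class Fact:
    """A personal-data entity associated with a person (hypothetical record)."""
    person: str
    kind: str
    value: str


def select_links(text):
    """Select links from a loaded text so they can be queued for further crawling."""
    return LINK_RE.findall(text)


def split_sentences(text):
    """Break the prepared text down into sentences at ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]


def identify_facts(text):
    """Pair every person with every e-mail address found in the same sentence."""
    facts = []
    for sentence in split_sentences(text):
        persons = PERSON_RE.findall(sentence)
        emails = EMAIL_RE.findall(sentence)
        for person in persons:
            for email in emails:
                facts.append(Fact(person, "email", email))
    return facts
```

Pairing entities by sentence co-occurrence is the simplest possible association rule; a production system would presumably rely on syntactic analysis, and on named-entity recognition rather than capitalisation.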