Title: System and method for authorship disambiguation and alias resolution in electronic data
Abstract: A system, method and computer program product for authorship determination, and alias resolution, including a document collection; a Jaro-Winkler similarity module configured for performing authorship determination and alias resolution based on at least one of email addresses, user identification numbers (IDs) on social networks, names written in text, and proper names, including countries and cities in the document collection; an authorship Support Vector Machine (SVM) module configured for performing authorship determination and alias resolution based on content of documents in the document collection, including at least one of emails, and social networks information; and a Jaccard similarity module configured for performing authorship determination and alias resolution based on link networks in the document collection. One or more of the Jaro-Winkler similarity module, the authorship Support Vector Machine (SVM) module, and the Jaccard similarity module are employed for performing the authorship determination and the alias resolution in the document collection.
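As an illustration of the name-based comparison mentioned in the abstract, the following is a minimal sketch of the standard Jaro-Winkler similarity applied to two alias strings. It is not taken from the patent: the function names `jaro` and `jaro_winkler`, the prefix scale of 0.1, and the maximum prefix length of 4 are assumptions based on the commonly cited textbook definition.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; textbook definition, assumed for illustration."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix (hypothetical default parameters)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * prefix_scale * (1 - j)


if __name__ == "__main__":
    # Two spellings that differ by one transposition score close to 1.
    print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
```

A pair of aliases would typically be flagged as a candidate match when the score exceeds a chosen threshold; the threshold itself is not specified in this record.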
Publication number: US9264387(B2) Publication date: 2016.02.16
Application number: US201313760341 Filing date: 2013.02.06
Applicant: MSC INTELLECTUAL PROPERTIES B.V. Inventors: Scholtes Johannes Cornelis; Maes Freek Peter Elisabeth
Classification: H04L12/58 Main classification: H04L12/58
Agency: The Villamar Firm PLLC Agent: Villamar Carlos R.; The Villamar Firm PLLC
Main claim: 1. A computer implemented system for authorship determination, and alias resolution, the system comprising: a processor and a memory executing: a document collection; a Jaro-Winkler similarity module configured for performing authorship determination and alias resolution based on at least one of email addresses, user identification numbers (IDs) on social networks, names written in text, and proper names, including countries and cities in the document collection; an authorship Support Vector Machine (SVM) module configured for performing authorship determination and alias resolution based on content of documents in the document collection, including at least one of emails, and social networks information; a Jaccard similarity module configured for performing authorship determination and alias resolution based on link networks in the document collection, wherein the system is configured to employ the Jaro-Winkler similarity module, the authorship Support Vector Machine (SVM) module, and the Jaccard similarity module for performing the authorship determination and the alias resolution in the document collection; and a voting Support Vector Machine (SVM) module configured to combine outputs from the Jaro-Winkler similarity module, the authorship Support Vector Machine (SVM) module, and the Jaccard similarity module using a voting algorithm, wherein the system is configured to extract syntactic, structural, and semantic features for the authorship determination and the alias resolution in the document collection.
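The link-network comparison and the combination step named in the claim can be sketched roughly as follows. This is an illustrative assumption, not the claimed implementation: the Jaccard index is computed over hypothetical sets of correspondents, and a plain majority vote over thresholded scores stands in for the voting SVM, which the record does not describe in detail. All names, addresses, scores, and thresholds are invented for the example.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|; defined here as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def majority_vote(scores: dict, thresholds: dict) -> bool:
    """Simple stand-in for the voting step: each module votes 'same author'
    when its score meets its threshold; the majority decides.
    (The claim specifies a voting SVM; this is a simplification.)"""
    votes = [scores[name] >= thresholds[name] for name in scores]
    return sum(votes) > len(votes) / 2


if __name__ == "__main__":
    # Hypothetical correspondents of two aliases in an email link network.
    contacts_a = {"bob@x.example", "carol@x.example", "dave@x.example"}
    contacts_b = {"bob@x.example", "carol@x.example", "erin@x.example"}
    link_score = jaccard(contacts_a, contacts_b)

    # Hypothetical per-module scores and thresholds.
    scores = {"name": 0.96, "content": 0.40, "links": link_score}
    thresholds = {"name": 0.90, "content": 0.50, "links": 0.40}
    print(link_score, majority_vote(scores, thresholds))
```

A learned combiner such as an SVM would replace the fixed thresholds with a decision boundary trained on labeled alias pairs, using the three module scores as input features.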
Address: Amsterdam NL