Title of Invention |
Method and system for extracting shadow entities from emails |
Abstract |
One embodiment provides a system for extracting shadow entities from emails. During operation, the system receives a number of document corpora. The system then calculates word-collocation statistics associated with different n-gram sizes for the document corpora. Next, the system receives an email and identifies shadow entities in the email based on the calculated word-collocation statistics for the document corpora. |
Publication Number |
US8983826(B2) |
Publication Date |
2015.03.17 |
Application Number |
US201113173698 |
Filing Date |
2011.06.30 |
Applicant |
Palo Alto Research Center Incorporated |
Inventors |
Brdiczka Oliver;Hizalev Petro |
Classification |
G06F17/27 |
Main Classification |
G06F17/27 |
Agency |
Park, Vaughan, Fleming & Dowler LLP |
Agent |
Yao Shun;Park, Vaughan, Fleming & Dowler LLP |
Main Claim |
1. A computer-executable method for extracting shadow entities from emails, the method comprising:
loading, by a computing device, a shadow-entity extraction application from storage into memory;
executing, by the computing device, the shadow-entity extraction application to perform:
receiving a number of document corpora;
calculating word-collocation statistics associated with different n-gram sizes for the document corpora;
receiving an email;
identifying noun phrases in the email;
filtering the noun phrases in the email based on the calculated word-collocation statistics of different n-gram sizes for the document corpora, wherein filtering the noun phrases comprises determining a filtering threshold of mutual dependency (MD), point-wise mutual information (PMI), or log-frequency biased mutual dependency (LFMD) value based on an average MD, PMI, or LFMD value of the noun phrases in the email against the MD, PMI, or LFMD value for the document corpora; and
identifying shadow entities in the email, based on the calculated word-collocation statistics for the document corpora and the filtered noun phrases in the email. |
Address |
Palo Alto CA US |
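
The filtering step in the main claim can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes bigrams only (the claim covers several n-gram sizes), a single pre-tokenized corpus, and noun phrases supplied as word pairs by some external chunker. The formulas are the commonly cited definitions, since this record does not reproduce them: PMI = log2(P(xy) / (P(x)P(y))), MD = log2(P(xy)^2 / (P(x)P(y))), and LFMD = MD + log2 P(xy). The function names `collocation_counts`, `collocation_scores`, and `filter_phrases` are hypothetical, and using the email's average score as the cutoff is only one possible reading of the claim language.

```python
import math
from collections import Counter


def collocation_counts(tokens):
    """Count unigrams and adjacent-word bigrams in a tokenized corpus."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def collocation_scores(w1, w2, unigrams, bigrams):
    """Return PMI, MD and LFMD for the bigram (w1, w2), or None if unseen.

    Probabilities are simple relative frequencies (an assumption).
    """
    p_xy = bigrams[(w1, w2)] / sum(bigrams.values())
    if p_xy == 0:
        return None  # bigram never occurs in the corpus
    p_x = unigrams[w1] / sum(unigrams.values())
    p_y = unigrams[w2] / sum(unigrams.values())
    pmi = math.log2(p_xy / (p_x * p_y))
    md = math.log2(p_xy ** 2 / (p_x * p_y))
    lfmd = md + math.log2(p_xy)
    return {"pmi": pmi, "md": md, "lfmd": lfmd}


def filter_phrases(phrases, unigrams, bigrams, measure="md"):
    """Keep phrases whose corpus score is at or above the email's average.

    `phrases` is a list of (word, word) noun-phrase candidates from one
    email; phrases unseen in the corpus are dropped (an assumption).
    """
    scored = {}
    for w1, w2 in phrases:
        s = collocation_scores(w1, w2, unigrams, bigrams)
        if s is not None:
            scored[(w1, w2)] = s[measure]
    if not scored:
        return []
    threshold = sum(scored.values()) / len(scored)
    return [p for p, v in scored.items() if v >= threshold]
```

A call such as `filter_phrases(email_phrases, *collocation_counts(corpus_tokens), measure="lfmd")` would return the candidates that pass the cutoff. A fuller system would aggregate statistics over several corpora and n-gram sizes before this step.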