Title |
Personally identifiable information detection |
Abstract |
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for privacy protection. In one aspect, a method includes accessing personally identifiable information (PII) type definitions that characterize PII types; identifying PII type information included in content of a web page, the PII type information being information matching at least one PII type definition; identifying secondary information included in the content of the web page, the secondary information being information that is predefined as being associated with PII type information; determining a risk score from the PII type information and the secondary information; and classifying the web page as a personal information exposure risk if the risk score meets a confidentiality threshold, wherein the personal information exposure risk is indicative of the web page including personally identifiable information. |
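The abstract does not say how PII type definitions are encoded or how the risk score is computed. The Python sketch below makes several assumptions for illustration only: each PII type definition is a regular expression with a weight, secondary information is a fixed list of weighted keywords, and the risk score is the highest matched PII weight plus the weights of any secondary keywords present, capped at 1.0. All names (`PIITypeDefinition`, `SECONDARY_TERMS`, `CONFIDENTIALITY_THRESHOLD`, `risk_score`, and so on), patterns, and numeric values are hypothetical and are not taken from the patent.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class PIITypeDefinition:
    """Hypothetical PII type definition: a name, a regex, and a risk weight."""
    name: str
    pattern: re.Pattern
    weight: float


# Hypothetical PII type definitions (illustrative patterns, not exhaustive).
PII_TYPE_DEFINITIONS = [
    PIITypeDefinition("us_ssn", re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), 0.6),
    PIITypeDefinition("credit_card", re.compile(r"\b(?:\d{4}[ -]){3}\d{4}\b"), 0.5),
]

# Hypothetical secondary information: keywords predefined as associated with PII.
SECONDARY_TERMS = {
    "ssn": 0.3,
    "social security": 0.3,
    "date of birth": 0.2,
    "card number": 0.3,
}

CONFIDENTIALITY_THRESHOLD = 0.8  # hypothetical threshold value


def find_pii(content: str) -> list[tuple[str, str]]:
    """Return (definition name, matched text) for each PII type definition match."""
    return [
        (d.name, m.group())
        for d in PII_TYPE_DEFINITIONS
        for m in d.pattern.finditer(content)
    ]


def find_secondary(content: str) -> list[str]:
    """Return the predefined secondary terms that occur in the content."""
    lowered = content.lower()
    return [term for term in SECONDARY_TERMS if term in lowered]


def risk_score(content: str) -> float:
    """Assumed scoring: highest PII weight plus secondary weights, capped at 1.0."""
    weights = {d.name: d.weight for d in PII_TYPE_DEFINITIONS}
    pii = find_pii(content)
    if not pii:
        return 0.0  # no PII type information, so no exposure risk
    pii_part = max(weights[name] for name, _ in pii)
    secondary_part = sum(SECONDARY_TERMS[t] for t in find_secondary(content))
    return min(1.0, pii_part + secondary_part)


def is_exposure_risk(content: str, threshold: float = CONFIDENTIALITY_THRESHOLD) -> bool:
    """Classify content as an exposure risk if its score meets the threshold."""
    return risk_score(content) >= threshold
```

With these assumed weights, a bare SSN-like number such as `"Order 123-45-6789 shipped"` scores 0.6 and stays below the threshold, while `"My SSN is 123-45-6789"` crosses it because the keyword "SSN" adds secondary weight. This illustrates why the method combines PII type information with secondary information instead of relying on pattern matches alone.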
Publication number |
US9015802(B1) |
Publication date |
2015.04.21 |
Application number |
US201314024943 |
Filing date |
2013.09.12 |
Applicant |
Google Inc. |
Inventors |
Muthusrinivasan Muthuprasanna;Haahr Paul;Cutts Matthew D. |
Classification codes |
G06F21/00;G06F21/62;H04L29/06 |
Main classification code |
G06F21/00 |
Agency |
Fish & Richardson P.C. |
Agent |
Fish & Richardson P.C. |
Main claim |
1. A method performed by data processing apparatus, the method comprising:
accessing, by a data processing apparatus, personally identifiable information (PII) type definitions that characterize PII types;
identifying, by the data processing apparatus, PII type information included in content of a web page, the PII type information being information matching at least one PII type definition;
identifying a sub-portion of content of the web page, the sub-portion of content being content within a window that includes the PII type information and additional content and excluding other content of the web page;
identifying, by the data processing apparatus, secondary information included in the sub-portion of content of the web page, the secondary information being content that matches information that is predefined as being associated with PII type information;
determining a risk score from the PII type information and the secondary information; and
classifying the web page as a personal information exposure risk if the risk score meets a confidentiality threshold, wherein the personal information exposure risk is indicative of the web page including personally identifiable information. |
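Claim 1 narrows the abstract by searching for secondary information only within a window that contains the PII type information, excluding the rest of the page. The claim does not say how the window is measured. The Python sketch below assumes a fixed number of characters on each side of each PII match, and uses a hypothetical SSN-like pattern and keyword list; the names `windows_around_pii`, `secondary_in_windows`, and the `radius` parameter are illustrative only.

```python
from __future__ import annotations

import re

# Hypothetical PII type definition (SSN-like pattern) and secondary terms.
PII_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
SECONDARY_TERMS = ("ssn", "social security", "date of birth")


def windows_around_pii(content: str, radius: int = 40):
    """Yield (PII match, sub-portion) pairs.

    Each sub-portion spans `radius` characters on either side of one PII match
    (an assumed window definition) and excludes the rest of the page content.
    """
    for match in PII_PATTERN.finditer(content):
        start = max(0, match.start() - radius)
        end = min(len(content), match.end() + radius)
        yield match.group(), content[start:end]


def secondary_in_windows(content: str, radius: int = 40) -> list[tuple[str, list[str]]]:
    """For each PII match, list the secondary terms found only inside its window."""
    results = []
    for pii, window in windows_around_pii(content, radius):
        lowered = window.lower()
        results.append((pii, [t for t in SECONDARY_TERMS if t in lowered]))
    return results
```

Restricting the search to a window means a keyword elsewhere on the page is not counted as corroborating an unrelated number. In this sketch, the per-window terms would then feed the risk-score step.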
Address |
Mountain View CA US |