Title: Demographic inference calibration
Abstract: Methods, systems, and apparatus include computer programs encoded on a computer-readable storage medium for labeling user identifiers. A method includes: identifying a set of unlabeled identifiers, wherein an unlabeled identifier has an unknown classification as to a particular class in a multi-class demographic characteristic; determining for each unlabeled identifier a probability as to inclusion in a class of the multi-class demographic characteristic based on known user behavior, producing a distribution of probabilities for the unlabeled identifier; for a given unlabeled identifier, adjusting the probability based on a known internet distribution of entities with respect to a given class in the multi-class demographic characteristic and on the distribution of the probabilities among the unlabeled identifiers; and assigning a label for a particular class in the multi-class demographic characteristic to the unlabeled identifier in accordance with the adjusting.
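The abstract does not state how the probabilities are adjusted toward the known internet distribution. One common way to do this, shown here as a minimal sketch and not as the patented method, is a prior-ratio correction: each identifier's class probabilities are multiplied by the ratio of the known population share to the average predicted share across all unlabeled identifiers, then renormalized, and each identifier is labeled with its most probable adjusted class. The function names `prior_adjust` and `assign_labels`, the class keys `"F"`/`"M"`, and the sample scores are hypothetical.

```python
from typing import Dict, List


# Hypothetical helper: one-step prior-ratio correction (an assumed technique,
# not the method stated in the patent).
def prior_adjust(
    probs: List[Dict[str, float]],
    target_share: Dict[str, float],
) -> List[Dict[str, float]]:
    """Reweight each identifier's class distribution toward target_share."""
    n = len(probs)
    classes = list(target_share)
    # Average predicted share of each class over all unlabeled identifiers
    # (assumes every class receives some nonzero predicted mass).
    mean_pred = {c: sum(p[c] for p in probs) / n for c in classes}
    adjusted = []
    for p in probs:
        # Scale by (known share / predicted share), then renormalize to sum to 1.
        raw = {c: p[c] * target_share[c] / mean_pred[c] for c in classes}
        total = sum(raw.values())
        adjusted.append({c: v / total for c, v in raw.items()})
    return adjusted


def assign_labels(probs: List[Dict[str, float]]) -> List[str]:
    # Label each identifier with its most probable class.
    return [max(p, key=p.get) for p in probs]


# Usage: the model predicts "F" 57.5% of the time on average vs. a 50/50 target.
scores = [
    {"F": 0.60, "M": 0.40},
    {"F": 0.55, "M": 0.45},
    {"F": 0.70, "M": 0.30},
    {"F": 0.45, "M": 0.55},
]
calibrated = prior_adjust(scores, {"F": 0.5, "M": 0.5})
print(assign_labels(scores))      # ['F', 'F', 'F', 'M']
print(assign_labels(calibrated))  # ['F', 'M', 'F', 'M']
```

The borderline identifier (0.55 vs. 0.45) flips once the population-level over-prediction of `"F"` is divided out, which is the kind of shift the abstract's adjusting step is meant to produce.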
Publication No.: US9466029(B1)  Publication Date: 2016.10.11
Application No.: US201314054196  Filing Date: 2013.10.15
Applicant: Google Inc.  Inventors: Huang Ruoyun; Asuncion Arthur; Sheng Yong
Classification: G06N7/00; G06N5/04  Main Classification: G06N7/00
Agency: Fish & Richardson P.C.  Agent: Fish & Richardson P.C.
Main Claim: 1. A method comprising: identifying a set of unlabeled identifiers associated with a particular user, wherein an unlabeled identifier has an unknown classification as to a particular class in a multi-class demographic characteristic; for each unlabeled identifier of the set of unlabeled identifiers, producing a distribution of probabilities for the unlabeled identifier, including determining, for the unlabeled identifier, a probability of inclusion of the unlabeled identifier in the particular class of the multi-class demographic characteristic based on known behavior of the particular user; ranking, for the particular class, the set of unlabeled identifiers based on the probability for each unlabeled identifier of the set of unlabeled identifiers; determining an internet distribution of users with respect to the multi-class demographic characteristic based on characteristics of a current population survey of users; based on the internet distribution of users with respect to the multi-class demographic characteristic, determining a percentage of the users that are in each class of the multi-class demographic characteristic; based on the percentage distribution of the users that are in each class of the multi-class demographic characteristic, defining, in the ranking of the set of unlabeled identifiers, a boundary between the rankings that are associated with each of two or more classes of the multi-class demographic characteristic, the two or more classes including the particular class; for each unlabeled identifier of the set of unlabeled identifiers, adjusting the probability of inclusion in the particular class of the multi-class demographic characteristic based on the boundary; and assigning a label of the particular class in the multi-class demographic characteristic to one or more of the unlabeled identifiers in accordance with the adjusting.
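Claim 1 places a boundary in the ranking using survey percentages but does not specify how each probability is then adjusted. The sketch below is one possible reading, under stated assumptions, for a single particular class: identifiers are ranked by probability, the boundary falls after the top k = round(share * n) identifiers at the midpoint of the scores on either side, and each probability is shifted in logit space so that the boundary score maps to 0.5. The midpoint rule, the logit shift, the `EPS` clamp, and the names `boundary_calibrate` and `_logit` are hypothetical choices, not taken from the patent.

```python
import math
from typing import Dict, Tuple

EPS = 1e-6  # hypothetical clamp so logit() stays finite


def _logit(p: float) -> float:
    # Clamp away from 0 and 1 so the log is always defined.
    p = min(max(p, EPS), 1 - EPS)
    return math.log(p / (1 - p))


def boundary_calibrate(
    probs: Dict[str, float], class_share: float
) -> Dict[str, Tuple[float, bool]]:
    """Return {identifier: (adjusted_probability, in_class)} for one class.

    Hypothetical sketch: class_share is the fraction of the internet
    population in the particular class (e.g. from a population survey).
    """
    # Rank identifiers by probability of the particular class, highest first.
    ranked = sorted(probs, key=probs.get, reverse=True)
    n = len(ranked)
    # Number of identifiers the survey share places inside the class.
    k = min(max(int(round(class_share * n)), 0), n)
    # Boundary score between rank k-1 (last inside) and rank k (first outside).
    if k == 0:
        threshold = 1 - EPS
    elif k == n:
        threshold = EPS
    else:
        threshold = (probs[ranked[k - 1]] + probs[ranked[k]]) / 2
    shift = _logit(threshold)
    result = {}
    for rank, ident in enumerate(ranked):
        # Shift in logit space so the boundary score maps to exactly 0.5.
        adjusted = 1 / (1 + math.exp(shift - _logit(probs[ident])))
        # Membership follows the boundary position in the ranking.
        result[ident] = (adjusted, rank < k)
    return result


scores = {"u1": 0.9, "u2": 0.7, "u3": 0.6, "u4": 0.4, "u5": 0.2}
calibrated = boundary_calibrate(scores, class_share=0.4)
print(sorted(i for i, (_, in_class) in calibrated.items() if in_class))
# ['u1', 'u2']
```

With a 40% survey share, a plain 0.5 cutoff on the raw scores would label u1, u2, and u3; the boundary moves to 0.65, so only u1 and u2 receive the label. Membership is decided by rank position, so identifiers tied exactly at the boundary score get an adjusted probability of 0.5 while still being split by their order in the ranking.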
Address: Mountain View CA US