Title of invention: System and method for genetic creation of a rule set for duplicate detection
Abstract: Embodiments may generate a population of candidate rules, each including multiple rule conditions for detecting duplicates, where a duplicate is a pair of different sets of item description information that describe a common item. Embodiments may apply each candidate rule of the population to a reference data set that includes known duplicates and non-duplicates. Embodiments may assign each candidate rule a fitness score, generated with a fitness function based on the performance of that candidate rule on the reference data set. Embodiments may, based on the fitness scores, select a subset of the population of candidate rules as parents for a new generation of candidate rules. Embodiments may perform crossover and/or mutation operations on the parent candidate rules to generate the new generation of candidate rules. Embodiments may select, from the new generation of candidate rules or from subsequent generations, rules for inclusion within a rule set for detecting duplicates within item description information.
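The loop described in the abstract (generate, score, select parents, apply crossover and mutation, repeat) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: a rule is modeled as a set of field names that must match exactly, the fitness function is the F1 score, the parents are carried over unchanged into the next generation, and the names `FIELDS`, `rule_matches`, `fitness`, `crossover`, `mutate`, `evolve`, and the sample items are all hypothetical.

```python
import random

# Hypothetical item attributes; the patent does not name specific fields.
FIELDS = ["title", "brand", "color", "size"]


def rule_matches(rule, a, b):
    """A rule (frozenset of field names) fires when all its fields agree."""
    return all(a[f] == b[f] for f in rule)


def fitness(rule, reference):
    """Assumed fitness: F1 score on labeled pairs (a, b, is_duplicate)."""
    if not rule:
        return 0.0  # an empty rule would flag every pair as a duplicate
    tp = fp = fn = 0
    for a, b, is_dup in reference:
        predicted = rule_matches(rule, a, b)
        if predicted and is_dup:
            tp += 1
        elif predicted:
            fp += 1
        elif is_dup:
            fn += 1
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def crossover(p1, p2, rng):
    """Child takes each field condition from one parent chosen at random."""
    return frozenset(
        f for f in FIELDS if f in (p1 if rng.random() < 0.5 else p2)
    )


def mutate(rule, rng, rate=0.2):
    """Toggle each field condition independently with probability `rate`."""
    return frozenset(f for f in FIELDS if (f in rule) != (rng.random() < rate))


def evolve(reference, generations=20, pop_size=12, n_parents=4, seed=0):
    """Run the generate/score/select/recombine loop and return the best rule."""
    rng = random.Random(seed)
    population = [
        frozenset(rng.sample(FIELDS, rng.randint(1, len(FIELDS))))
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        ranked = sorted(
            population, key=lambda r: fitness(r, reference), reverse=True
        )
        parents = ranked[:n_parents]
        children = []
        while len(children) < pop_size - n_parents:
            p1, p2 = rng.sample(parents, 2)
            children.append(mutate(crossover(p1, p2, rng), rng))
        # Assumption: parents survive into the next generation (elitism).
        population = parents + children
    return max(population, key=lambda r: fitness(r, reference))


# Tiny hypothetical reference set: one known duplicate, two non-duplicates.
a = {"title": "USB cable", "brand": "Acme", "color": "black", "size": "1m"}
b = {"title": "USB cable", "brand": "Acme", "color": "Black ", "size": "1 m"}
c = {"title": "HDMI cable", "brand": "Acme", "color": "black", "size": "1m"}
d = {"title": "USB cable", "brand": "Bolt", "color": "black", "size": "1m"}
reference = [(a, b, True), (a, c, False), (a, d, False)]

best_rule = evolve(reference)
```

On this sample data, a rule requiring `title` and `brand` to match separates the duplicate pair from both non-duplicates, while a rule on `title` alone also flags the pair `(a, d)`. A realistic rule language would likely use richer conditions (for example, fuzzy string similarity thresholds) than the exact-match conditions assumed here.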
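Publication number: US8577814(B1) Publication date: 2013.11.05
Application number: US201113193285 Filing date: 2011.07.28
Applicant: WU JIANHUI; THIRUMALAI SRIKANTH; AMAZON TECHNOLOGIES, INC. Inventor: WU JIANHUI; THIRUMALAI SRIKANTH
Classification: G06F15/18 Main classification: G06F15/18
Agency: Agent:
Main claim:
Address: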