Title of invention: System and method for identifying structured data items lacking requisite information for rule-based duplicate detection
Abstract: Embodiments of a system and method for identifying structured data items lacking requisite information for rule-based duplicate detection are described. Embodiments may include generating a deficiency score for each of multiple structured data items, which includes applying a set of rules based on duplicate detection techniques to each given structured data item in order to compare that item to itself. The deficiency score of the given structured data item may be based on the result of this comparison. Embodiments may also include identifying, based on the deficiency scores of the structured data items, one or more deficient structured data items having less than a requisite quantity of information for performing duplicate detection on structured data items. Embodiments may also include identifying one or more key attributes missing from some of the deficient structured data items and requesting those key attributes.
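The self-comparison idea can be illustrated with a minimal Python sketch. The abstract does not say how rules are structured or how the score is computed, so everything below is an assumption: each rule is modeled as a set of attributes that must be present and equal in both items, the deficiency score is taken to be the fraction of rules that fail when an item is compared with itself, and the 0.5 cutoff is arbitrary. The names `Rule`, `deficiency_score`, `missing_key_attributes`, `find_deficient` and the attribute names (`upc`, `title`, `brand`, `model`) are hypothetical. The key observation is that an item always agrees with itself on any value it actually has, so a rule can fail on a self-comparison only because required information is missing.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple

# A structured data item: attribute name -> value (None/empty = missing).
Item = Dict[str, Optional[str]]


@dataclass(frozen=True)
class Rule:
    """Hypothetical duplicate-detection rule: two items match if every
    attribute in `keys` is present (non-empty) in both and equal."""
    name: str
    keys: FrozenSet[str]

    def matches(self, a: Item, b: Item) -> bool:
        for key in self.keys:
            va, vb = a.get(key), b.get(key)
            # A missing or empty value can never satisfy the rule.
            if not va or not vb or va != vb:
                return False
        return True


def deficiency_score(item: Item, rules: List[Rule]) -> float:
    """Compare the item to itself under every rule. Since an item is
    identical to itself, a rule fails only when the item lacks attributes
    the rule needs. Assumed score: fraction of rules that fail."""
    if not rules:
        raise ValueError("at least one rule is required")
    failed = sum(1 for rule in rules if not rule.matches(item, item))
    return failed / len(rules)


def missing_key_attributes(item: Item, rules: List[Rule]) -> List[str]:
    """Sorted list of attributes required by failing rules that the item lacks."""
    missing = set()
    for rule in rules:
        if not rule.matches(item, item):
            missing |= {k for k in rule.keys if not item.get(k)}
    return sorted(missing)


def find_deficient(items: Dict[str, Item], rules: List[Rule],
                   threshold: float = 0.5) -> Dict[str, Tuple[float, List[str]]]:
    """Flag items whose deficiency score exceeds `threshold` (an assumed
    cutoff) and list the key attributes to request for each one."""
    report = {}
    for item_id, item in items.items():
        score = deficiency_score(item, rules)
        if score > threshold:
            report[item_id] = (score, missing_key_attributes(item, rules))
    return report


if __name__ == "__main__":
    rules = [
        Rule("upc", frozenset({"upc"})),
        Rule("title+brand", frozenset({"title", "brand"})),
        Rule("model+brand", frozenset({"model", "brand"})),
    ]
    items = {
        "A": {"upc": "0123", "title": "Kettle", "brand": "Acme", "model": "K1"},
        "B": {"title": "Kettle", "brand": None},
    }
    # Item B fails every rule on self-comparison, so it is flagged and
    # its missing attributes (brand, model, upc) are reported.
    print(find_deficient(items, rules))
```

Scoring by self-comparison means the same rule set used for duplicate detection also measures completeness, so no separate completeness schema has to be maintained; a weighted score or a minimum count of passing rules would be equally plausible choices.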
Publication number: US8527475(B1); Publication date: 2013.09.03
Application number: US201113239068; Application date: 2011.09.21
Applicants: RAMMOHAN ROSHAN RAM; KURUP MADHU M; THIRUMALAI SRIKANTH; AMAZON TECHNOLOGIES, INC.; Inventors: RAMMOHAN ROSHAN RAM; KURUP MADHU M; THIRUMALAI SRIKANTH
Classification: G06F17/30; Main classification: G06F17/30