Title of invention METHOD FOR RECOGNIZING LARGE-SCALE OBJECTS BASED ON SPARK SYSTEM
Abstract A method for recognizing large-scale objects based on a Spark system. The method comprises: step 10, reading and parsing all matching rules (10); step 20, reading and parsing records that serve as object description data (20); step 30, for each matching rule, if a record has all the attributes required by the rule, producing a matching result that consists of an attribute string, formed from the content of all those attributes of the record, and the record id of the record (30); step 40, gathering the record ids that correspond to the same attribute string into a set of record ids, and identifying one object with that set (40); step 50, broadcasting, for each object, the record ids it contains, and performing transitive closure processing on the objects that share a record id to obtain new objects (50); and step 60, repeating step 50 until the number of objects no longer changes (60). The large-scale parallel strategy addresses matching efficiency on massive data, and the predefined matching rules avoid the problem of missing and erroneous data.
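Steps 30 to 60 of the abstract can be illustrated with a minimal single-machine sketch in plain Python. It is not the patented Spark implementation: the in-memory dictionaries stand in for distributed data, and the function names `match_records` and `merge_until_stable`, the sample records, and the two rules are all hypothetical. One further assumption: the loop stops when the set of objects is unchanged, a stricter check than the object count named in step 60.

```python
from collections import defaultdict

def match_records(records, rules):
    """Steps 30-40 (sketch): group record ids by (rule, attribute string)."""
    groups = defaultdict(set)
    for rec_id, attrs in records.items():
        for rule in rules:
            # A rule applies only if the record has every attribute it requires.
            if all(attrs.get(a) is not None for a in rule):
                key = (rule, tuple(attrs[a] for a in rule))
                groups[key].add(rec_id)
    # Each set of record ids identifies one candidate object.
    return [frozenset(ids) for ids in groups.values()]

def merge_until_stable(objects):
    """Steps 50-60 (sketch): merge objects that share a record id until stable."""
    objects = set(objects)
    while True:
        # Index: record id -> objects containing it (stands in for the broadcast).
        by_record = defaultdict(set)
        for obj in objects:
            for rec_id in obj:
                by_record[rec_id].add(obj)
        merged = set()
        for obj in objects:
            combined = set(obj)
            for rec_id in obj:
                for other in by_record[rec_id]:
                    combined |= other
            merged.add(frozenset(combined))
        # Drop objects fully contained in a larger merged object.
        merged = {o for o in merged if not any(o < p for p in merged)}
        # Assumption: stop on set equality rather than on the object count.
        if merged == objects:
            return merged
        objects = merged

# Hypothetical sample data: four records, two single-attribute rules.
records = {
    "r1": {"phone": "555-0101", "email": None},
    "r2": {"phone": "555-0101", "email": "ann@example.com"},
    "r3": {"phone": None, "email": "ann@example.com"},
    "r4": {"phone": "555-0199", "email": "bob@example.com"},
}
rules = [("phone",), ("email",)]
objects = merge_until_stable(match_records(records, rules))
```

In this sample, r1 and r2 share a phone number and r2 and r3 share an email address, so the three records end up in one object even though r1 and r3 have no attribute value in common; r4 remains an object on its own.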
Publication number WO2016119508(A1) Publication date 2016.08.04
Application number WO2015CN94377 Filing date 2015.11.12
Applicant SHENZHEN AUDAQUE DATA TECHNOLOGY LTD Inventors WANG, MINGXING;WU, YINGHUI;MA, SHUAI;TANG, NAN;JIA, XIBEI
Classification G06F17/30 Main classification G06F17/30
Agency Agent
Principal claim
Address