Title of Invention: MANAGING DATA PROFILING OPERATIONS RELATED TO DATA TYPE
Abstract: Processing data in a computing system includes receiving a plurality of records that each have one or more values for respective fields of a plurality of fields. Data type information associates each of one or more data types with at least one identifier. Processing a plurality of data values from the records includes: generating a plurality of data units from the records, each data unit including a field identifier that uniquely identifies one of the fields and a binary value from one of the records, the binary value extracted from the field of that record identified by the field identifier; aggregating information about binary values from a plurality of the data units; generating a list of entries for each of one or more of the fields, at least some of the entries each including one of the binary values and information about that binary value aggregated from a plurality of the data units; retrieving a data type associated with a first identifier from the data type information, and associating the retrieved data type with at least one binary value included in an entry of one of the lists; and generating profile information for at least one of the fields based at least in part on a retrieved data type of a particular binary value appearing in the field, after aggregating information about binary values from a plurality of the data units.
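The first steps described in the abstract (splitting records into data units and aggregating them into per-field lists of entries) can be sketched in Python. This is a minimal sketch under assumptions not stated in the source: records are dicts mapping field names to raw bytes, field identifiers are small integers supplied by a hypothetical `field_ids` mapping, and the only information aggregated per value is an occurrence count. The function names `make_data_units` and `build_field_lists` are illustrative, not part of the patent.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# A data unit pairs a field identifier with a raw (binary) value.
DataUnit = Tuple[int, bytes]


def make_data_units(records: Iterable[Dict[str, bytes]],
                    field_ids: Dict[str, int]) -> List[DataUnit]:
    """Split each record into (field_id, binary_value) data units.

    field_ids is a hypothetical mapping that assigns each field name a
    unique integer identifier.
    """
    units: List[DataUnit] = []
    for record in records:
        for name, value in record.items():
            units.append((field_ids[name], value))
    return units


def build_field_lists(units: Iterable[DataUnit]) -> Dict[int, List[Tuple[bytes, int]]]:
    """Aggregate data units and build one list of entries per field.

    Each entry is (binary_value, count), where count is the number of
    data units carrying that value in that field.
    """
    counts: Counter = Counter(units)
    lists: Dict[int, List[Tuple[bytes, int]]] = defaultdict(list)
    for (field_id, value), count in counts.items():
        lists[field_id].append((value, count))
    # Sort entries by value so each list has a deterministic order.
    for entries in lists.values():
        entries.sort()
    return dict(lists)
```

For example, three records whose `amount` field holds `b"10.50"`, `b"10.50"` and `b"7"` yield the list `[(b"10.50", 2), (b"7", 1)]` for that field. Note that the values are still untyped bytes at this stage; no data type has been consulted yet.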
Publication No.: US2015254292(A1)  Publication Date: 2015.09.10
Application No.: US201514625902  Filing Date: 2015.02.19
Applicant: Ab Initio Technology LLC  Inventor: Khan Muhammad Arshad
Classification: G06F17/30  Main Classification: G06F17/30
Principal Claim: 1. A method for processing data in a computing system, the method including:
receiving, over an input device or port of the computing system, a plurality of records that each have one or more values for respective fields of a plurality of fields;
storing, in a storage medium of the computing system, data type information that associates each of one or more data types with at least one identifier; and
processing, using at least one processor of the computing system, a plurality of data values from the records, the processing including:
generating a plurality of data units from the records, each data unit including a field identifier that uniquely identifies one of the fields and a binary value from one of the records, the binary value extracted from the field of that record identified by the field identifier;
aggregating information about binary values from a plurality of the data units;
generating a list of entries for each of one or more of the fields, at least some of the entries each including one of the binary values and information about that binary value aggregated from a plurality of the data units;
retrieving a data type associated with a first identifier from the data type information, and associating the retrieved data type with at least one binary value included in an entry of one of the lists; and
generating profile information for at least one of the fields based at least in part on a retrieved data type of a particular binary value appearing in the field, after aggregating information about binary values from a plurality of the data units.
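The last two steps of the claim (retrieving a data type, associating it with aggregated binary values, and only then generating profile information) can be illustrated with a short Python sketch. It takes per-field lists of `(binary_value, count)` entries as input. Several assumptions here are not from the source: the "identifier" used to look up the data type is taken to be the field identifier, the hypothetical `DATA_TYPE_INFO` table knows only `"integer"` and `"decimal"` (anything else is treated as a string), and the profile metrics (valid/invalid counts, min, max, distinct values) are illustrative choices rather than the claim's required output.

```python
from decimal import Decimal, InvalidOperation
from typing import Dict, List, Optional, Tuple

# Hypothetical data type information: field identifier -> data type name.
DATA_TYPE_INFO: Dict[int, str] = {0: "integer", 1: "decimal"}


def interpret(value: bytes, data_type: str) -> Optional[object]:
    """Try to interpret a binary value as the given data type.

    Returns the typed value, or None if the value is not valid for the type.
    """
    try:
        text = value.decode("ascii")
        if data_type == "integer":
            return int(text)
        if data_type == "decimal":
            return Decimal(text)
        return text  # unknown types fall back to plain strings
    except (UnicodeDecodeError, ValueError, InvalidOperation):
        return None


def profile_field(field_id: int,
                  entries: List[Tuple[bytes, int]],
                  type_info: Dict[int, str]) -> Dict[str, object]:
    """Generate profile information for one field from its aggregated entries."""
    # Retrieve the data type associated with this field's identifier.
    data_type = type_info.get(field_id, "string")
    valid = invalid = 0
    typed_values: list = []
    for value, count in entries:
        # Associate the retrieved type with each distinct binary value.
        typed = interpret(value, data_type)
        if typed is None:
            invalid += count
        else:
            valid += count
            typed_values.append(typed)
    return {
        "data_type": data_type,
        "valid": valid,
        "invalid": invalid,
        # min/max compare typed values, not raw bytes.
        "min": min(typed_values) if typed_values else None,
        "max": max(typed_values) if typed_values else None,
        "distinct": len(entries),
    }
```

With entries `[(b"10.50", 2), (b"7", 1), (b"n/a", 1)]` for field 1, the profile reports 3 valid and 1 invalid occurrences, a minimum of `Decimal('7')` and a maximum of `Decimal('10.50')`; comparing the raw bytes instead would rank `b"10.50"` below `b"7"`. Because the type is applied to each distinct value once, after aggregation, the interpretation cost scales with the number of distinct values rather than the number of records.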
Address: Lexington, MA, US