Abstract |
A method for learning a data format is disclosed. The method includes, but is not limited to: inputting an initial description D of a data format together with a batch of data that includes data in a new data format not covered by the initial description; using the initial description to parse the records in the data source; discarding records in the input data that parse successfully; collecting records that fail to parse until a quantity M of such records has accumulated; and returning a modified description that extends the initial description to cover the new data. The initial description D is transformed into a second description D′ that accommodates the differences between the input data format and D by introducing options where a piece of data was missing in the input data and unions where a new type of data was found in the input data. |
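
The steps summarized in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the disclosure: records are comma-separated strings, a description is a list of per-field types, a missing piece of data is an empty field, and the names `Base`, `Option`, `Union`, `matches`, `parses`, `refine`, and `learn` are hypothetical. Records whose field count differs from the description are skipped during refinement for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Base:
    name: str  # assumed base types: "int", "word", "text"


@dataclass(frozen=True)
class Option:
    inner: object  # the field may be absent (empty)


@dataclass(frozen=True)
class Union:
    alternatives: tuple  # the field may be any one of these types


def matches(ty, token):
    """Return True if a single field token conforms to type `ty`."""
    if isinstance(ty, Option):
        return token == "" or matches(ty.inner, token)
    if isinstance(ty, Union):
        return any(matches(alt, token) for alt in ty.alternatives)
    if ty.name == "int":
        return token.isdigit()
    if ty.name == "word":
        return token.isalpha()
    return token != ""  # "text": any non-empty token


def infer_base(token):
    """Pick the first assumed base type that accepts a non-empty token."""
    for name in ("int", "word", "text"):
        if matches(Base(name), token):
            return Base(name)


def parses(description, record):
    """Parse a record against a description (a list of field types)."""
    tokens = record.split(",")
    return len(tokens) == len(description) and all(
        matches(ty, tok) for ty, tok in zip(description, tokens)
    )


def add_alternative(ty, alt):
    """Extend `ty` with a new alternative, keeping any Option outermost."""
    if isinstance(ty, Option):
        return Option(add_alternative(ty.inner, alt))
    if isinstance(ty, Union):
        return Union(ty.alternatives + (alt,))
    return Union((ty, alt))


def refine(description, failures):
    """Transform D into D' using the collected failing records."""
    new = list(description)
    for record in failures:
        tokens = record.split(",")
        if len(tokens) != len(new):
            continue  # structural changes are outside this sketch
        for i, tok in enumerate(tokens):
            if matches(new[i], tok):
                continue
            if tok == "":
                new[i] = Option(new[i])  # data was missing: add an option
            else:
                new[i] = add_alternative(new[i], infer_base(tok))  # new type
    return new


def learn(description, records, m):
    """Discard parsable records, collect up to m failures, return D'."""
    failures = []
    for record in records:
        if parses(description, record):
            continue  # successfully parsed records are discarded
        failures.append(record)
        if len(failures) >= m:
            break
    # If fewer than m failures exist, refine with whatever was collected.
    return refine(description, failures)
```

With `D = [Base("int"), Base("word")]` and the records `["1,abc", "2,", "x,def", "3,ghi"]`, calling `learn(D, records, m=2)` in this sketch turns the first field into a union of `int` and `word` and the second field into an optional `word`, after which all four records parse.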