Abstract |
A system is disclosed for formulating structure descriptions from data. In some embodiments, the data arrives in an unknown format and may be ad hoc data that is considered semi-structured. Disclosed embodiments analyze chunks of the data to determine tokens. The tokens are then analyzed to identify base types and compound types such as structs, unions, and arrays. Candidate descriptions are generated, scored, and rewritten to optimize them. The generated descriptions may be supplied to a data description language system such as Processing Ad Hoc Data System (PADS) and compiled for processing the raw data. In some embodiments, the raw data is parsed, printed, or reformatted using the generated descriptions.
|
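
The chunk, token, and type steps summarized above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the token table `TOKEN_SPEC`, the functions `tokenize` and `infer`, and the choice of one line per chunk are all assumptions made for illustration. Array inference, scoring, and rewriting are omitted.

```python
import re
from collections import Counter

# Hypothetical base-type table: (name, regex) pairs, tried in order.
# These token classes are assumptions, not taken from the disclosure.
TOKEN_SPEC = [
    ("ip", r"\d{1,3}(?:\.\d{1,3}){3}"),
    ("float", r"-?\d+\.\d+"),
    ("int", r"-?\d+"),
    ("word", r"[A-Za-z_]+"),
    ("ws", r"\s+"),
    ("punct", r"."),
]
LEXER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))


def tokenize(chunk):
    """Return the tuple of token type names for one chunk (here, one line)."""
    types = []
    for m in LEXER.finditer(chunk):
        kind = m.lastgroup
        if kind == "ws":
            continue  # whitespace only separates tokens
        # Punctuation is kept as a quoted literal so it can act as a delimiter.
        types.append(f"'{m.group()}'" if kind == "punct" else kind)
    return tuple(types)


def infer(chunks):
    """Propose a struct when all chunks share one shape, else a union of structs."""
    shapes = Counter(tokenize(c) for c in chunks)
    structs = [("struct", list(shape)) for shape in shapes]
    return structs[0] if len(structs) == 1 else ("union", structs)
```

For example, `infer(["a,1", "b,2"])` proposes a single struct of a word, a literal comma, and an int, while `infer(["a,1", "7"])` falls back to a union of two structs because the chunks tokenize differently.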