Abstract |
This invention relates to a system, method, and computer program product for processing large-scale unstructured data, comprising: a receiver for receiving streamed input data from live data sources; a pattern generator for deriving emergent patterns in data subsets; a pattern identifier for identifying a repeating pattern and its corresponding data subset within the emergent patterns; a compressor for reducing the identified data subset and identified pattern to a compressed signature; and a repository for storing the streamed input data with the compressed signature and without the identified data subset, wherein the data subset can be rebuilt, if necessary, using the compressed signature. |
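
The pipeline described above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the abstract does not specify how patterns are derived or how the signature is computed. The sketch assumes that each incoming record is one "data subset", that a "repeating pattern" is simply a record seen at least `min_repeats` times within a buffered window, and that the "compressed signature" is a truncated SHA-256 digest used as a key into a pattern table. All function names and parameters are hypothetical.

```python
import hashlib
from collections import Counter


def signature(subset: bytes) -> str:
    # Assumption: a short hash stands in for the "compressed signature".
    return hashlib.sha256(subset).hexdigest()[:8]


def compress_stream(records, min_repeats=2):
    """Replace repeating data subsets with signatures.

    `records` is an iterable of bytes objects (one per data subset).
    Returns (stored, patterns): `stored` is the repository content,
    `patterns` maps each signature to the subset it replaces.
    """
    window = list(records)      # buffer a window of the stream
    counts = Counter(window)    # crude stand-in for pattern derivation
    patterns = {}
    stored = []
    for rec in window:
        if counts[rec] >= min_repeats:
            sig = signature(rec)
            patterns[sig] = rec          # keep one copy for rebuilding
            stored.append(("sig", sig))  # store signature, not the subset
        else:
            stored.append(("raw", rec))  # non-repeating data kept as-is
    return stored, patterns


def rebuild(stored, patterns):
    # Restore the original subsets from signatures where needed.
    return [patterns[val] if kind == "sig" else val for kind, val in stored]
```

In this sketch the repository holds one copy of each repeating subset in `patterns`, so calling `rebuild(stored, patterns)` returns the original sequence of records. A real system would need a collision-safe signature and an incremental pattern detector rather than a buffered window.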