Abstract |
In networked communication systems, a document in a communication (e.g., a response) may be similar across multiple communications involving the same resource, so duplicate data can be discarded rather than stored by a network storage system. Storing only the differences in network traffic compresses the stored traffic, thereby significantly reducing data storage. Techniques are disclosed for efficient search and retrieval of the compressed data storage. Network traffic may be compared to communications in previous network traffic to identify differences, if any. Resource templates may be generated for different (e.g., new) resources identified in network traffic. Storage of the different resources identified in network traffic enables compression of network traffic. Similarity matching may be implemented to improve processing performance for compact storage of network traffic, including determining differences in network traffic for storage. |
Main claim |
1. A computer-implemented method for compact storage of network communication, the method comprising:
receiving, by a computer system, one or more data packets comprising a communication transmitted by a server computer, the communication including a resource requested by a client computer system;
parsing, by the computer system, based on one or more delimiters, the requested resource to identify a plurality of data items in the requested resource;
generating, by the computer system, a first set of hash values for the plurality of data items, wherein the first set of hash values is generated based on applying one or more hashing algorithms to the plurality of data items;
retrieving, by the computer system, one or more stored templates, each of the one or more stored templates including different content;
determining, by the computer system, a second set of hash values for each of the one or more stored templates;
for each stored template of the one or more stored templates, performing, by the computer system, a comparison of the first set of hash values to the second set of hash values corresponding to each stored template;
computing, by the computer system, a similarity value based on the comparison;
upon determining that the similarity value indicates that the first set of hash values is not similar to the second set of hash values for a first stored template, generating, by the computer system, an edit log using the plurality of data items and the first stored template, wherein the edit log identifies differences between the plurality of data items of the requested resource and the first stored template; and
storing, by the computer system, the edit log in a data store. |
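The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the delimiter set, the use of SHA-256 as the hashing algorithm, the Jaccard ratio as the similarity value, and the opcode-based edit log format are all assumptions made for illustration, and every function name (`parse_items`, `hash_items`, `similarity`, `edit_log`, `apply_edit_log`) is hypothetical. The rule for deciding when an edit log is generated and stored is left to the caller.

```python
import hashlib
import re
from difflib import SequenceMatcher

# Assumed delimiters: newlines and angle brackets (not specified by the claim).
DELIMITERS = r"[\n<>]"


def parse_items(resource: str) -> list[str]:
    """Split a resource into non-empty data items on the assumed delimiters."""
    return [part.strip() for part in re.split(DELIMITERS, resource) if part.strip()]


def hash_items(items: list[str]) -> set[str]:
    """Hash every data item; SHA-256 is an assumed choice of hashing algorithm."""
    return {hashlib.sha256(item.encode("utf-8")).hexdigest() for item in items}


def similarity(first: set[str], second: set[str]) -> float:
    """Jaccard ratio of two hash sets, used here as an assumed similarity value."""
    if not first and not second:
        return 1.0
    return len(first & second) / len(first | second)


def edit_log(template_items: list[str], items: list[str]) -> list[tuple]:
    """Record only the operations that turn the template items into the new items."""
    matcher = SequenceMatcher(a=template_items, b=items, autojunk=False)
    return [
        (tag, i1, i2, items[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_edit_log(template_items: list[str], log: list[tuple]) -> list[str]:
    """Rebuild the original items from a template and its edit log."""
    rebuilt = list(template_items)
    # Apply from the end so earlier indices stay valid.
    for _tag, i1, i2, new_items in reversed(log):
        rebuilt[i1:i2] = new_items
    return rebuilt


# Hypothetical stored template and newly captured response.
template = "<html>\n<title>Account</title>\n<p>Hello, Alice</p>\n</html>"
response = "<html>\n<title>Account</title>\n<p>Hello, Bob</p>\n</html>"

template_items = parse_items(template)
response_items = parse_items(response)
score = similarity(hash_items(response_items), hash_items(template_items))
log = edit_log(template_items, response_items)
```

In this sketch only `log` would need to be written to the data store alongside a reference to the template; `apply_edit_log(template_items, log)` reconstructs the parsed items of the response when it is later retrieved.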