Title |
Table boundary detection in data blocks for compression |
Abstract |
Data is converted into a minimized data representation using a suffix tree: data streams are sorted according to symbolic representations in order to build table boundary formation patterns. The converted data is fully reversible, so the original can be reconstructed while retaining only minimal header information. A scanning operation searches a suffix of each sorted data stream to identify a data sequence that includes both a first symbol, representing textual data, and a second symbol, representing numerical data. The suffix tree for the converted data is then built. |
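The record does not name the sort-based transform behind the "fully reversible" conversion. As an illustration only, the following Python sketch assumes the Burrows-Wheeler transform (BWT). This well-known transform sorts all rotations of a stream and can be inverted from its output plus a single integer, the primary index, which is one way to keep header information minimal. The function names `bwt_encode` and `bwt_decode` are hypothetical. The naive rotation sort is quadratic, whereas practical implementations derive the same ordering from a suffix array or suffix tree.

```python
def bwt_encode(data: bytes) -> tuple[bytes, int]:
    """Burrows-Wheeler transform: (last column of sorted rotations, primary index)."""
    n = len(data)
    if n == 0:
        return b"", 0
    # Sort rotation start positions by rotation content (naive, O(n^2 log n)).
    order = sorted(range(n), key=lambda i: data[i:] + data[:i])
    last = bytes(data[(i - 1) % n] for i in order)
    # The primary index is the row holding the original stream; it is the
    # only header value needed to invert the transform.
    return last, order.index(0)


def bwt_decode(last: bytes, primary: int) -> bytes:
    """Invert the transform using the last-to-first (LF) mapping."""
    n = len(last)
    # A stable sort of the last column yields, for each row r, the row whose
    # rotation starts one position later in the original stream.
    nxt = sorted(range(n), key=lambda r: last[r])
    out = bytearray()
    row = primary
    for _ in range(n):
        row = nxt[row]
        out.append(last[row])
    return bytes(out)


sample = b"Apple 12\nPear 7\nTotal 19.5\n"
encoded, primary = bwt_encode(sample)
assert bwt_decode(encoded, primary) == sample  # fully reversible
print(bwt_encode(b"banana"))  # (b'nnbaaa', 3)
```

Sorting rotations tends to cluster bytes that share a following context. That clustering is why BWT output usually compresses better than the raw stream, and the sort itself costs only one integer of header.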
Publication Number |
US9514178(B2) |
Publication Date |
2016.12.06 |
Application Number |
US201514694287 |
Application Date |
2015.04.23 |
Applicant |
INTERNATIONAL BUSINESS MACHINES CORPORATION |
Inventors |
Amit Jonathan;Demidov Lilia;Halowani Nir |
Classification Codes |
G06F7/00;G06F17/30;H03M7/30 |
Main Classification |
G06F7/00 |
Agency |
Griffiths & Seaton PLLC |
Agent |
Griffiths & Seaton PLLC |
Main Claim |
1. A method of identifying table boundaries in data blocks for compression by a processor device in a computing environment, the method comprising:
converting data into a minimized data representation using a suffix tree by sorting data streams according to a plurality of symbolic representations for building table boundary formation patterns, wherein the converted data is fully reversible for reconstruction while retaining minimal header information; and
performing a scanning operation comprising each of the following:
searching a suffix of each of the sorted data streams to identify a data sequence that includes a first symbol representing textual data and a second symbol representing numerical data;
skipping data that includes only a third symbol until a next data sequence that includes the first and second symbols, representing the textual and numerical data, is identified; and
building the suffix tree for the converted data. |
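The scanning steps of claim 1 can be sketched in Python as follows. Every concrete choice here is an assumption made for illustration. Records are taken to be newline-delimited. Letters map to the first symbol `T` (textual), digits map to the second symbol `N` (numerical), and all other bytes map to the third symbol `S`. Consecutive kept records that share the same symbol pattern are treated as one table. In place of a compressed suffix tree, the sketch builds a naive suffix trie (the uncompressed form of a suffix tree) over the joined patterns and counts how often a pattern repeats. All function names are hypothetical.

```python
from itertools import groupby

# Hypothetical symbol alphabet (an assumption, not taken from the patent):
TEXT, NUM, OTHER = "T", "N", "S"  # first, second and third symbol


def symbolize(record: bytes) -> str:
    """Map each byte to a symbol and collapse runs of the same symbol."""
    symbols = []
    for b in record:
        ch = bytes((b,))
        if ch.isalpha():
            symbols.append(TEXT)
        elif ch.isdigit():
            symbols.append(NUM)
        else:
            symbols.append(OTHER)
    return "".join(key for key, _ in groupby(symbols))


def scan(data: bytes) -> list[tuple[int, str]]:
    """Keep (record index, pattern) for records holding both text and numbers.

    Records made only of the third symbol (blank lines, rules, padding) are
    skipped, as is anything else lacking both symbols, until the next match.
    """
    kept = []
    for idx, record in enumerate(data.split(b"\n")):
        pattern = symbolize(record)
        if TEXT in pattern and NUM in pattern:
            kept.append((idx, pattern))
    return kept


def table_boundaries(kept: list[tuple[int, str]]) -> list[tuple[int, int, str]]:
    """Group consecutive kept records with one pattern into (first, last, pattern)."""
    spans = []
    for pattern, group in groupby(kept, key=lambda item: item[1]):
        rows = [idx for idx, _ in group]
        spans.append((rows[0], rows[-1], pattern))
    return spans


def build_suffix_trie(text: str) -> dict:
    """Naive O(n^2) suffix trie; each node counts the suffixes passing through it."""
    root: dict = {"#": 0}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {"#": 0})
            node["#"] += 1
    return root


def occurrences(trie: dict, pattern: str) -> int:
    """Number of positions where a non-empty `pattern` occurs in the indexed text."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return 0
        node = node[ch]
    return node["#"]


data = b"Name Qty\n\nApple 12\nPear 7\n---\n\nTotal 19.5\n"
rows = scan(data)
print(rows)                    # [(2, 'TSN'), (3, 'TSN'), (6, 'TSNSN')]
print(table_boundaries(rows))  # [(2, 3, 'TSN'), (6, 6, 'TSNSN')]
trie = build_suffix_trie("|".join(p for _, p in rows))
print(occurrences(trie, "TSN"))  # 3
```

With the sample input, records 2 and 3 share the pattern `TSN` and form one span. The `---` rule and the blank lines contain only the third symbol and are skipped. The text-only heading `Name Qty` is also skipped. A production version would replace the quadratic trie with a linear-time suffix tree construction such as Ukkonen's algorithm.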
Address |
Armonk NY US |