Title of Invention |
TABLE BOUNDARY DETECTION IN DATA BLOCKS FOR COMPRESSION |
Abstract |
Data is converted into a minimized data representation using a suffix tree by sorting data streams according to symbolic representations for building table boundary formation patterns. The converted data is fully reversible for reconstruction while retaining minimal header information. A scanning operation is performed by searching a suffix of each of the sorted data streams for identifying a data sequence that includes a first symbol representing textual data, and a second symbol representing numerical data. The suffix tree for the converted data is then built. |
Publication Number |
US2015379068(A1) |
Publication Date |
2015.12.31 |
Application Number |
US201514847478 |
Filing Date |
2015.09.08 |
Applicant |
INTERNATIONAL BUSINESS MACHINES CORPORATION |
Inventors |
AMIT Jonathan;DEMIDOV Lilia;HALOWANI Nir |
Classification |
G06F17/30 |
Main Classification |
G06F17/30 |
Agency |
|
Agent |
|
Principal Claim |
1. A system for identifying table boundaries in data blocks for compression in a computing environment, the system comprising:
a processor device, operable in the computing environment, wherein the processor device:
converts data into a minimized data representation using a suffix tree by sorting data streams according to a plurality of symbolic representations for building table boundary formation patterns, wherein the converted data is fully reversible for reconstruction while retaining minimal header information; and
performs a scanning operation according to each of the following:
searches a suffix of each of the sorted data streams for identifying a data sequence that includes a first symbol representing textual data and a second symbol representing numerical data, and
builds the suffix tree for the converted data. |
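The scanning operation recited in the claim can be illustrated with a minimal sketch. It is not the patented implementation. It assumes whitespace-separated fields, uses the hypothetical symbols `T` (textual) and `N` (numerical), and substitutes a naive uncompressed suffix trie for a true suffix tree. The function names `symbolize`, `build_suffix_trie`, and `contains` are illustrative only. The sorting step and the reversible header are not modelled.

```python
import re

# Assumption: a field is "numerical" if it is an optionally signed
# integer or decimal; everything else is treated as "textual".
NUMERIC = re.compile(r"[-+]?\d+(\.\d+)?")


def symbolize(row):
    """Map each whitespace-separated field to 'N' (numerical) or 'T' (textual)."""
    return "".join(
        "N" if NUMERIC.fullmatch(field) else "T" for field in row.split()
    )


def build_suffix_trie(symbols):
    """Build an uncompressed suffix trie (nested dicts) over a symbol string.

    This is a simple stand-in for a suffix tree: it inserts every suffix
    character by character, which costs O(n^2) time and space.
    """
    root = {}
    for start in range(len(symbols)):
        node = root
        for ch in symbols[start:]:
            node = node.setdefault(ch, {})
    return root


def contains(trie, pattern):
    """Return True if pattern is a prefix of some suffix stored in the trie."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


rows = ["Alice 30 4.5", "Bob 41 3.9", "plain prose line"]
streams = [symbolize(row) for row in rows]

# A row is flagged as table-like when some suffix of its symbol stream
# begins with a textual symbol followed by a numerical symbol.
table_like = [
    row
    for row, stream in zip(rows, streams)
    if contains(build_suffix_trie(stream), "TN")
]
print(table_like)
```

In this sketch a row counts as a candidate table row when its symbol stream contains the sequence `TN`. A production design would more likely build one compressed suffix tree over all streams rather than one trie per row.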
地址 |
Armonk NY US |