Title: Table boundary detection in data blocks for compression
Abstract: Data is converted into a minimized data representation using a suffix tree by sorting data streams according to symbolic representations for building table boundary formation patterns. The converted data is fully reversible for reconstruction while retaining minimal header information.
Publication number: US9043293(B2) Publication date: 2015.05.26
Application number: US201313789254 Filing date: 2013.03.07
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION Inventors: Amit Jonathan; Demidov Lilia; Halowani Nir
Classification: G06F7/00; G06F17/30; H03M7/30 Main classification: G06F7/00
Agency: Griffiths & Seaton PLLC Agent: Griffiths & Seaton PLLC
Main claim: 1. A method of identifying table boundaries in data blocks for compression by a processor device in a computing environment, the method comprising: converting data into a minimized data representation using a suffix tree by sorting data streams according to a plurality of symbolic representations for building table boundary formation patterns, wherein the converted data is fully reversible for reconstruction while retaining minimal header information, wherein in conjunction with the sorting the data streams according to the plurality of symbolic representations, textual data is represented by a first symbol, numerical data is represented by a second symbol, and a delimiter used for separation is represented by a third symbol; and performing a scanning operation according to each of the following: searching a suffix of each of the sorted data streams for identifying a data sequence that includes the first and second symbol representing the textual and numerical data, skipping the data that only includes the third symbol until identifying the next data sequence that includes the first and second symbol representing the textual and numerical data, building the suffix tree for the converted data, and eliminating each scan-order not matching the searching and the skipping.
Address: Armonk, NY, US
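
The symbolic conversion and suffix scan recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and everything in it is an assumption made for illustration: the symbols `T`, `N` and `D`, the ASCII-only token classes, the helper names `to_symbols`, `build_suffix_trie` and `best_pattern`, and the rule of preferring the most frequent, then longest, repeated pattern. A naive quadratic suffix trie stands in for a suffix tree, and the symbol mapping shown is lossy, so the reversible representation and header handling described in the claim are not covered.

```python
import re

# Hypothetical symbol choices: text, numeric, delimiter.
TEXT, NUM, DELIM = "T", "N", "D"

# Each maximal run of letters, digits, or other characters is one token;
# the group names double as the output symbols.
TOKEN = re.compile(r"(?P<T>[A-Za-z]+)|(?P<N>[0-9]+)|(?P<D>[^A-Za-z0-9]+)")


def to_symbols(data: str) -> str:
    """Replace every token in `data` by the symbol of its class."""
    return "".join(m.lastgroup for m in TOKEN.finditer(data))


class Node:
    def __init__(self):
        self.children = {}
        self.count = 0  # number of suffixes passing through this node


def build_suffix_trie(symbols: str) -> Node:
    """Naive O(n^2) suffix trie; a stand-in for a real suffix tree."""
    root = Node()
    for i in range(len(symbols)):
        node = root
        for ch in symbols[i:]:
            node = node.children.setdefault(ch, Node())
            node.count += 1
    return root


def best_pattern(root: Node):
    """Return (pattern, count) for the repeated symbol sequence that
    contains both TEXT and NUM and does not start with DELIM, preferring
    higher occurrence count, then greater length (an assumed heuristic)."""
    best = ("", 0)
    stack = [(root, "")]
    while stack:
        node, prefix = stack.pop()
        for ch, child in node.children.items():
            if child.count < 2:
                continue  # counts only shrink deeper in the trie
            pat = prefix + ch
            stack.append((child, pat))
            # Delimiter-only sequences are skipped by this filter.
            if pat[0] != DELIM and TEXT in pat and NUM in pat:
                if (child.count, len(pat)) > (best[1], len(best[0])):
                    best = (pat, child.count)
    return best


if __name__ == "__main__":
    block = "alice,30\nbob,25\ncarol,41\n"
    symbols = to_symbols(block)
    print(symbols)
    print(best_pattern(build_suffix_trie(symbols)))
```

For the three-row sample block, the symbol stream is `TDNDTDNDTDND` and the selected pattern is `TDND`, which corresponds to one row of the form text, delimiter, number, delimiter. A production version would need a linear-time suffix tree construction and a reversible encoding of the original token values.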