Title of invention: Searchable archive
Abstract: An apparatus, computer-readable medium, and computer-implemented method for generating a searchable archive, the method including receiving a set of tabular data comprising a plurality of rows and storing data corresponding to a group of rows in the plurality of rows in a compacted file, the compacted file comprising one or more compressed segments. The compressed segments can store data corresponding to a portion of the rows in the group of rows and can store the data corresponding to the group of rows in column-major order. The compressed segments can store one or more token values corresponding to one or more data values in the set of tabular data, and the token values can be generated by dividing the set of tabular data into columns and assigning a different token to each unique data value within each of the columns.
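The tokenization step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `tokenize_columns`, the use of consecutive integers as tokens, and the sample rows are all assumptions made for illustration. The sketch transposes row-major input into columns and assigns a different token to each unique value within each column.

```python
def tokenize_columns(rows):
    """Hypothetical helper: dictionary-encode each column of a row-major table.

    Returns (dictionaries, token_columns), where dictionaries[i] maps each
    unique value of column i to an integer token (an assumed token format)
    and token_columns[i] holds the tokens of column i in row order.
    """
    if not rows:
        return [], []
    # Transpose rows into columns so data is handled in column-major order.
    columns = [list(col) for col in zip(*rows)]
    dictionaries = []
    token_columns = []
    for col in columns:
        mapping = {}
        tokens = []
        for value in col:
            # A value seen for the first time in this column gets the next token.
            if value not in mapping:
                mapping[value] = len(mapping)
            tokens.append(mapping[value])
        dictionaries.append(mapping)
        token_columns.append(tokens)
    return dictionaries, token_columns


# Illustrative data only; not taken from the patent.
rows = [("alice", "NY"), ("bob", "CA"), ("alice", "CA")]
dictionaries, token_columns = tokenize_columns(rows)
```

Because repeated values collapse to the same small integer, the token columns tend to compress well, which is presumably why the abstract stores tokens rather than raw values in the compressed segments.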
Publication number: US8799229(B2) Publication date: 2014.08.05
Application number: US201213725430 Filing date: 2012.12.21
Applicant: Informatica Corporation Inventors: Grondin Richard; Fadeitchev Evgueni; Zarouba Vassili
Classification: G06F7/00; G06F17/00 Main classification: G06F7/00
Agency: Reed Smith LLP Agents: Kaufman Marc S.; Grewal Amardeep S.; Reed Smith LLP
Main claim: 1. A computer-implemented method for generating a searchable archive executed by one or more computing devices, the method comprising: receiving, by at least one of the one or more computing devices, one or more data values; determining, by at least one of the one or more computing devices, one or more domains associated with the one or more data values, wherein each of the one or more data values is associated with a corresponding domain in the one or more domains; generating, by at least one of the one or more computing devices, a domain structure for the one or more domains, wherein the domain structure identifies which of the one or more data values correspond to each domain in the one or more domains; generating, by at least one of the one or more computing devices, one or more token columns from the one or more data values, wherein each token column corresponds to a domain in the one or more domains, and each unique token in the token column corresponds to a unique data value in the corresponding domain; creating, by at least one of the one or more computing devices, one or more compressed token column segments from the one or more token columns; and generating, by at least one of the one or more computing devices, one or more compacted files from the one or more compressed token column segments.
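The sequence of steps in claim 1 (domain structure, token columns, compressed segments, compacted file) can be sketched roughly as follows. Everything concrete here is an assumption for illustration and is not specified by the claim: the function names `build_compacted_file` and `read_column`, the JSON header, the 32-bit little-endian token encoding, zlib as the compressor, the two-row segment size, and the sample data. Values are assumed to be strings so the domain structure can be serialized as JSON.

```python
import json
import struct
import zlib


def build_compacted_file(columns, domain_of, segment_rows=2):
    """Hypothetical sketch: columns maps name -> values, domain_of maps name -> domain."""
    # Domain structure: for each domain, the list of its unique values.
    # A value's index in that list serves as its token.
    domain_structure = {}
    for name, values in columns.items():
        domain_values = domain_structure.setdefault(domain_of[name], [])
        for value in values:
            if value not in domain_values:
                domain_values.append(value)

    header = {"domains": domain_structure, "columns": []}
    payload = b""
    for name, values in columns.items():
        lookup = {v: i for i, v in enumerate(domain_structure[domain_of[name]])}
        tokens = [lookup[v] for v in values]  # the token column
        segment_sizes = []
        # Split the token column into fixed-size segments and compress each one.
        for start in range(0, len(tokens), segment_rows):
            chunk = tokens[start:start + segment_rows]
            segment = zlib.compress(struct.pack(f"<{len(chunk)}I", *chunk))
            segment_sizes.append(len(segment))
            payload += segment
        header["columns"].append(
            {"name": name, "domain": domain_of[name], "segments": segment_sizes}
        )
    header_bytes = json.dumps(header).encode("utf-8")
    # Compacted file layout (assumed): header length, header, then all segments.
    return struct.pack("<I", len(header_bytes)) + header_bytes + payload


def read_column(blob, column_name):
    """Decompress only the segments of one column and map tokens back to values."""
    (header_len,) = struct.unpack_from("<I", blob, 0)
    header = json.loads(blob[4:4 + header_len].decode("utf-8"))
    offset = 4 + header_len
    for col in header["columns"]:
        if col["name"] != column_name:
            offset += sum(col["segments"])  # skip other columns without decompressing
            continue
        domain_values = header["domains"][col["domain"]]
        result = []
        for size in col["segments"]:
            raw = zlib.decompress(blob[offset:offset + size])
            tokens = struct.unpack(f"<{len(raw) // 4}I", raw)
            result.extend(domain_values[t] for t in tokens)
            offset += size
        return result
    raise KeyError(column_name)


# Illustrative data only: two columns that share one "city" domain.
columns = {"ship_city": ["NY", "CA", "NY"], "bill_city": ["CA", "TX", "NY"]}
domain_of = {"ship_city": "city", "bill_city": "city"}
blob = build_compacted_file(columns, domain_of)
```

In this sketch, sharing one domain between two columns means the same value receives the same token in both, and reading one column touches only that column's segments.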
Address: Redwood City CA US