Title: Sparse Matrix Storage in a Database
Abstract: Methods, processes and computer-program products are disclosed for use in a parallelized computing system in which representations of large sparse matrices are efficiently encoded and communicated between grid-computing devices. A sparse matrix can be encoded and stored as a collection of character strings, wherein each character string is a Base64-encoded string representing the non-zero elements of a single row of the sparse matrix. On a per-row basis, non-zero elements can be identified by column indices, and error correction metadata can be included. The resultant row data can be converted to IEEE 754 8-byte representations and then encoded into Base64 characters for storage as strings. Even for sparse matrices of very large dimension, these character strings can be efficiently stored in databases or communicated to grid-computing devices.
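The per-row encoding described in the abstract can be illustrated with a minimal Python sketch. The field layout used here (a leading element count, then the column indices, then the values, all written as big-endian IEEE 754 8-byte doubles) is an assumption made for illustration, and the function names `encode_row` and `decode_row` are hypothetical. The leading count is only a simple consistency check standing in for the "error correction metadata" mentioned above; the record does not specify the actual metadata scheme.

```python
import base64
import struct


def encode_row(col_indices, values):
    """Encode the non-zero elements of one sparse row as a Base64 string.

    Assumed layout (not taken from the patent text): element count,
    column indices, then values, each as a big-endian 8-byte double.
    """
    if len(col_indices) != len(values):
        raise ValueError("col_indices and values must have the same length")
    n = len(values)
    fields = [float(n)] + [float(c) for c in col_indices] + [float(v) for v in values]
    # ">" selects big-endian byte order, so the bytes do not depend on the host platform.
    raw = struct.pack(f">{len(fields)}d", *fields)
    return base64.b64encode(raw).decode("ascii")


def decode_row(text):
    """Invert encode_row and return (col_indices, values)."""
    raw = base64.b64decode(text)
    fields = struct.unpack(f">{len(raw) // 8}d", raw)
    n = int(fields[0])
    # Consistency check: the count must match the number of fields present.
    if len(fields) != 1 + 2 * n:
        raise ValueError("row string is truncated or corrupted")
    cols = [int(c) for c in fields[1:1 + n]]
    vals = list(fields[1 + n:])
    return cols, vals


if __name__ == "__main__":
    s = encode_row([2, 5], [1.5, -3.0])
    print(s)
    print(decode_row(s))
```

Storing the column indices as doubles keeps every field the same width, at the cost of limiting exact indices to integers below 2^53; a production format might instead use a dedicated integer type for the indices.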
Publication No.: US2015242484(A1) Publication date: 2015.08.27
Application No.: US201514633915 Filing date: 2015.02.27
Applicant: SAS Institute Inc. Inventors: Zhao Zheng; Cox James Allen; Albright Russell
Classification: G06F17/30 Main classification: G06F17/30
Agency: (not listed) Agent: (not listed)
Principal claim: 1. A computer-program product comprising a non-transitory machine-readable storage medium that stores instructions operable to cause a data processing apparatus to perform operations including: accessing a representation of a sparse matrix, wherein the sparse matrix includes multiple rows and columns, wherein each of the rows includes multiple zero elements and multiple non-zero elements, wherein each of the non-zero elements is indexable by a row index and a column index, and wherein the representation includes information about each of the non-zero elements and the respective row indices and column indices of the non-zero elements; using the representation of the sparse matrix in performing the following operations with respect to each of the rows of the sparse matrix: form a platform-independent binary representation of each non-zero element of the row; form a platform-independent binary representation of each column index that indexes a non-zero element of the row; form a sequence of bits that represents the row and includes the representations of non-zero elements and the representations of column indices; and form a character string that represents the row, wherein the character string is formed by encoding the sequence of bits using Base64 encoding; and storing or distributively communicating the character strings, wherein storing includes storing the character strings in a database, and wherein distributively communicating the character strings includes communicating the character strings to grid-computing devices in a grid-computing system to facilitate parallelized statistical analysis of the sparse matrix.
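The storage branch of the claim (one Base64 string per row, kept in a database) might look like the following sketch. It assumes the input representation is a list of `(row, col, value)` triplets and uses an in-process SQLite database purely for illustration; the table name `sparse_rows`, the helper names, and the bit layout (a big-endian 32-bit column index followed by a big-endian 8-byte double per element) are all hypothetical choices, not details given in the record.

```python
import base64
import sqlite3
import struct
from collections import defaultdict

ENTRY = ">Id"  # assumed layout: big-endian uint32 column index + 8-byte double
ENTRY_SIZE = struct.calcsize(ENTRY)  # 12 bytes, no padding in ">" mode


def rows_to_strings(triplets):
    """Group (row, col, value) triplets by row; return {row: Base64 string}."""
    by_row = defaultdict(list)
    for r, c, v in triplets:
        if v != 0:  # only non-zero elements are represented
            by_row[r].append((c, v))
    encoded = {}
    for r, entries in by_row.items():
        entries.sort()  # order by column index for a stable encoding
        raw = b"".join(struct.pack(ENTRY, c, v) for c, v in entries)
        encoded[r] = base64.b64encode(raw).decode("ascii")
    return encoded


def store(conn, encoded):
    """Store one character string per matrix row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sparse_rows ("
        "row_index INTEGER PRIMARY KEY, payload TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO sparse_rows VALUES (?, ?)", list(encoded.items())
    )
    conn.commit()


def load_row(conn, row_index):
    """Return the row's non-zero elements as a list of (col, value) pairs."""
    hit = conn.execute(
        "SELECT payload FROM sparse_rows WHERE row_index = ?", (row_index,)
    ).fetchone()
    if hit is None:
        return []  # a row with no stored string has no non-zero elements
    raw = base64.b64decode(hit[0])
    return [struct.unpack_from(ENTRY, raw, off) for off in range(0, len(raw), ENTRY_SIZE)]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store(conn, rows_to_strings([(0, 3, 2.5), (0, 1, -1.0), (2, 0, 4.0)]))
    print(load_row(conn, 0))
```

Because each row is an independent text value keyed by its row index, individual rows can be fetched or shipped to separate workers without decoding the rest of the matrix, which is the property the claim relies on for distribution to grid-computing devices.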
Address: Cary, NC, US