Title of invention Optimizing sparse schema-less data in relational stores
Abstract Various embodiments of the invention relate to optimizing storage of schema-less data. A schema-less dataset including a plurality of resources is received. Each resource is associated with at least a plurality of properties. At least one set of co-occurring properties from the plurality of properties is identified. A graph including a plurality of nodes is generated. Each of the nodes represents a unique property in the set of co-occurring properties. The graph further includes an edge connecting each pair of nodes that represents a pair of co-occurring properties. A graph coloring operation is performed on the graph. The graph coloring operation includes assigning a color to each of the nodes, where nodes connected by an edge are assigned different colors. A schema is generated that assigns a column identifier from a table to each unique property represented by one of the nodes in the graph, based on the color assigned to the node.
Publication number US8918434(B2) Publication date 2014.12.23
Application number US201213454559 Filing date 2012.04.24
Applicant International Business Machines Corporation Inventors Bhattacharjee Bishwaranjan;Bornea Mihaela Ancuta;Dantressangle Patrick;Dolby Julian;Srinivas Kavitha;Udrea Octavian
Classification G06F17/30;G06F7/00 Main classification G06F17/30
Agency Fleit Gibbons Gutman Bongini & Bianco PL Agent Grzesik Thomas;Fleit Gibbons Gutman Bongini & Bianco PL
Principal claim 1. A method for optimizing storage of schema-less data in a data storage system, the method comprising: receiving a schema-less dataset comprising a plurality of resources, wherein each resource in the plurality of resources is associated with at least a plurality of properties; identifying, for one or more of the plurality of resources, at least one set of co-occurring properties from the plurality of properties, wherein two or more properties in the plurality of properties co-occur if at least one resource in the plurality of resources comprises each of the two or more properties; generating a graph comprising a plurality of nodes, wherein each of the plurality of nodes represents a unique property in the at least one set of co-occurring properties, and wherein the graph further comprises an edge connecting each of the plurality of nodes representing a pair of co-occurring properties in the at least one set of co-occurring properties, wherein generating the graph further comprises: identifying a first property in the at least one set of co-occurring properties that is associated with a higher priority than a second property in the at least one set of co-occurring properties; and adding a node to the graph representing the first property prior to adding a node to the graph representing the second property; performing a graph coloring operation on the graph, wherein the graph coloring operation comprises assigning each of the plurality of nodes to a label, wherein nodes connected by an edge are assigned different labels, and wherein the label assigned to each of the plurality of nodes corresponds to a column identifier from a table; and generating a storage schema, wherein the storage schema assigns a column identifier from the table to each unique property represented by one of the plurality of nodes in the graph based on the label assigned to the node.
Address Armonk NY US
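
The method recited in the principal claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the record does not say which coloring algorithm is used, so a simple greedy coloring is swapped in here, and the function names, the numeric `priority` mapping, the `col0`, `col1`, ... column identifiers, and the sample data are all hypothetical choices made for illustration.

```python
from itertools import combinations


def build_cooccurrence_graph(resources, priority=None):
    """Build an adjacency map: property -> set of co-occurring properties.

    resources: dict mapping a resource id to an iterable of property names.
    priority:  optional dict mapping a property to a number (assumption:
               larger means higher priority); higher-priority properties
               are added to the graph first.
    """
    priority = priority or {}
    props = {p for ps in resources.values() for p in ps}
    # Nodes are inserted in priority order; dicts preserve insertion order.
    ordered = sorted(props, key=lambda p: (-priority.get(p, 0), p))
    graph = {p: set() for p in ordered}
    # Two properties co-occur if at least one resource carries both.
    for ps in resources.values():
        for a, b in combinations(set(ps), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def color_graph(graph):
    """Greedy coloring: give each node the smallest label unused by neighbors."""
    labels = {}
    for node in graph:  # iteration follows node insertion order
        used = {labels[n] for n in graph[node] if n in labels}
        label = 0
        while label in used:
            label += 1
        labels[node] = label
    return labels


def make_schema(resources, priority=None):
    """Map each property to a column identifier derived from its label."""
    labels = color_graph(build_cooccurrence_graph(resources, priority))
    return {prop: f"col{label}" for prop, label in labels.items()}


if __name__ == "__main__":
    # Hypothetical sample data: three resources with overlapping properties.
    resources = {
        "alice": ["name", "age"],
        "acme": ["name", "industry"],
        "paris": ["name", "population"],
    }
    print(make_schema(resources, priority={"name": 10}))
```

In this sample, `age`, `industry`, and `population` never appear on the same resource, so they can share one column, while `name` co-occurs with all of them and needs its own. Four properties therefore fit in two columns. Greedy coloring does not guarantee the minimum number of columns in general; it only guarantees that co-occurring properties never share one.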