Title of invention |
Optimizing sparse schema-less data in relational stores |
Abstract |
Various embodiments of the invention relate to optimizing storage of schema-less data. A schema-less dataset including a plurality of resources is received. Each resource is associated with at least a plurality of properties. At least one set of co-occurring properties from the plurality of properties is identified. A graph including a plurality of nodes is generated. Each of the nodes represents a unique property in the set of co-occurring properties. The graph further includes an edge connecting each pair of nodes that represent co-occurring properties. A graph coloring operation is performed on the graph. The graph coloring operation includes assigning each of the nodes a color, where nodes connected by an edge are assigned different colors. A schema is generated that assigns a column identifier from a table to each unique property represented by one of the nodes in the graph based on the color assigned to the node. |
Publication number |
US8918434(B2) |
Publication date |
2014.12.23 |
Application number |
US201213454559 |
Filing date |
2012.04.24 |
Applicant |
International Business Machines Corporation |
Inventors |
Bhattacharjee Bishwaranjan;Bornea Mihaela Ancuta;Dantressangle Patrick;Dolby Julian;Srinivas Kavitha;Udrea Octavian |
Classification numbers |
G06F17/30;G06F7/00 |
Main classification number |
G06F17/30 |
Patent agency |
Fleit Gibbons Gutman Bongini & Bianco PL |
Agent |
Grzesik Thomas;Fleit Gibbons Gutman Bongini & Bianco PL |
Principal claim |
1. A method for optimizing storage of schema-less data in a data storage system, the method comprising:
receiving a schema-less dataset comprising a plurality of resources, wherein each resource in the plurality of resources is associated with at least a plurality of properties;
identifying, for one or more of the plurality of resources, at least one set of co-occurring properties from the plurality of properties, wherein two or more properties in the plurality of properties co-occur if at least one resource in the plurality of resources comprises each of the two or more properties;
generating a graph comprising a plurality of nodes, wherein each of the plurality of nodes represents a unique property in the at least one set of co-occurring properties, and wherein the graph further comprises an edge connecting each of the plurality of nodes representing a pair of co-occurring properties in the at least one set of co-occurring properties, wherein generating the graph further comprises:
identifying a first property in the at least one set of co-occurring properties that is associated with a higher priority than a second property in the at least one set of co-occurring properties; and
adding a node to the graph representing the first property prior to adding a node to the graph representing the second property;
performing a graph coloring operation on the graph, wherein the graph coloring operation comprises assigning each of the plurality of nodes to a label, wherein nodes connected by an edge are assigned different labels, and wherein the label assigned to each of the plurality of nodes corresponds to a column identifier from a table; and
generating a storage schema, wherein the storage schema assigns a column identifier from the table to each unique property represented by one of the plurality of nodes in the graph based on the label assigned to the node. |
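The steps recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim does not say which coloring algorithm is used, so a simple greedy coloring is assumed here, and the function names, the sample dataset, the numeric `priority` values, and the `col0`, `col1`, ... column identifiers are all hypothetical.

```python
from itertools import combinations


def build_cooccurrence_graph(resources, priority):
    """Build an adjacency map with one node per property.

    Nodes are added in descending priority order (ties broken by name),
    so the dict's insertion order reflects the assumed priorities.
    """
    props = {p for ps in resources.values() for p in ps}
    ordered = sorted(props, key=lambda p: (-priority.get(p, 0), p))
    graph = {p: set() for p in ordered}
    # Two properties co-occur if at least one resource has both of them.
    for ps in resources.values():
        for a, b in combinations(ps, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def greedy_color(graph):
    """Assign each node the smallest label not used by a labeled neighbor.

    Greedy coloring is an assumption; it does not guarantee the minimum
    number of labels.
    """
    labels = {}
    for node in graph:  # visits nodes in insertion (priority) order
        used = {labels[n] for n in graph[node] if n in labels}
        label = 0
        while label in used:
            label += 1
        labels[node] = label
    return labels


def storage_schema(labels):
    """Map each property to a hypothetical column identifier."""
    return {prop: f"col{label}" for prop, label in labels.items()}


# Hypothetical schema-less dataset: resource -> set of properties.
resources = {
    "alice": {"name", "email", "age"},
    "acme": {"name", "industry"},
    "paper1": {"title", "year"},
}
priority = {"name": 3, "email": 2}  # assumed priorities; others default to 0

graph = build_cooccurrence_graph(resources, priority)
labels = greedy_color(graph)
schema = storage_schema(labels)
```

In this sample, six properties fit into three columns because properties that never appear on the same resource (for example `name` and `title`) can share a column. Adding higher-priority properties first means they are colored first, which under greedy coloring tends to place them in the lowest-numbered columns.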
Address |
Armonk NY US |