Title of invention: Managing replicated data
Abstract: An approach for managing replicated data is presented. Metadata is received specifying inter-data correlation(s), inter-replica correlation(s), and data-replica correlation(s) among replicas generated for a system. A unified replication metadata model specifying the correlations is generated. Based on the inter-replica correlation(s), a proper subset of the replicas is selected. Based on the inter-replica and inter-data correlation(s), the selected proper subset of replicas is indexed to generate a unified content index. A query is received to locate a data item in at least one of the replicas. Based on the unified content index, the unified replication metadata model, and the query, candidate replica(s) and corresponding confidence score(s) are determined. The confidence score(s) indicate respective likelihood(s) that the candidate replica(s) include the data item.
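The query step in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Everything named here is an assumption made for illustration: the function names `build_index` and `find_candidates`, the replica names, the representation of an inter-replica correlation as a number between 0 and 1, and the rule that an unindexed replica inherits that number as its confidence score.

```python
from collections import defaultdict

def build_index(indexed_replicas):
    """Map each keyword to the set of indexed replicas that contain it."""
    index = defaultdict(set)
    for replica_id, keywords in indexed_replicas.items():
        for keyword in keywords:
            index[keyword].add(replica_id)
    return index

def find_candidates(keyword, index, indexed_ids, inter_replica_corr):
    """Return (replica_id, confidence) pairs, highest confidence first.

    Assumption: an indexed replica containing the keyword gets confidence 1.0;
    an unindexed replica correlated with such a replica gets the correlation
    value as its confidence.
    """
    hits = index.get(keyword, set())
    scores = {replica_id: 1.0 for replica_id in hits}
    for (src, dst), corr in inter_replica_corr.items():
        # Only extrapolate to replicas that were never indexed directly.
        if src in hits and dst not in indexed_ids:
            scores[dst] = max(scores.get(dst, 0.0), corr)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical data: only "snap1" was selected for indexing.
indexed = {"snap1": {"invoice", "payroll"}}
correlations = {("snap1", "mirror1"): 0.9, ("snap1", "backup_old"): 0.4}
index = build_index(indexed)
candidates = find_candidates("invoice", index, set(indexed), correlations)
```

Indexing only a proper subset of replicas and extrapolating through the correlations is what lets the sketch answer for replicas it never scanned, at the cost of a confidence below 1.0.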
Publication No.: US8898113(B2)    Publication date: 2014.11.25
Application No.: US201213683370    Filing date: 2012.11.21
Applicant: International Business Machines Corporation    Inventors: Brewer Billy S.; Chavda Kavita; Mandagere Nagapramod S.; Routray Ramani R.
Classification: G06F17/30    Main classification: G06F17/30
Agency: Schmeiser, Olsen & Watts    Agent: Schmeiser, Olsen & Watts; Chung Matthew H.
Main claim: 1. A method of managing replicated data, the method comprising the steps of:
a computer receiving first metadata specifying inter-data correlation(s), which are correlation(s) between sets of replicated data in a first set of replicas;
the computer receiving second metadata specifying inter-replica correlation(s), which are correlation(s) between replicas included in a second set of replicas;
the computer receiving third metadata specifying data-replica correlation(s), which are correlation(s) between set(s) of replicated data and respective replica(s) included in a third set of replicas, the first, second and third sets of replicas being included in a plurality of replicas generated for a system;
the computer determining a current usage of resources in the system and a threshold usage of the resources;
the computer generating a unified replication metadata model specifying the inter-data correlation(s) based on the first metadata, the inter-replica correlation(s) based on the second metadata, and the data-replica correlation(s) based on the third metadata;
based on the inter-replica correlation(s) specified by the unified replication metadata model, the computer selecting a proper subset of replicas included in the plurality of replicas;
based on the inter-replica and inter-data correlation(s) specified by the unified replication metadata model, the computer indexing the selected proper subset of replicas to generate a unified content index, wherein the step of indexing the selected proper subset of replicas includes the steps of:
    the computer determining index updates by determining keyword-to-replica mappings; and
    the computer generating the unified content index based on the index updates,
  wherein the step of determining the index updates includes the steps of:
    the computer determining index expectation scores and resource affinity scores for respective replicas in the selected proper subset of replicas; and
    the computer sorting the selected proper subset of replicas based on the respective index expectation scores and the respective resource affinity scores,
  and wherein the step of determining the resource affinity scores for respective replicas in the selected proper subset of replicas includes the steps of:
    if the current usage is less than the threshold usage, then the computer determining an expected additional resource usage due to performing an indexing task online, and based on the expected additional resource usage, the computer determining a resource affinity score for performing the indexing task online; and
    if the current usage is greater than or equal to the threshold usage, then the computer determining an expected resource usage due to performing the indexing task offline and based on the expected resource usage, the computer determining a resource affinity score for performing the indexing task offline;
the computer receiving a query to locate a data item in at least one replica included in the plurality of replicas; and
based on the unified content index, the unified replication metadata model, and the received query, the computer determining candidate replica(s) and corresponding confidence score(s), the confidence score(s) indicating respective likelihood(s) that the candidate replica(s) include the data item, and the candidate replica(s) included in the plurality of replicas.
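The resource affinity and sorting steps of the claim can be sketched in Python as follows. The claim fixes only the branching on current versus threshold usage and the fact that replicas are sorted by both scores. The rest is assumed for illustration: the `Replica` fields, the formula `1 / (1 + cost)` that turns an expected usage into an affinity score, and the choice to sort by index expectation first and affinity second.

```python
from dataclasses import dataclass

@dataclass
class Replica:
    name: str
    index_expectation: float  # hypothetical: expected benefit of indexing this replica
    online_cost: float        # hypothetical: expected additional usage if indexed online
    offline_cost: float       # hypothetical: expected usage if indexed offline

def resource_affinity(replica, current_usage, threshold_usage):
    """Return (mode, score) following the claim's two branches."""
    if current_usage < threshold_usage:
        # Headroom is available: score the task for online execution.
        return "online", 1.0 / (1.0 + replica.online_cost)
    # At or above the threshold: score the task for offline execution.
    return "offline", 1.0 / (1.0 + replica.offline_cost)

def order_for_indexing(replicas, current_usage, threshold_usage):
    """Sort replicas by index expectation, then by resource affinity (assumed order)."""
    scored = []
    for replica in replicas:
        mode, affinity = resource_affinity(replica, current_usage, threshold_usage)
        scored.append((replica.name, mode, replica.index_expectation, affinity))
    scored.sort(key=lambda row: (row[2], row[3]), reverse=True)
    return scored

# Hypothetical replicas and usage figures.
replicas = [
    Replica("a", index_expectation=0.5, online_cost=1.0, offline_cost=3.0),
    Replica("b", index_expectation=0.9, online_cost=4.0, offline_cost=1.0),
]
below_threshold = order_for_indexing(replicas, current_usage=0.3, threshold_usage=0.7)
at_threshold = order_for_indexing(replicas, current_usage=0.7, threshold_usage=0.7)
```

With usage below the threshold every replica is scored for online indexing. Once usage reaches the threshold, the same replicas are scored for offline indexing instead, which matches the "greater than or equal to" wording of the claim.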
Address: Armonk NY US