Title of invention Managing replicated data
Abstract An approach for managing replicated data is presented. Metadata is received specifying inter-data correlation(s), inter-replica correlation(s), and data-replica correlation(s) among replicas generated for a system. A unified replication metadata model specifying the correlations is generated. Based on the inter-replica correlation(s), a proper subset of the replicas is selected. Based on the inter-replica and inter-data correlation(s), the selected proper subset of replicas is indexed to generate a unified content index. A query is received to locate a data item in at least one of the replicas. Based on the unified content index, the unified replication metadata model, and the query, candidate replica(s) and confidence score(s) indicating likelihood(s) that the candidate replica(s) include the data item are determined. Based on temporal distance(s) and percent change(s) between first and second replica(s), confidence score(s) of the second replica(s) are determined.
Publication number US9110966(B2) Publication date 2015.08.18
Application number US201414509096 Filing date 2014.10.08
Applicant International Business Machines Corporation Inventors Brewer Billy S.; Chavda Kavita; Mandagere Nagapramod S.; Routray Ramani R.
Classification G06F17/30 Main classification G06F17/30
Agency Schmeiser, Olsen & Watts Agent Schmeiser, Olsen & Watts; Chung Matthew
Main claim 1. A method of managing replicated data, the method comprising the steps of: a computer receiving first metadata specifying inter-data correlation(s), which are correlation(s) between sets of replicated data in a first set of replicas; the computer receiving second metadata specifying inter-replica correlation(s), which are correlation(s) between replicas included in a second set of replicas; the computer receiving third metadata specifying data-replica correlation(s), which are correlation(s) between set(s) of replicated data and respective replica(s) included in a third set of replicas, the first, second and third sets of replicas being included in a plurality of replicas generated for a system; the computer generating a unified replication metadata model specifying the inter-data correlation(s) based on the first metadata, the inter-replica correlation(s) based on the second metadata, and the data-replica correlation(s) based on the third metadata; based on the inter-replica correlation(s) specified by the unified replication metadata model, the computer selecting a proper subset of replicas included in the plurality of replicas; based on the inter-replica and inter-data correlation(s) specified by the unified replication metadata model, the computer indexing the selected proper subset of replicas to generate a unified content index; the computer receiving a query to locate a data item in at least one replica included in the plurality of replicas; and based on the unified content index, the unified replication metadata model, and the received query, the computer determining candidate replica(s) and corresponding confidence score(s), the confidence score(s) indicating respective likelihood(s) that the candidate replica(s) include the data item, and the candidate replica(s) included in the plurality of replicas, wherein the step of determining the candidate replica(s) and the corresponding confidence score(s) includes the steps of: based on the unified content index, the computer determining first replica(s) included in the proper subset of replicas that are exact matches to the query; for second replica(s) that are not exact matches to the query, the computer determining respective temporal distance(s) and respective percent change(s) in the system between the second replica(s) and the first replica(s) that are exact matches to the query; for the second replica(s) that are not exact matches to the query, the computer identifying respective nearest neighbor(s) as respective first replica(s) having minimum(s) of the respective temporal distance(s) and respective percent change(s); based on the minimum(s) of the temporal distance(s) and percent change(s), the computer determining confidence score(s) of the second replica(s); the computer sorting the second replica(s) based on the confidence score(s) of the second replica(s); and the computer directing a device to present the sorted second replica(s) to a user.
Address Armonk, NY, US
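
The final steps of the claim (find exact-match replicas, find the nearest exact match for each remaining replica, score, and sort) can be illustrated with a minimal Python sketch. The record gives no scoring formula, so everything concrete here is an assumption made for illustration: the `Replica` fields, the `confidence` function with its exponential time decay and `half_life_hours` parameter, the modelling of percent change as the difference between per-replica cumulative change figures, and the lexicographic tie-break when picking the nearest neighbor. It is not the patented implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    # All fields are hypothetical; the claim does not define a replica schema.
    name: str
    timestamp: float     # creation time in hours (assumed unit)
    change_pct: float    # cumulative percent of system data changed so far (assumed)
    exact_match: bool    # True if the content index reports an exact match to the query


def confidence(temporal_distance: float, percent_change: float,
               half_life_hours: float = 24.0) -> float:
    """Assumed scoring rule: confidence falls as either distance grows."""
    time_factor = 0.5 ** (temporal_distance / half_life_hours)
    change_factor = max(0.0, 1.0 - percent_change / 100.0)
    return time_factor * change_factor


def rank_candidates(replicas: list[Replica]) -> list[tuple[str, str, float]]:
    """Return (replica, nearest exact match, score) sorted by descending score."""
    first = [r for r in replicas if r.exact_match]       # "first replicas"
    second = [r for r in replicas if not r.exact_match]  # "second replicas"
    if not first:
        return []  # nothing to measure distance against

    scored = []
    for r in second:
        # Nearest neighbor: smallest temporal distance, then smallest
        # percent change (the tie-break order is an assumption).
        nearest = min(
            first,
            key=lambda e: (abs(r.timestamp - e.timestamp),
                           abs(r.change_pct - e.change_pct)),
        )
        dt = abs(r.timestamp - nearest.timestamp)
        dc = abs(r.change_pct - nearest.change_pct)
        scored.append((r.name, nearest.name, confidence(dt, dc)))

    scored.sort(key=lambda item: item[2], reverse=True)
    return scored


if __name__ == "__main__":
    demo = [
        Replica("snap-A", 0.0, 0.0, True),
        Replica("snap-B", 12.0, 10.0, False),
        Replica("snap-C", 48.0, 20.0, True),
        Replica("snap-D", 54.0, 25.0, False),
    ]
    for name, nearest, score in rank_candidates(demo):
        print(f"{name}: nearest exact match {nearest}, confidence {score:.3f}")
```

In this sketch a non-indexed replica taken shortly after an exact match, with little intervening change, ranks above one that is further away in time or content, which is the ordering behavior the claim describes.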