Title of invention: Systems and methods for detecting missing data in query results
Abstract: Techniques provided herein allow for estimating data missing in query results provided in response to queries performed on data managed by a data management system. In the event that one or more leaf nodes are unable or unavailable to process a query, a final query result provided in response to the original query may be missing data that exists on those leaf nodes. A data accounting service monitors what managed data is being stored on the leaf nodes and on what leaf node. The data accounting service can estimate how much data is missing from a final query result when one or more of the leaf nodes are unable or unavailable to process a query.
Publication number: US9501521(B2) Publication date: 2016.11.22
Application number: US201313951438 Filing date: 2013.07.25
Applicant: Facebook, Inc. Inventors: Barykin Oleksandr; Metzler Josh
Classification: G06F17/30 Main classification: G06F17/30
Agency: Sheppard Mullin Richter & Hampton LLP Agent: Sheppard Mullin Richter & Hampton LLP
Main claim: 1. A computer system comprising: at least one processor; and a memory storing instructions configured to instruct the at least one processor to perform: receiving a data set for storage; storing a data subset of the data set at a set of leaf nodes of a plurality of leaf nodes; storing data accounting information at the set of leaf nodes, wherein the data accounting information tracks data being stored at the set of leaf nodes, wherein the data accounting information includes one or more identifiers that correspond to the data subset being stored; receiving an initial query configured to be performed on the data set; submitting a first query on the data set to the set of leaf nodes, wherein the first query is based on the initial query; receiving a respective first result and a respective second result in response to the first query from at least a portion of leaf nodes in the set of leaf nodes, wherein the second result is based on a second query performed on the data accounting information determined based at least in part on one or more respective identifiers that correspond to the data included in the first result, the respective second result providing data accounting info that indicates the amount of data stored in the set of leaf node based on the identifier; aggregating the respective first results that were received from the portion of leaf nodes to determine a final result; aggregating the respective second results that were received from the set of leaf nodes; and determining an estimate for an amount of data missing based on the aggregated second result and the final result the portion of leaf nodes to determine an estimate for an amount of data missing from the final result.
Address: Menlo Park CA US
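
Illustrative sketch (not part of the patent record): the flow described in the abstract and main claim can be approximated in a few lines of Python. This is a minimal sketch under stated assumptions, not the patented implementation. The names `LeafNode`, `store`, and `query`, the round-robin placement, the use of row counts as the "amount of data", and the choice to replicate the accounting information on every leaf node are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LeafNode:
    """Hypothetical leaf node holding rows plus data accounting information."""
    leaf_id: str
    rows: list = field(default_factory=list)        # (identifier, value) pairs
    accounting: dict = field(default_factory=dict)  # {identifier: {leaf_id: row_count}}
    available: bool = True

    def run(self, identifier):
        """Return (first_result, second_result), or None if the node cannot answer."""
        if not self.available:
            return None
        values = [v for ident, v in self.rows if ident == identifier]
        first = {"sum": sum(values), "rows_scanned": len(values)}
        # Second query: look up accounting info for the same identifier.
        second = dict(self.accounting.get(identifier, {}))
        return first, second


def store(leaves, identifier, values):
    """Place values round-robin (assumption) and record counts on every leaf."""
    counts = {}
    for i, value in enumerate(values):
        leaf = leaves[i % len(leaves)]
        leaf.rows.append((identifier, value))
        counts[leaf.leaf_id] = counts.get(leaf.leaf_id, 0) + 1
    for leaf in leaves:
        per_leaf = leaf.accounting.setdefault(identifier, {})
        for leaf_id, n in counts.items():
            per_leaf[leaf_id] = per_leaf.get(leaf_id, 0) + n


def query(leaves, identifier):
    """Aggregate first and second results, then estimate the missing rows."""
    total, scanned, expected = 0, 0, {}
    for leaf in leaves:
        reply = leaf.run(identifier)
        if reply is None:
            continue  # unavailable leaf: its data is absent from the final result
        first, second = reply
        total += first["sum"]
        scanned += first["rows_scanned"]
        for leaf_id, n in second.items():
            expected[leaf_id] = max(expected.get(leaf_id, 0), n)
    expected_rows = sum(expected.values())
    missing = max(expected_rows - scanned, 0)
    fraction = missing / expected_rows if expected_rows else 0.0
    return {"sum": total, "rows_scanned": scanned, "expected_rows": expected_rows,
            "missing_rows": missing, "missing_fraction": fraction}


leaves = [LeafNode("a"), LeafNode("b"), LeafNode("c")]
store(leaves, "events", list(range(1, 10)))  # values 1..9, three per leaf
leaves[2].available = False                  # simulate an unavailable leaf
result = query(leaves, "events")
```

In this sketch the responsive leaves report that nine rows exist for the identifier while only six were scanned, so the aggregator can flag the partial sum as missing roughly one third of the data. If no leaf responds, no accounting information is returned and the estimate cannot be formed; a real system would need to handle that case separately.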