Title of invention: Extending relational algebra for data management
Abstract: Methods are provided for improving the ability to apply modeling techniques similar to relational algebra to an expanded number of workflows. By allowing a relational algebra type modeling technique to be applied to an expanded number of workflows, an increased number of data processing workflows can be more readily improved, such as by automatic modification of the sequence of tasks in a workflow, to reduce the execution costs for a workflow. The relational algebra type modeling technique can also allow for identification of portions of data processing workflows or queries that share a common input and output.
Publication number: US9558240(B2) Publication date: 2017.01.31
Application number: US201414298651 Filing date: 2014.06.06
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC Inventors: Yan An; Luo Jing; Luo Yi; Li Nan
Classification: G06F17/30 Main classification: G06F17/30
Agency: (not listed) Agents: Meyers Jessica; Wong Tom; Minhas Micky
Principal claim: 1. A computer-implemented method for managing a distributed database, comprising: receiving a query defining a first workflow, the first workflow corresponding to tasks for processing data from one or more data sources, the query having a corresponding first logical expression comprising a sequence of operators from an expanded relational algebra, at least one operator in the first logical expression being an instance of a reduce operator; modifying an ordering of the operators in the first logical expression to form a modified first logical expression, a beginning sequence of the modified first logical expression having an increased number of operators in common with a beginning sequence of a second logical expression of a second workflow relative to a beginning sequence of the first logical expression, the modifying of the ordering of the operators in the first logical expression comprising a) moving one or more operators from a position prior to the instance of the reduce operator to a position after the instance of the reduce operator, b) moving one or more operators from a position after the instance of the reduce operator to a position prior to the instance of the reduce operator, or c) a combination thereof; constructing a modified first workflow corresponding to the modified first logical expression; and executing at least a portion of the modified first workflow using an intermediate data set formed by execution of a portion of the second workflow, the executed portion of the modified first workflow corresponding to operators in the sequence for the modified first logical expression located after the operators in common with the second logical expression; wherein the modified first workflow has a lower execution cost than an execution cost of the first workflow.
Address: Redmond, WA, US
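Illustrative note on the principal claim: the reordering step can be pictured with a small sketch. This is not the patented implementation. It assumes that a logical expression is a plain list of operator names, that the caller supplies the set of operators that may legally cross the reduce operator (`movable`, a hypothetical parameter), and that only one operator is moved at a time. All function names and the example operators are invented for illustration, and execution cost is not modeled.

```python
from typing import List, Set

REDUCE = "reduce"  # assumed name for the reduce operator in this sketch


def common_prefix_len(a: List[str], b: List[str]) -> int:
    """Number of leading operators that two logical expressions share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def candidates(expr: List[str], movable: Set[str]) -> List[List[str]]:
    """The original ordering plus every ordering obtained by moving one
    movable operator to the other side of the reduce operator."""
    r = expr.index(REDUCE)
    out = [list(expr)]
    for i, op in enumerate(expr):
        if op not in movable or i == r:
            continue
        rest = expr[:i] + expr[i + 1:]
        r2 = rest.index(REDUCE)
        if i < r:
            # Operator was before the reduce: place it just after.
            out.append(rest[:r2 + 1] + [op] + rest[r2 + 1:])
        else:
            # Operator was after the reduce: place it just before.
            out.append(rest[:r2] + [op] + rest[r2:])
    return out


def best_reordering(first: List[str], second: List[str],
                    movable: Set[str]) -> List[str]:
    """Pick the candidate sharing the longest beginning sequence with `second`."""
    return max(candidates(first, movable),
               key=lambda c: common_prefix_len(c, second))


def remaining_after_shared(modified: List[str], second: List[str]) -> List[str]:
    """Operators still to execute once the shared prefix's intermediate
    data set (produced by the second workflow) is reused."""
    return modified[common_prefix_len(modified, second):]


# Hypothetical workflows: the filter is assumed to commute with the reduce.
first = ["extract", "filter_country", REDUCE, "project"]
second = ["extract", REDUCE, "sort"]
modified = best_reordering(first, second, {"filter_country"})
to_run = remaining_after_shared(modified, second)
```

In this example the filter moves from before the reduce to after it, so the modified expression shares two leading operators with the second workflow instead of one, and only the filter and the projection remain to be run on the reused intermediate data set. Whether that ordering is actually cheaper depends on a cost model, which the sketch leaves out.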