Title of invention: OVERPARTITIONING SYSTEM AND METHOD FOR INCREASING CHECKPOINTS IN COMPONENT-BASED PARALLEL APPLICATIONS
Abstract: Two methods are disclosed for partitioning the work done by a computer program into smaller pieces so that checkpoints can be taken more frequently. Initially, a parallel task starts with one or more input data sets having q initial partitions, divides the input data sets into p partitions using some combination of partitioning elements (i.e., partitioners and gatherers), runs an instance of a component program on each of the p partitions of the data, and produces one or more sets of output files, each set being considered a partitioned data set. The invention is applied to such a task to create a new, "overpartitioned" task as follows: (1) the partitioner is replaced with an "overpartitioner" that divides its q inputs into n*p partitions, for some integer factor n; (2) the component program is run in a series of n execution phases, with p instances of the component program running at any time, and in each phase each instance reads one overpartition of the input data and produces one partition of output data; (3) at the end of each of the n execution phases, the system is quiescent and may be checkpointed. The first embodiment explicitly overpartitions the input data, using known partitioner programs, communication channels, and gatherer programs to produce overpartitioned intermediate files. The second embodiment dynamically overpartitions the input data by arranging for the component programs to consecutively read contiguous subsets of the original input data.
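The phase structure described in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation: the function names (`overpartition`, `contiguous_slices`, `run_in_phases`), the round-robin partitioning rule, the in-memory lists standing in for files, and the sequential loop standing in for p parallel instances are all assumptions made for illustration only.

```python
from typing import Callable, List, Sequence


def overpartition(records: Sequence[int], p: int, n: int) -> List[List[int]]:
    """Explicit overpartitioning: split the input into n*p partitions.

    Round-robin assignment is a hypothetical rule chosen for this sketch.
    """
    parts: List[List[int]] = [[] for _ in range(n * p)]
    for i, rec in enumerate(records):
        parts[i % (n * p)].append(rec)
    return parts


def contiguous_slices(partition: Sequence[int], n: int) -> List[List[int]]:
    """Dynamic overpartitioning: view one partition as n contiguous subsets."""
    size = -(-len(partition) // n)  # ceiling division
    return [list(partition[k * size:(k + 1) * size]) for k in range(n)]


def run_in_phases(
    records: Sequence[int],
    p: int,
    n: int,
    component: Callable[[List[int]], int],
    checkpoint: Callable[[int, List[int]], None],
) -> List[int]:
    """Run the component over n phases, p overpartitions per phase."""
    parts = overpartition(records, p, n)
    outputs: List[int] = []
    for phase in range(n):
        # In a real system these p instances would run in parallel;
        # they run one after another here to keep the sketch simple.
        for part in parts[phase * p:(phase + 1) * p]:
            outputs.append(component(part))
        # All instances of this phase are done, so the system is
        # quiescent and a checkpoint may be taken.
        checkpoint(phase, list(outputs))
    return outputs


if __name__ == "__main__":
    saved_phases: List[int] = []
    results = run_in_phases(
        list(range(12)), p=2, n=3, component=sum,
        checkpoint=lambda phase, outs: saved_phases.append(phase),
    )
    print(results, saved_phases)
```

With p = 2 and n = 3, the twelve input records are split into six overpartitions, and a checkpoint opportunity arises three times instead of once at the end of the whole task.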
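Publication number: PT954781(E) Publication date: 2007.05.31
Application number: PT19960943688T Filing date: 1996.12.11
Applicant: AB INITIO SOFTWARE CORPORATION Inventors: ROBERT LORDI; CRAIG STANFILL; CLIFF LASSER
Classification: G06F11/00; G06F11/14; G06F9/46; G06F9/50; G06F11/08; G06F11/267 Main classification: G06F11/00
Agency: Agent:
Independent claim:
Address: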