Title System and method for intelligently categorizing data to delete specified amounts of data based on selected data characteristics
Abstract A data processing system (DPS) and a computer program product assign stored documents within a distributed storage system (DSS) to various document categories to enable a target number of documents to be deleted. An intelligent storage management (ISM) utility identifies a data storage threshold value used to control data storage within the DSS. If the current storage usage exceeds the data storage threshold value, the ISM utility calculates, based on the current storage usage, a target number of documents that can be deleted from the DSS. The ISM utility utilizes a recursive process which includes assigning stored documents to groups including a set of document categories based on data characteristics of the stored documents. The ISM utility further utilizes the recursive process to delete, based on an established ordering of the groups, all of the stored documents assigned to a subset of the groups in order to remove the target number of stored documents.
Publication number US9355118(B2) Publication date 2016.05.31
Application number US201314081181 Filing date 2013.11.15
Applicant International Business Machines Corporation Inventors Joseph Dinakaran; Nadgir Devaprasad Khandurao; Ramalingam Ramkumar; Shepard David Elliot
Classification G06F17/30 Main classification G06F17/30
Agency Yudell Isidore PLLC Agent Isidore Eustace P.; Yudell Isidore PLLC
Main claim 1. A data processing system (DPS) operating within a distributed storage system (DSS), the DPS comprising: at least one processor; a memory system having stored therein a utility, which when executed by the processor causes the processor to provide the functions of:
identifying a pre-established data storage threshold value for an amount of data that can be stored within the DSS;
tracking a current storage usage for an amount of data stored within the DSS;
determining whether the current storage usage exceeds the pre-established data storage threshold value;
in response to determining that the current storage usage exceeds the pre-established data storage threshold value, calculating a target number of documents that can be deleted from the DSS based on an amount by which the current storage usage exceeds the pre-established data storage threshold value, wherein the current storage usage is proportional to a number of stored documents in the DSS, wherein said stored documents have a same document type;
assigning stored documents to a plurality of groups including a set of document categories based on corresponding data characteristics, wherein said groups are ordered based on a relative index associated with corresponding values for data characteristics, wherein all of the stored documents assigned to at least one of the plurality of groups can be deleted, based on an order associated with the relative index of the document categories, in order to provide the target number of documents that can be deleted; and
deleting all of the stored documents assigned to at least one of the plurality of groups in order to remove the target number of stored documents;
determining a data characteristic parameter that can be used to identify parameter values of the stored documents, which parameter values enable the stored documents to be assigned into the plurality of groups;
wherein the utility further performs the function of said assigning by:
selecting a set of document categories that are associated with the determined data characteristic parameter, wherein the selected set of document categories provide an initial set of groups to which stored documents are assigned;
assigning stored documents to the selected set of document categories based on corresponding parameter values associated with the determined data characteristic parameter;
determining a maximum number of document categories, from among the selected set of document categories, for which all corresponding stored documents can be removed without exceeding the target number of documents that can be deleted;
in response to the maximum number being greater than zero: deleting all of the stored documents corresponding to the maximum number of document categories from among the selected set of document categories; determining whether a count of all the removed stored documents is less than the target number of documents that can be deleted; and in response to the count being less than the target number of stored documents that can be deleted, identifying a document category of remaining, stored documents, which identified document category is adjacent to a document category most recently identified for providing stored documents to be deleted, wherein said identified category is a target document category which comprises corresponding stored documents that can be further categorized;
in response to the maximum number being equal to zero, selecting, from among the identified set of document categories, a first category as a target document category having corresponding stored documents that can be further categorized, wherein said first category holds a first position relative to other document categories based on a corresponding range of parameter values; and
executing a process to further categorize stored documents from the target document category and delete documents from an associated sub-category to arrive at the target number.
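The claimed steps can be illustrated with a minimal Python sketch. It is not the patented implementation; it only mirrors the sequence described in the claim under several assumptions that do not come from the source: the data characteristic parameter is a document age in whole days (`age_days`), categories are fixed-width age ranges ordered oldest first, sub-categorization halves the range width, and the target count is the excess usage divided by a hypothetical average document size. The names `target_count` and `select_for_deletion` are illustrative only.

```python
import math
from collections import defaultdict


def target_count(current_usage, threshold, avg_doc_size):
    """Documents to delete so usage falls back to the threshold.

    Assumes usage is proportional to document count (same document type),
    so the excess divided by a hypothetical average size gives a count.
    """
    excess = current_usage - threshold
    return math.ceil(excess / avg_doc_size) if excess > 0 else 0


def select_for_deletion(docs, target, width, key=lambda d: d["age_days"]):
    """Pick up to `target` documents, whole categories at a time."""
    if target <= 0 or not docs:
        return []

    # Assign documents to categories: one category per `width`-sized range.
    categories = defaultdict(list)
    for doc in docs:
        categories[key(doc) // width].append(doc)
    # Assumed ordering: highest value range (oldest documents) first.
    ordered = [categories[k] for k in sorted(categories, reverse=True)]

    # Take the maximum number of whole categories that fit within the target.
    chosen, i = [], 0
    while i < len(ordered) and len(chosen) + len(ordered[i]) <= target:
        chosen.extend(ordered[i])
        i += 1

    remaining = target - len(chosen)
    if remaining > 0 and i < len(ordered):
        # ordered[i] is the target category: adjacent to the last category
        # taken, or the first category when none could be taken whole.
        if width > 1:
            # Further categorize it with a narrower range (assumed: halving).
            chosen.extend(select_for_deletion(ordered[i], remaining, width // 2, key))
        else:
            # Assumption: at the finest granularity, take an arbitrary slice.
            chosen.extend(ordered[i][:remaining])
    return chosen


docs = [{"id": n, "age_days": a}
        for n, a in enumerate([95, 92, 70, 65, 61, 40, 35, 10])]
target = target_count(current_usage=1400, threshold=1000, avg_doc_size=100)
doomed = select_for_deletion(docs, target, width=30)
print(sorted(d["age_days"] for d in doomed))  # [65, 70, 92, 95]
```

In this run the target is 4. The oldest 30-day category (ages 95 and 92) is taken whole, the next category (ages 70, 65, 61) would exceed the target, so it becomes the target category and is subdivided until two more documents can be removed as whole sub-categories. The fallback at width 1 is a choice made for this sketch; the claim does not specify what happens when a category cannot be divided further.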
Address Armonk NY US