Title: System for organizing and fast searching of massive amounts of data
Abstract: A system collects massive amounts of data and stores them in a special data structure arranged for rapid searching. Performance metric data is one example. The performance metric data is recorded as time-series measurements, converted into Unicode, and arranged into a data structure having one directory per day, which stores all the metric data collected that day. At the server where analysis is done, the data structure has a subdirectory for every resource type. Each subdirectory contains text files, each dedicated to a group of attributes and holding the performance metric data values measured for the attributes in that group. Each attribute has its own section in the file, and its performance metric data values are recorded in time series as a comma-delimited list of Unicode hex numbers. Analysis of the performance metric data is done using regular expressions.
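The storage layout described in the abstract can be illustrated with a minimal Python sketch. The abstract does not fix the exact file format, so several details are assumptions made for illustration:

- the path scheme `<day>/<resource_type>/<group>.txt`;
- the `attribute:hex,hex,...` section syntax;
- the helper names `encode_series`, `write_group_file` and `attributes_matching`.

The sketch also suggests why hex text suits regular-expression analysis. Encoded values carry no leading zeros, so a numeric threshold such as "at least 0x100" becomes a pattern on digit count.

```python
import re
import tempfile
from pathlib import Path


def encode_series(values):
    """Encode non-negative integer samples as comma-delimited lowercase hex."""
    return ",".join(format(v, "x") for v in values)


def write_group_file(root, day, resource_type, group, series_by_attr):
    """Write one attribute-group file with one section (line) per attribute.

    Hypothetical layout: <root>/<day>/<resource_type>/<group>.txt
    Hypothetical section syntax: <attribute>:<hex>,<hex>,...
    """
    directory = Path(root) / day / resource_type
    directory.mkdir(parents=True, exist_ok=True)
    lines = [f"{attr}:{encode_series(vals)}" for attr, vals in series_by_attr.items()]
    path = directory / f"{group}.txt"
    # Python strings are Unicode; UTF-8 is the assumed on-disk encoding.
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


def attributes_matching(path, value_pattern):
    """Return attributes whose series contains a whole value matching the regex."""
    # Anchor the pattern to one comma-delimited field, so e.g. "1f" does not
    # match inside the longer value "1f4".
    field = re.compile(rf"(?:^|,)(?:{value_pattern})(?=,|$)")
    hits = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        attr, _, data = line.partition(":")
        if field.search(data):
            hits.append(attr)
    return hits


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        p = write_group_file(tmp, "2013-03-29", "VirtualMachine", "cpu",
                             {"usage_mhz": [120, 300, 95], "ready_ms": [10, 20, 15]})
        print(p.read_text(encoding="utf-8"))
        # Values >= 0x100 (256): hex with no leading zero and 3+ digits.
        print(attributes_matching(p, "[1-9a-f][0-9a-f]{2,}"))
```

In this example the `usage_mhz` series (120, 300, 95) is stored as `78,12c,5f`. Only that attribute matches the threshold pattern, because 300 encodes to the three-digit value `12c`.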
Publication number: US9396287 (B1); Publication date: 2016.07.19
Application number: US201313853925; Filing date: 2013.03.29
Applicant: CUMULUS SYSTEMS, INC.; Inventors: Bhave Ajit; Ramachandran Arun; Nadimpalli Sai Krishnam Raju; Bele Sandeep
Classification: G06F17/30; G06F17/22; Main classification: G06F17/30
Agent: Fish Ronald C.
Main claim: 1. A server having a memory storing a non-relational database file system created and maintained by an NRDB access manager, said NRDB access manager coupled to a query request processor, said NRDB access manager receiving performance metric data and configuration data gathered by probes coupled to resources being monitored and encoding at least said performance metric data into Unicode and storing said Unicode encoded performance metric data and configuration data in files in said non-relational database file system, said server coupled to receive web requests containing search queries via a web request controller, said web request controller forwarding said web requests to said query request processor, said server also coupled to a result cache where performance metric data located as a result of processing said search queries by said query request processor is stored, and wherein said query request processor is programmed to carry out the following functions to process said search queries so as to do searches on said encoded performance metric data using regular expressions:
A) parse said search query, where said search query can include one or more searches on one or more nested levels, and wherein each search can include a filter or matching condition which is a regular expression, and wherein all regular expressions have rules of syntax which must be obeyed;
B) if said search query has an invalid format, update said result cache in said server with an error and terminate the search process;
C) if said search query has a proper format, making a request through said NRDB access manager to retrieve from said non-relational database file system all relevant data which can include encoded performance metric data and/or configuration data or event data for all instances of resources having a resource type which matches a resource type specified in a first search on a first level of nesting of said search query;
D) create a thread pool to process said retrieved relevant data from each resource instance of a type identified in said search query as said first search on said first level, each thread in said thread pool processing said retrieved relevant data from one of the resource instances of the type specified in said first search of said first level of nesting, and wherein said server is programmed to maintain an application properties file which contains a field configured to store a number that defines how many said threads are created;
E) if said search query includes one or more searches having filter or matching conditions which can include filter or matching conditions that are to be applied to performance metric data or configuration data or event data, applying, in each thread process, a first filter of a first level of nesting expressed in said search query to said retrieved relevant data being processed in said thread, and selecting all instances of said resource type identified in said first search on said first level of nesting which have data which matches said filter or matching condition of said first search on said first level of nesting, and, if any searches remain on said first level of nesting in said search query, applying each additional filter of each additional search on said first level of nesting in each said thread process sequentially to said retrieved relevant data only for instances of said resource type named in said first search of said first level of nesting which were selected by a match with the previous filter or matching condition(s) of the next previous search on said first level, until all searches on said first level of nesting of said search query have been executed;
F) in each thread, determining if said retrieved relevant data of said resource instance qualifies by meeting all the filter or matching criteria of all said searches on said first level of nesting in said search query;
G) if, in any thread, said retrieved relevant data of said resource instance does not qualify in any search on said first level of nesting, discarding said retrieved relevant data;
H) in each thread, determining if said search query specifies a sub path to another level of nesting where searching is to be performed on performance metric data or configuration data or event data of instances of a sub resource type which is related to said resource type specified in said first search on said first level which were qualified by the last search on said first level, and, if a sub path is specified in said search query, making a request to said NRDB access manager to access all relevant data needed for one or more searches specified in said search query to be performed on said nesting level specified in said sub path from an instance of said sub resource named in said first search specified for said nesting level specified by said sub path and which is related to an instance of said resource type named in the first search on said first level;
I) in each said thread process, applying the first filter of said first search specified in said search query for said nesting level specified by said sub path to the relevant encoded performance metric data or configuration data or event data of said sub resource specified in said first search on said nesting level specified by said sub path, and selecting said instance of said sub resource being processed in said thread as qualified if said encoded performance metric data or said configuration data or said event data of said sub resource meets a filter or matching condition of said first search on said nesting level pointed to by said sub path, and, if any further searches are specified in said search query for said level of nesting pointed to by said sub path, applying each additional search filter or matching criteria sequentially to relevant performance metric data or configuration data or event data identified in said search being processed of said instance of said sub resource being processed in said thread if said instance of said sub resource was qualified by the preceding search on said level of nesting pointed to by said sub path in said thread process, and continuing to execute searches on said nesting level pointed to by said sub path until all said searches of said search query for said nesting level pointed to by said sub path have been executed;
J) in each thread, determining if said sub resource qualified in all searches on said nesting level pointed to by said sub path, and, if not, discarding all said relevant data from said sub resource, and, if said sub resource qualified in all searches on said nesting level pointed to by said sub path, determining if there is another sub path to yet another nesting level specified in said search query;
K) in each thread, if there is another sub path to another nesting level where a sub resource is specified in said search query which is related to said sub resource identified in said first search on the previous nesting level, accessing relevant performance metric data or configuration data or event data specified in said first search on said nesting level pointed to by said sub path from an instance of said sub resource type being processed which is related to said instance of said sub resource type processed on the preceding nesting level and which qualified per the filter or matching condition specified in a last search on said preceding nesting level, and applying a filter or matching criteria of a first search on said nesting level being processed to said relevant data accessed in this step K, and, if more than one search is specified by said search query for said nesting level being processed, applying each filter or matching condition specified in said searches specified in said search query for said nesting level being processed sequentially to said retrieved relevant data accessed in this step K of said sub resource being processed in said thread process if said sub resource qualified by a match with the previous search's filter or matching condition, until all searches specified in said search query for said nesting level being processed have been executed;
L) repeating processing like the processing previously described for steps H, I, J and K until all searches on all sub paths to further levels of nesting have been applied sequentially to all related sub resources specified in said search query for all nesting levels specified in said search query;
M) determining in each thread process if the top level said instance of said resource type named in said first search of said first level qualified, and, if said instance of said resource type named in said first search on said first level did not qualify in any search on said first nesting level, discarding said retrieved relevant data accessed by said thread for said first nesting level; and
N) in each thread, if said instance of said resource type qualified all filter or matching conditions of all searches on said first nesting level, adding at least the instance of said resource type named in said first search on said first nesting level which qualified as well as the encoded performance metric data or other data that qualified to said result cache and adding any related sub resource instance which was qualified by the filter or matching condition of all searches on any sub path nesting level to said result cache along with said retrieved relevant data which qualified.
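The query-processing steps of the claim can be approximated by the minimal Python sketch below. It rests on stated assumptions and is not the patented implementation:

- an in-memory list of `Resource` objects stands in for the NRDB access manager;
- each `Search` tests one named attribute against a regular expression;
- the properties key `query.thread.count` is hypothetical;
- only regex syntax errors are treated as an "invalid format";
- only instance names, not the matched data, are written to the result cache.

Searches on a level are applied in order, and evaluation stops at the first failing filter. A sub path is followed only from instances that qualified on their own level. One design choice differs from the claim, where each thread adds to the cache itself: here the per-instance outcomes are collected from the pool and written once, so no mutable cache is shared across threads.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Search:
    attribute: str                      # data field to test (hypothetical naming)
    pattern: str                        # filter or matching condition (a regex)
    regex: Optional[re.Pattern] = None  # compiled by validate()


@dataclass
class Level:
    resource_type: str
    searches: list
    sub_path: Optional["Level"] = None  # next nesting level, if any


@dataclass
class Resource:
    rtype: str
    name: str
    data: dict                                    # attribute -> encoded text
    children: list = field(default_factory=list)  # related sub resources


def read_thread_count(properties_text, key="query.thread.count", default=4):
    """Read the pool size from properties-style text (key name is assumed)."""
    for line in properties_text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            return int(value.strip())
    return default


def validate(level):
    """Steps A/B: compile every regex on every level; raises re.error if invalid."""
    while level is not None:
        for s in level.searches:
            s.regex = re.compile(s.pattern)
        level = level.sub_path


def evaluate(resource, level):
    """Apply one level's searches in order, then descend into its sub path."""
    # all() stops at the first failing filter, so later filters only see
    # instances that passed the earlier ones (steps E-G and I-K).
    if not all(s.regex.search(resource.data.get(s.attribute, "")) for s in level.searches):
        return None  # disqualified: discard this instance's data
    children = []
    sub = level.sub_path
    if sub is not None:
        for child in resource.children:
            if child.rtype == sub.resource_type:
                hit = evaluate(child, sub)  # step L: repeat for deeper levels
                if hit is not None:
                    children.append(hit)
    return {"name": resource.name, "children": children}


def run_query(top_level, resources, result_cache, thread_count):
    try:
        validate(top_level)
    except re.error as exc:
        result_cache["error"] = f"invalid query: {exc}"  # step B
        return
    # Step C: fetch all instances of the first-level resource type.
    candidates = [r for r in resources if r.rtype == top_level.resource_type]
    # Step D: one task per instance on a fixed-size thread pool.
    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        outcomes = pool.map(lambda r: evaluate(r, top_level), candidates)
        # Steps M/N: keep only top-level instances that qualified.
        result_cache["results"] = [o for o in outcomes if o is not None]
```

As a usage example, a query such as `Level("VM", [Search("cpu", r"(?:^|,)[1-9a-f][0-9a-f]{2,}(?=,|$)")], sub_path=Level("Disk", [Search("latency", r"[1-9a-f][0-9a-f]{2,}")]))` works as follows:

- It selects VM instances whose hex-encoded `cpu` series contains a value of at least 0x100.
- For each selected VM, it also reports the related Disk instances whose `latency` series does the same.
- An unqualified Disk is dropped, but a qualified VM is still reported even if none of its Disks qualify, mirroring steps M and N.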
Address: Mountain View, CA, US