Principal claim |
1. A system for detecting anomalies in data dynamically received from a plurality of sensors associated with one or more machines, the system comprising:
a knowledgebase; a model store; one or more interfaces configured to receive data from the plurality of sensors; and processing resources including at least one processor and a memory, the processing resources being configured, for each instance of data received via the one or more interfaces, to at least:
classify, using a model retrieved from the model store, the respective instance as being one of a normal instance type and an anomalous instance type, the retrieved model being selected from the model store as being appropriate for the machine that produced the data in the respective instance if such a model exists in the model store;
in response to a classification of the respective instance being a normal instance type, use the data in the respective instance to train the retrieved model;
in response to a classification of the respective instance being an anomalous instance type that is not new, determine from the knowledgebase an action to be taken and take the determined action; and
in response to a classification of the respective instance being an anomalous instance type that is new, seek confirmation from an authorized user as to whether the respective instance should be designated as a confirmed new anomalous instance type, and:
responsive to confirmation from the authorized user that the respective instance is a new anomalous instance type, update the knowledgebase with information about the respective instance and/or an action to be taken should the new anomalous instance type be detected again; and
use the data in the respective instance to train the retrieved model;
wherein each model in the model store is implemented using a k-means cluster algorithm modified so as to (a) be continually trainable as a result of the dynamic reception of data over an unknown and potentially indefinite time period, and (b) build clusters incrementally and in connection with an updatable distance threshold that indicates when a new cluster is to be created; and
wherein each said model has a respective total number of clusters that is dynamic and learned over time. |
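
The final "wherein" clauses describe a clustering model that is trained one instance at a time, creates a new cluster whenever an instance lies farther than a distance threshold from every existing cluster, and therefore has no fixed cluster count. A minimal Python sketch of that idea follows. It is an illustration under stated assumptions, not the claimed implementation: the class name `IncrementalClusterModel`, its method names, the use of Euclidean distance, the running-mean centroid update, and the rule that an instance is anomalous when it falls outside the threshold of every cluster are all hypothetical choices that the claim does not specify.

```python
import math


class IncrementalClusterModel:
    """Hypothetical sketch of threshold-based incremental clustering."""

    def __init__(self, threshold):
        self.threshold = threshold  # updatable distance threshold
        self.centroids = []         # cluster centers; the count grows over time
        self.counts = []            # instances absorbed by each cluster

    def nearest(self, point):
        """Return (index, distance) of the closest centroid, or (None, inf)."""
        best_index, best_distance = None, math.inf
        for index, centroid in enumerate(self.centroids):
            distance = math.dist(point, centroid)  # Euclidean (assumption)
            if distance < best_distance:
                best_index, best_distance = index, distance
        return best_index, best_distance

    def is_anomalous(self, point):
        """Assumed rule: anomalous if no cluster lies within the threshold."""
        _, distance = self.nearest(point)
        return distance > self.threshold

    def train(self, point):
        """Absorb one instance and return the index of its cluster."""
        index, distance = self.nearest(point)
        if index is None or distance > self.threshold:
            # Too far from every existing cluster: create a new one.
            self.centroids.append(list(point))
            self.counts.append(1)
            return len(self.centroids) - 1
        # Otherwise move the nearest centroid toward the point (running mean).
        self.counts[index] += 1
        n = self.counts[index]
        self.centroids[index] = [
            c + (x - c) / n for c, x in zip(self.centroids[index], point)
        ]
        return index


model = IncrementalClusterModel(threshold=1.0)
for reading in [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0)]:
    model.train(reading)

print(len(model.centroids))             # number of clusters learned so far
print(model.is_anomalous((10.0, 10.0)))
model.threshold = 10.0                  # the threshold can be updated at any time
print(model.is_anomalous((10.0, 10.0)))
```

Because each call to `train` touches only the nearest centroid, the model can keep learning for as long as sensor data arrives, without storing past instances or fixing the number of clusters in advance.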