Title of Invention: Analyzing real-time streams of time-series data
Abstract: A computer-implemented method of analyzing a plurality of metrics associated with one or more real-time streams of time-series data. For each metric, a set of time-series data for an interval of time is received, and one or more feature values, determined from data of the stream prior to that interval, are retrieved from a store. One or more updated feature values are determined using the retrieved feature values and the received set of time-series data, and the updated feature values are stored back in the store. The values of the metrics are then determined from the updated feature values, and the plurality of metrics is analyzed, for example to perform anomaly detection.
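The per-interval retrieve, update, and store cycle described in the abstract can be sketched as follows. This is a minimal illustration, not the patentee's implementation. It assumes the stored feature values are an interval count plus the streaming mean that claim 1 names, updated as $\mu_{n+1} = \mu_n + (x_{n+1} - \mu_n)/(n+1)$. It also assumes that an in-memory dict stands in for the store and that each metric's value is simply its updated mean. The names `Features`, `update_features`, and `process_interval` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Features:
    """Per-metric feature values kept between intervals (hypothetical layout)."""
    n: int = 0          # number of intervals already folded into the mean
    mean: float = 0.0   # streaming mean over those intervals


def update_features(store, metric, x):
    """Fold one interval's value x into the stored features for `metric`.

    Retrieves the previous features, applies
    mu_{n+1} = mu_n + (x_{n+1} - mu_n) / (n + 1),
    and overwrites the stored entry, so no raw history is retained.
    """
    prev = store.get(metric, Features())  # first interval starts from n = 0
    n1 = prev.n + 1
    updated = Features(n=n1, mean=prev.mean + (x - prev.mean) / n1)
    store[metric] = updated  # overwrite the retrieved feature values
    return updated


def process_interval(store, interval_data):
    """Update every metric for one interval; assume the metric value is the mean."""
    return {metric: update_features(store, metric, x).mean
            for metric, x in interval_data.items()}


if __name__ == "__main__":
    store = {}
    for value in (10.0, 20.0, 30.0):
        print(process_interval(store, {"latency_ms": value}))
    # prints {'latency_ms': 10.0}, then {'latency_ms': 15.0}, then {'latency_ms': 20.0}
```

Keeping only a count and a mean per metric makes each interval cost constant time and storage, however long the stream has been running. Overwriting the dict entry mirrors the claim's overwriting of the retrieved feature values.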
Publication No.: US9509710 (B1)
Publication Date: 2016.11.29
Application No.: US201514950027
Filing Date: 2015.11.24
Applicant: International Business Machines Corporation
Inventors: Barry Paul M.; Manning Ian; Udaltsova Natalia
Classification: G06F11/00; H04L29/06; G06F17/30
Main Classification: G06F11/00
Agent: Johnson Erik K.
Principal Claim: 1. A method for analyzing, in real time, a plurality of data streams including time-series data, the method comprising:
receiving, by a first computer connected to a second computer, a first data stream of the plurality of data streams, including time-series data and one or more metrics, for a first time interval, wherein the time-series data is generated by a monitoring service operating on the second computer configured to monitor operations performed by hardware and software components of one or more other computers, including at least the first computer, and wherein each of the one or more metrics is a performance statistic of the monitored operations;
responsive to receiving the time-series data at the first computer, retrieving, by the first computer from a computer readable storage medium, one or more feature values for each of the one or more metrics, wherein the one or more feature values were previously produced during a time interval immediately preceding the first time interval, including a streaming mean for the preceding time interval $\mu_n$;
producing, by the first computer, for each of the one or more metrics, one or more updated feature values including a streaming mean for the first time interval $\mu_{n+1}$, based on the received time-series data $x_{n+1}$, the retrieved streaming mean $\mu_n$, and a number of time intervals $n$ preceding the first time interval $n+1$, at least according to the relationship: $\mu_{n+1} = \mu_n + \frac{x_{n+1} - \mu_n}{n+1}$;
storing, by the first computer, for each of the one or more metrics, the one or more updated feature values including the produced streaming mean $\mu_{n+1}$ in the computer readable storage medium by overwriting the one or more retrieved feature values;
determining, by the first computer, a value for each of the one or more metrics, based on the one or more updated feature values including the streaming mean $\mu_{n+1}$;
responsive to determining the value for each of the one or more metrics, grouping, by the first computer, each of the one or more metrics into a plurality of groups, wherein each group of the plurality of groups includes the one or more metrics having values within a specified range;
for each group of the plurality of groups:
calculating, by the first computer, an average value, based on the values of the one or more metrics included in a respective group;
identifying, by the first computer, a subgroup of correlated metrics, based on a correlation approximation for values of the one or more metrics included in the respective group;
comparing, by the first computer, the values of the one or more correlated metrics included in the subgroup with the average value of the respective group;
determining, by the first computer, whether the value of one of the one or more correlated metrics included in the subgroup diverges from the average value of the respective group above a correlated threshold, wherein a divergence from the average value of the respective group above the correlated threshold indicates one or more operational anomalies performed by the hardware and the software components; and
transmitting, by the first computer, a notification including information for the one or more operational anomalies to the second computer.
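The grouping and divergence steps of claim 1 can be sketched as follows. The claim does not say how the value ranges or the "correlation approximation" are computed, so this minimal sketch rests on several assumptions. Groups are fixed-width value ranges of `bin_width`. Each metric has a short hypothetical history of recent values. Correlation is ordinary Pearson correlation over that history, swapped in for the unspecified approximation, and a metric joins the correlated subgroup if it correlates at or above `min_corr` with at least one other member of its group. Divergence is the absolute difference from the group average compared against `divergence_threshold`, which stands in for the claim's correlated threshold. All function and parameter names are illustrative, and sending the notification to the second computer is omitted.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; 0.0 if undefined."""
    n = len(xs)
    if n < 2:
        return 0.0
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series has no defined correlation
    return cov / math.sqrt(vx * vy)


def group_by_range(values, bin_width):
    """Group metric names whose current values fall in the same fixed-width range."""
    groups = defaultdict(list)
    for name, value in values.items():
        groups[math.floor(value / bin_width)].append(name)
    return groups


def detect_anomalies(values, history, bin_width, min_corr, divergence_threshold):
    """Return the sorted names of correlated metrics that diverge from their group average."""
    anomalies = []
    for members in group_by_range(values, bin_width).values():
        avg = sum(values[m] for m in members) / len(members)
        # Subgroup: members whose history correlates with at least one other member.
        correlated = [
            m for m in members
            if any(o != m and pearson(history[m], history[o]) >= min_corr
                   for o in members)
        ]
        for m in correlated:
            if abs(values[m] - avg) > divergence_threshold:
                anomalies.append(m)
    return sorted(anomalies)


if __name__ == "__main__":
    values = {"cpu_a": 10.0, "cpu_b": 12.0, "cpu_c": 11.0, "cpu_d": 90.0, "disk": 500.0}
    history = {"cpu_a": [1, 2, 3, 4], "cpu_b": [2, 3, 4, 5], "cpu_c": [1, 2, 3, 5],
               "cpu_d": [2, 4, 6, 8], "disk": [4, 3, 2, 1]}
    print(detect_anomalies(values, history, bin_width=100, min_corr=0.8,
                           divergence_threshold=40))  # ['cpu_d']
```

In the sample data, `cpu_d` falls in the same range as the other CPU metrics and tracks them over time, yet sits 59.25 above the group average of 30.75, so it is the only metric flagged. `disk` is alone in its range, so it has no correlated partner and is never compared.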
Address: Armonk, NY, US