Title of invention |
Distributed feature collection and correlation engine |
Abstract |
A distributed feature collection and correlation engine is provided. Feature extraction comprises obtaining one or more data records; extracting information from the one or more data records based on domain knowledge; transforming the extracted information into a key/value pair comprised of a key K and a value V, wherein the key comprises a feature identifier; and storing the key/value pair in a feature store database, using a de-duplication mechanism, if the key/value pair does not already exist in the feature store database. Features extracted from data records can be queried by obtaining a feature store database comprised of the extracted features stored as key/value pairs, each comprised of a key K and a value V, wherein the key comprises a feature identifier; receiving a query comprised of at least one query key; retrieving values from the feature store database that match the query key; and returning one or more retrieved key/value pairs. |
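The extraction, de-duplicated storage, and query steps described in the abstract can be sketched in Python as follows. This is a minimal in-memory illustration under stated assumptions, not the patented implementation: the record format, the `FeatureStore` class, the `domain_to_ip` extractor, and the `domain->ip:` key prefix are all hypothetical, and a Python dict of sets stands in for the feature store database.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Hypothetical record type: a flat dict such as a parsed DNS log line.
Record = Dict[str, str]
# A domain-knowledge extractor maps one record to zero or more (feature_id, value) pairs.
Extractor = Callable[[Record], Iterable[Tuple[str, str]]]


class FeatureStore:
    """Hypothetical in-memory stand-in for the feature store database."""

    def __init__(self) -> None:
        self._data: Dict[str, Set[str]] = defaultdict(set)

    def put(self, key: str, value: str) -> bool:
        """Store (key, value) only if it is not already present; return True if stored."""
        if value in self._data.get(key, ()):
            return False  # de-duplication: this key/value pair already exists
        self._data[key].add(value)
        return True

    def query(self, query_keys: Iterable[str]) -> List[Tuple[str, str]]:
        """Return every stored key/value pair whose key matches one of the query keys."""
        results = []
        for k in query_keys:
            for v in sorted(self._data.get(k, ())):
                results.append((k, v))
        return results


def ingest(records: Iterable[Record], extractors: List[Extractor], store: FeatureStore) -> int:
    """Extract features from each record, treat them as key/value pairs, store the new ones."""
    stored = 0
    for record in records:
        for extract in extractors:
            for feature_id, value in extract(record):
                if store.put(feature_id, value):
                    stored += 1
    return stored


# Hypothetical domain-knowledge extractor: which IP address a domain resolved to.
def domain_to_ip(record: Record) -> Iterable[Tuple[str, str]]:
    if record.get("type") == "dns_answer":
        yield ("domain->ip:" + record["qname"], record["answer"])


if __name__ == "__main__":
    logs = [
        {"type": "dns_answer", "qname": "example.com", "answer": "93.184.216.34"},
        {"type": "dns_answer", "qname": "example.com", "answer": "93.184.216.34"},  # duplicate
        {"type": "http", "host": "example.com"},  # ignored by this extractor
    ]
    store = FeatureStore()
    print(ingest(logs, [domain_to_ip], store))  # 1
    print(store.query(["domain->ip:example.com"]))  # [('domain->ip:example.com', '93.184.216.34')]
```

Keeping the feature identifier inside the key means a query is a direct key lookup, and the membership check in `put` is what keeps repeated observations of the same fact from being stored twice.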
Publication number |
US9489426(B2) |
Publication date |
2016.11.08 |
Application number |
US201313967730 |
Application date |
2013.08.15 |
Applicant |
International Business Machines Corporation |
Inventors |
Christodorescu Mihai; Hu Xin; Schales Douglas Lee; Sailer Reiner; Stoecklin Marc P.; Wang Ting |
Classification number |
G06F17/30 |
Main classification number |
G06F17/30 |
Agency |
Ryan, Mason & Lewis, LLP |
Agent |
Ryan, Mason & Lewis, LLP |
Main claim |
1. An apparatus for processing data, the apparatus comprising:
a memory; and
at least one processing device, coupled to the memory, operative to:
obtain one or more data records;
extract feature information from said one or more data records, wherein the extracting is performed based on domain knowledge;
transform said extracted feature information into a transformed key/value pair comprised of a key and a value, wherein said key comprises a feature identifier of said extracted feature information;
store said transformed key/value pair in a given bucket of values in a feature store database comprised of a plurality of buckets of values only if said key/value pair does not already exist in said feature store database, using a de-duplication mechanism by determining whether said value of said transformed key/value pair is already in said given bucket, wherein said given bucket is identified by said key comprising said feature identifier of said transformed key/value pair; and
wherein said bucket of values comprises a mathematical set that stores a given value based on a timestamp of said given value and without regard to an order in which said values are written to said bucket. |
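One possible reading of the bucket structure in claim 1 is sketched below as a hypothetical Python illustration, not as the claimed apparatus. It assumes that each value carries its own timestamp as a `(timestamp, payload)` tuple, models each bucket as a Python `set` selected by the feature identifier, and orders reads by those timestamps rather than by write order; the `BucketedFeatureStore` name and the value layout are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Assumption: each value carries its own timestamp, e.g. (epoch_seconds, "10.0.0.1").
Value = Tuple[int, str]


@dataclass
class BucketedFeatureStore:
    """Hypothetical store: one bucket (a mathematical set of values) per feature identifier."""

    buckets: Dict[str, Set[Value]] = field(default_factory=dict)

    def put(self, feature_id: str, value: Value) -> bool:
        """Add value to the bucket named by feature_id unless it is already there."""
        bucket = self.buckets.setdefault(feature_id, set())
        if value in bucket:  # de-duplication check against the given bucket only
            return False
        bucket.add(value)
        return True

    def read(self, feature_id: str) -> List[Value]:
        """Return the bucket's values ordered by their own timestamps, not by write order."""
        return sorted(self.buckets.get(feature_id, set()))


if __name__ == "__main__":
    key = "domain->ip:example.com"  # hypothetical feature identifier
    writes = [(1376553600, "10.0.0.2"), (1376550000, "10.0.0.1"), (1376553600, "10.0.0.2")]
    a, b = BucketedFeatureStore(), BucketedFeatureStore()
    for v in writes:
        a.put(key, v)
    for v in reversed(writes):
        b.put(key, v)
    assert a.read(key) == b.read(key)  # same contents regardless of write order
    print(a.read(key))  # [(1376550000, '10.0.0.1'), (1376553600, '10.0.0.2')]
```

Because a set insert is idempotent and insensitive to ordering, two stores that receive the same writes in different orders end up with identical buckets, which is one way to interpret the requirement that values be stored "without regard to an order in which said values are written."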
Address |
Armonk NY US |