Title of invention: Distributed feature collection and correlation engine
Abstract: A distributed feature collection and correlation engine is provided. Feature extraction comprises obtaining one or more data records; extracting information from the one or more data records based on domain knowledge; transforming the extracted information into a key/value pair comprised of a key K and a value V, wherein the key comprises a feature identifier; and storing the key/value pair in a feature store database, using a de-duplication mechanism, if the key/value pair does not already exist in the feature store database. Features extracted from data records can be queried by obtaining a feature store database comprised of the extracted features stored as key/value pairs, each comprised of a key K and a value V, wherein the key comprises a feature identifier; receiving a query comprised of at least one query key; retrieving values from the feature store database that match the query key; and returning one or more retrieved key/value pairs.
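The extraction and query flow summarized in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the record layout (a DNS-style log line), the regular expression standing in for "domain knowledge", the feature identifiers `domain_to_ip` and `ip_to_domain`, and the function names `extract_features`, `store`, and `query` are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical in-memory feature store: key K -> set of values V.
feature_store = defaultdict(set)

# Assumed "domain knowledge": records look like "<timestamp> <domain> A <ip>".
RECORD_PATTERN = re.compile(r"^(\S+)\s+(\S+)\s+A\s+(\S+)$")


def extract_features(record):
    """Turn one data record into (key, value) pairs; return [] if it does not match."""
    match = RECORD_PATTERN.match(record)
    if match is None:
        return []
    _timestamp, domain, ip = match.groups()
    # Each key carries a feature identifier plus the entity it describes.
    return [
        (("domain_to_ip", domain), ip),
        (("ip_to_domain", ip), domain),
    ]


def store(key, value):
    """Store the pair only if it is not already present; return True if stored."""
    bucket = feature_store[key]
    if value in bucket:  # de-duplication check
        return False
    bucket.add(value)
    return True


def query(query_key):
    """Return all stored key/value pairs whose key matches the query key."""
    return [(query_key, value) for value in sorted(feature_store.get(query_key, ()))]
```

For instance, feeding the same record twice through `extract_features` and `store` leaves a single entry per key, and `query(("domain_to_ip", "example.com"))` returns the pairs stored for that domain.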
Publication number: US9489426(B2) Publication date: 2016.11.08
Application number: US201313967730 Filing date: 2013.08.15
Applicant: International Business Machines Corporation Inventors: Christodorescu Mihai; Hu Xin; Schales Douglas Lee; Sailer Reiner; Stoecklin Marc P.; Wang Ting
Classification: G06F17/30 Main classification: G06F17/30
Agency: Ryan, Mason & Lewis, LLP Agent: Ryan, Mason & Lewis, LLP
Principal claim: 1. An apparatus for processing data, the apparatus comprising: a memory; and at least one processing device, coupled to the memory, operative to: obtain one or more data records; extract feature information from said one or more data records, wherein the extracting is performed based on domain knowledge; transform said extracted feature information into a transformed key/value pair comprised of a key and a value, wherein said key comprises a feature identifier of said extracted feature information; store said transformed key/value pair in a given bucket of values in a feature store database comprised of a plurality of buckets of values only if said key/value pair does not already exist in said feature store database using a de-duplication mechanism by determining if said value of said transformed key/value pair is already in said given bucket, wherein said given bucket is identified by said key comprising said feature identifier of said transformed key/value pair; and wherein said bucket of values comprise a mathematical set that stores a given value based on a timestamp of said given value and without regard to an order in which said values are written to said bucket.
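One possible reading of the bucket behavior in the claim is sketched below. It assumes, purely for illustration, that each bucket element is a (timestamp, value) pair held in a set, so that the final bucket contents depend only on what was written and not on the order of the writes. The class name `FeatureStore` and its methods `put` and `bucket` are hypothetical and do not come from the patent.

```python
class FeatureStore:
    """Hypothetical feature store made of buckets, one bucket per key."""

    def __init__(self):
        # key (including the feature identifier) -> set of (timestamp, value)
        self._buckets = {}

    def put(self, key, value, timestamp):
        """Add the element to the bucket for `key` unless it is already there."""
        bucket = self._buckets.setdefault(key, set())
        element = (timestamp, value)
        if element in bucket:  # de-duplication: membership test on the bucket
            return False
        bucket.add(element)
        return True

    def bucket(self, key):
        """Return an immutable snapshot of the bucket identified by `key`."""
        return frozenset(self._buckets.get(key, ()))
```

Because a set has no insertion order, two stores that receive the same writes in different sequences end up with equal buckets, and a repeated write is rejected by the membership test.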
Address: Armonk, NY, US