Title of invention: Distributed feature collection and correlation engine
Abstract: A distributed feature collection and correlation engine is provided. Feature extraction comprises obtaining one or more data records; extracting information from the one or more data records based on domain knowledge; transforming the extracted information into a key/value pair comprised of a key K and a value V, wherein the key comprises a feature identifier; and storing the key/value pair in a feature store database, using a de-duplication mechanism, if the key/value pair does not already exist in the feature store database. Features extracted from data records can be queried by obtaining a feature store database comprised of the extracted features, each stored as a key/value pair comprised of a key K and a value V, wherein the key comprises a feature identifier; receiving a query comprised of at least one query key; retrieving values from the feature store database that match the query key; and returning one or more retrieved key/value pairs.
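The extraction and query flow in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and does not come from the patent: the DNS-style log format, the `domain_to_ip` feature identifier, and the function names `extract_features`, `build_store`, and `query`. An in-memory dictionary of sets stands in for the feature store database.

```python
import re
from collections import defaultdict

# Hypothetical domain knowledge: a log line of the form "<domain> A <ipv4>"
# relates a domain name to an IP address.
DNS_LINE = re.compile(r"^(?P<domain>\S+)\s+A\s+(?P<ip>\d{1,3}(?:\.\d{1,3}){3})$")


def extract_features(records):
    """Yield (key, value) pairs; each key starts with a feature identifier."""
    for record in records:
        match = DNS_LINE.match(record.strip())
        if match is None:
            continue  # no feature we know how to extract from this record
        key = ("domain_to_ip", match["domain"])  # feature identifier + subject
        yield key, match["ip"]


def build_store(records):
    """Store extracted pairs; set membership drops duplicate pairs."""
    store = defaultdict(set)
    for key, value in extract_features(records):
        store[key].add(value)
    return store


def query(store, query_key):
    """Return the key/value pairs whose key matches the query key."""
    return [(query_key, value) for value in sorted(store.get(query_key, ()))]
```

In this sketch, feeding the same record twice leaves the store unchanged, and a query for a key that was never stored returns an empty list.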
Publication number: US9495420(B2)
Publication date: 2016.11.15
Application number: US201313899784
Filing date: 2013.05.22
Applicant: International Business Machines Corporation
Inventors: Christodorescu Mihai; Hu Xin; Schales Douglas Lee; Sailer Reiner; Stoecklin Marc P.; Wang Ting
Classification: G06F7/00; G06F17/00; G06F17/30
Main classification: G06F7/00
Agency: Ryan, Mason & Lewis, LLP
Agent: Ryan, Mason & Lewis, LLP
Main claim: 1. A data processing method, comprising: obtaining one or more data records; extracting feature information from said one or more data records, wherein the extracting is performed based on domain knowledge; transforming, using at least one processing device, said extracted feature information into a transformed key/value pair comprised of a key and a value, wherein said key comprises a feature identifier of said extracted feature information; and storing, using at least one processing device, said transformed key/value pair in a given bucket of values in a feature store database comprised of a plurality of buckets of values only if said key/value pair does not already exist in said feature store database using a de-duplication mechanism by determining if said value of said transformed key/value pair is already in said given bucket, wherein said given bucket is identified by said key comprising said feature identifier of said transformed key/value pair, wherein said bucket of values comprise a mathematical set that stores a given value based on a timestamp of said given value and without regard to an order in which said values are written to said bucket, wherein said storing step further comprises the steps of using the key to look up a record in said feature store database and, if the lookup fails, determining that the key and the value are new and writing a new record to the feature store database keyed with the key and a value.
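The storing step of the claim can be sketched as follows. This is one possible reading under stated assumptions, not the patented implementation: the class name `FeatureStore`, its `put` and `get` methods, and the in-memory dictionary are hypothetical. In particular, the claim does not say how the timestamp is used, so the sketch assumes each bucket member keeps its earliest observed timestamp, which makes the bucket contents independent of write order.

```python
class FeatureStore:
    """In-memory stand-in for a feature store made of keyed buckets."""

    def __init__(self):
        # key -> bucket; a bucket maps each distinct value to a timestamp
        self._buckets = {}

    def put(self, key, value, timestamp):
        """Store the pair only if it is new; return True when a write occurs."""
        bucket = self._buckets.get(key)
        if bucket is None:
            # Lookup failed: key and value are both new, so write a new record.
            self._buckets[key] = {value: timestamp}
            return True
        if value in bucket:
            # Duplicate pair: no new member is added. Assumption: keep the
            # earliest timestamp so the result does not depend on write order.
            bucket[value] = min(bucket[value], timestamp)
            return False
        bucket[value] = timestamp
        return True

    def get(self, key):
        """Return a copy of the bucket for the key (empty if the key is absent)."""
        return dict(self._buckets.get(key, {}))
```

Writing the same pairs in a different order yields the same bucket, because membership is decided by the value and the retained timestamp is the minimum seen.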
Address: Armonk, NY, US