Title of invention: SYSTEM AND METHOD FOR INVESTIGATING LARGE AMOUNTS OF DATA
Abstract: A data analysis system is proposed for providing fine-grained, low-latency access to high-volume input data from possibly multiple heterogeneous input data sources. The input data is parsed, optionally transformed, indexed, and stored in a horizontally scalable key-value data repository where it may be accessed using low-latency searches. The input data may be compressed into blocks before being stored to minimize storage requirements. The results of searches present input data in its original form. The input data may include access logs, call data records (CDRs), e-mail messages, etc. The system allows a data analyst to efficiently identify information of interest in a very large dynamic data set up to multiple petabytes in size. Once information of interest has been identified, that subset of the large data set can be imported into a dedicated or specialized data analysis system for an additional in-depth investigation and contextual analysis.
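The ingest path described in the abstract (parse, index, compress into blocks, store) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the patent: the function `ingest`, the whitespace tokenizer, the fixed `BLOCK_SIZE`, the use of zlib, and the two in-memory dictionaries standing in for a horizontally scalable key-value repository.

```python
import zlib
from collections import defaultdict

BLOCK_SIZE = 3  # records per compressed block (illustrative value only)


def ingest(records, block_size=BLOCK_SIZE):
    """Index records and store them in compressed blocks.

    Returns (index_family, block_family):
      index_family: term -> list of (block_id, offset, length) locators
      block_family: block_id -> zlib-compressed bytes of the original records
    Plain dicts stand in for a distributed key-value store (assumption).
    """
    index_family = defaultdict(list)
    block_family = {}
    for start in range(0, len(records), block_size):
        block_id = start // block_size
        buf = bytearray()
        for record in records[start:start + block_size]:
            raw = record.encode("utf-8")
            offset, length = len(buf), len(raw)
            buf.extend(raw)  # keep the record in its original form
            # Hypothetical parsing step: lowercase whitespace tokens.
            for term in set(record.lower().split()):
                index_family[term].append((block_id, offset, length))
        block_family[block_id] = zlib.compress(bytes(buf))
    return dict(index_family), block_family


records = ["GET /a 200", "GET /b 404", "POST /a 200", "GET /c 500"]
index, blocks = ingest(records)
```

Storing byte offsets alongside the block identifier is one way to return a record in its original form without rescanning the whole block; a real deployment would choose block sizes and tokenization to suit its data.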
Publication number: US2016085817(A1) Publication date: 2016.03.24
Application number: US201514961830 Filing date: 2015.12.07
Applicant: PALANTIR TECHNOLOGIES, INC. Inventors: STOWE GEOFFREY;FISCHER CHRIS;GEORGE PAUL;BINGHAM ELI;HILL ROSCO
Classification: G06F17/30 Main classification: G06F17/30
Agency: Agent:
Independent claim: 1. A method comprising: receiving a search parameter with a computer that is configured with an improved search mechanism; deriving, with the computer and the improved search mechanism, a search criterion from the search parameter and using the search criterion to obtain one or more first values from a first key-value family of a key-value data repository stored in a data storage device that is coupled to the computer; obtaining based on the one or more first values, with the computer and the improved search mechanism, one or more compressed values from a second key-value family of the key-value data repository; uncompressing, with the computer and the improved search mechanism, the one or more compressed values to produce one or more uncompressed values; identifying, with the computer and the improved search mechanism, based on the one or more first values to identify one or more portions of the one or more uncompressed values; returning, with the computer and the improved search mechanism, the one or more portions of the one or more uncompressed values as search results.
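The sequence of steps in the claim can be sketched as a two-stage lookup. This is only one possible reading under stated assumptions: the names `FIRST_FAMILY`, `SECOND_FAMILY`, and `search` are hypothetical, the "first values" are modeled as (block id, offset, length) locators, zlib stands in for whatever compression is used, and two dictionaries stand in for the key-value families.

```python
import zlib

# Hypothetical repository, built by hand for illustration.
_block0 = b"alice called bob" + b"carol emailed dave"

# First key-value family: search criterion -> locators (block_id, offset, length).
FIRST_FAMILY = {
    "alice": [(0, 0, 16)],
    "carol": [(0, 16, 18)],
}
# Second key-value family: block_id -> compressed block of original data.
SECOND_FAMILY = {0: zlib.compress(_block0)}


def search(search_parameter):
    """Return the portions of stored data that match the search parameter."""
    # Derive a search criterion from the parameter (assumed normalization).
    criterion = search_parameter.strip().lower()
    # Obtain the first values from the first family.
    locators = FIRST_FAMILY.get(criterion, [])
    results = []
    uncompressed = {}  # decompress each block at most once per search
    for block_id, offset, length in locators:
        if block_id not in uncompressed:
            # Obtain the compressed value from the second family and uncompress it.
            uncompressed[block_id] = zlib.decompress(SECOND_FAMILY[block_id])
        # Use the first value to identify the matching portion of the block.
        portion = uncompressed[block_id][offset:offset + length]
        results.append(portion.decode("utf-8"))
    return results
```

For example, `search(" Carol ")` normalizes the parameter to `carol`, reads one locator from the first family, decompresses block 0 from the second family, and returns the single matching record rather than the whole block.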
Address: Palo Alto CA US