Title of invention: Storing tokenized information in untrusted environments
Abstract: Techniques are described for tokenizing information to be stored in an untrusted environment. During tokenization, one or more strings in a file or data stream are replaced with a token. The token may be generated as a random number or a counter, such that the replaced string may not be derived based on the token. Token-to-string mapping data may be stored in a trusted environment, and the tokenized information may be stored in the untrusted environment. Users may search the tokenized information based on non-sensitive search terms present in a whitelist that is accessible from the untrusted environment, the whitelist providing a token-to-string mapping for the non-sensitive terms. The search results may be provided as redacted information, in which the non-sensitive strings have been detokenized based on the whitelist while the sensitive strings remain tokenized.
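The tokenization step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names `tokenize` and `detokenize`, the 16-character hexadecimal token format, and the use of whitespace splitting on a single string are all assumptions made for illustration. Tokens are drawn from a random source, so a word cannot be derived from its token without the mapping, which would be kept in the trusted environment.

```python
import secrets


def tokenize(text, mapping=None):
    """Replace each whitespace-separated word with a random token.

    Returns (tokenized_text, word_to_token). The mapping is the secret
    that would stay in the trusted environment (hypothetical layout).
    """
    mapping = {} if mapping is None else mapping
    used = set(mapping.values())
    out = []
    for word in text.split():
        if word not in mapping:
            # Random token: carries no information about the word itself.
            token = secrets.token_hex(8)
            while token in used:  # extremely unlikely, but keep tokens unique
                token = secrets.token_hex(8)
            used.add(token)
            mapping[word] = token
        out.append(mapping[word])
    return " ".join(out), mapping


def detokenize(tokenized_text, mapping):
    """Reverse the substitution; only possible with the trusted mapping."""
    token_to_word = {token: word for word, token in mapping.items()}
    return " ".join(token_to_word[t] for t in tokenized_text.split())
```

Calling `tokenize("pay card 4111 pay")` yields four tokens in which the first and last are identical, because repeated words reuse the same token; `detokenize` with the returned mapping restores the original string.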
Publication number: US9081978(B1) Publication date: 2015.07.14
Application number: US201313905815 Filing date: 2013.05.30
Applicant: Amazon Technologies, Inc. Inventors: Connolly Jeremiah John; Marinus Dennis
Classification: G06F21/00; G06F21/62; G06F21/10; H04L29/06 Main classification: G06F21/00
Agency: Lindauer Law, PLLC Agent: Lindauer Law, PLLC
Main claim: 1. A computer-implemented method, comprising: in a trusted computing environment, parsing a file to determine a plurality of words included in the file, based on whitespace characters that separate the words in the file, the file comprising one or more sensitive words corresponding to financial account data; for individual words that are unique in the plurality of words, determining a corresponding token that corresponds to the word, such that the word is not derivable from the token; generating a tokenized file that includes corresponding tokens in place of the plurality of words; storing the tokenized file in an untrusted computing environment; in the trusted computing environment, storing a mapping of the plurality of words to the corresponding tokens; and in the untrusted computing environment: storing a whitelist mapping of a subset of the plurality of words to the corresponding tokens, the subset including non-sensitive words other than the one or more sensitive words; receiving a search request including one or more search terms; for the one or more search terms that are included in the whitelist, retrieving the corresponding token; for the one or more search terms that are not included in the whitelist, sending a request that the trusted computing environment retrieve the corresponding token; based at least in part on one or more tokens corresponding to the one or more search terms, perform a search of the tokenized file stored in the untrusted computing environment; identifying one or more tokens in the tokenized file that are included in the whitelist; replacing the identified one or more tokens with one or more corresponding words from the whitelist, to generate partly detokenized information; and providing the partly detokenized information in response to the search request.
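The search flow recited in the claim can be sketched in Python as follows. This is a hedged illustration under several assumptions, none of which come from the patent text: the class names `TrustedEnv` and `UntrustedEnv`, the counter-based token format `T1`, `T2`, ..., the treatment of each line of the file as a separate searchable record, the requirement that a record contain all search terms, and a direct method call standing in for the request sent to the trusted environment.

```python
from itertools import count


class TrustedEnv:
    """Holds the full word-to-token mapping (hypothetical class)."""

    def __init__(self):
        self._counter = count(1)
        self.word_to_token = {}

    def tokenize(self, text):
        # Counter-based tokens: the word is not derivable from "T<n>".
        tokens = []
        for word in text.split():
            if word not in self.word_to_token:
                self.word_to_token[word] = f"T{next(self._counter)}"
            tokens.append(self.word_to_token[word])
        return " ".join(tokens)

    def lookup(self, word):
        # Stands in for the request made for non-whitelisted search terms.
        return self.word_to_token.get(word)

    def whitelist_for(self, non_sensitive_words):
        return {w: t for w, t in self.word_to_token.items()
                if w in non_sensitive_words}


class UntrustedEnv:
    """Stores tokenized records and the whitelist only (hypothetical class)."""

    def __init__(self, tokenized_lines, whitelist, trusted):
        self.lines = tokenized_lines
        self.whitelist = whitelist
        self.token_to_word = {t: w for w, t in whitelist.items()}
        self.trusted = trusted

    def search(self, terms):
        wanted = set()
        for term in terms:
            token = self.whitelist.get(term)      # whitelisted term
            if token is None:
                token = self.trusted.lookup(term)  # ask the trusted side
            if token is None:
                return []                          # term was never tokenized
            wanted.add(token)
        results = []
        for line in self.lines:
            tokens = line.split()
            if wanted <= set(tokens):
                # Partly detokenize: whitelisted tokens become words,
                # sensitive tokens are left as they are.
                results.append(
                    " ".join(self.token_to_word.get(t, t) for t in tokens))
        return results


trusted = TrustedEnv()
records = ["charge card 4111111111111111 approved",
           "refund card 5500000000000004 declined"]
tokenized = [trusted.tokenize(r) for r in records]
whitelist = trusted.whitelist_for(
    {"charge", "card", "approved", "refund", "declined"})
untrusted = UntrustedEnv(tokenized, whitelist, trusted)
print(untrusted.search(["refund"]))  # ['refund card T6 declined']
```

In this sketch a search for the sensitive term `4111111111111111` still succeeds, because its token is obtained from the trusted side, but the returned record shows the account number only as its token `T3`.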
Address: Reno, NV, US