Title: Reducing false positives in data validation using statistical heuristics
Abstract: To validate data, a plurality of strings that match a predetermined regular expression is extracted from the data. A validated subset of the strings is identified. To determine whether the validated subset has been falsely validated, it is determined whether the validated subset satisfies each of one or more predetermined criteria relative to the plurality of strings. In one embodiment, the subset is determined to be falsely validated if at least one of the criteria is satisfied. In another embodiment, the subset is determined to be falsely validated if all of the criteria are satisfied. The data are released only if the subset is determined to be falsely validated.
Publication number: US8959047(B2) Publication date: 2015.02.17
Application number: US201213468045 Filing date: 2012.05.10
Applicant: Check Point Software Technologies Ltd. Inventors: Perlmutter Amnon; Ganon Limor; Dahan Meir Jonathan
Classification: G06N5/00; G06F1/00 Main classification: G06N5/00
Agent: Friedman Mark M.
Main claim: 1. A method of validating data comprising the steps of: (a) extracting from the data a plurality of strings that match a predetermined regular expression; (b) identifying a validated subset of said strings; (c) calculating a known probability for a statistical measure of the data based on said predetermined regular expression; (d) comparing said statistical measure of said validated subset to said known probability for said statistical measure; and (e) determining, based on said comparing, if said validated subset meets at least one predetermined criterion relative to all of said plurality of strings taken collectively.
Address: Tel Aviv IL
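Steps (a) through (e) of the main claim can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration and is not taken from the patent: the regular expression (runs of 16 digits), the validator (the Luhn checksum), the statistical measure (the fraction of matches that validate), and the hypothetical `tolerance` and `min_candidates` thresholds. The idea shown is that a uniformly random digit string passes the Luhn check with probability 1/10, so a validated fraction close to 1/10 suggests the matches validated by chance.

```python
import re

# Assumption: runs of exactly 16 digits stand in for the
# "predetermined regular expression" of step (a).
CANDIDATE_RE = re.compile(r"\b\d{16}\b")

# Step (c), under the Luhn assumption: for any fixed leading digits, exactly
# one of the ten possible final digits passes, so a random string passes
# with probability 1/10.
LUHN_PASS_PROBABILITY = 0.1


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def looks_falsely_validated(data: str,
                            tolerance: float = 0.15,
                            min_candidates: int = 10) -> bool:
    """Hypothetical criterion: the validated fraction is near the chance rate."""
    candidates = CANDIDATE_RE.findall(data)                # step (a)
    validated = [s for s in candidates if luhn_valid(s)]   # step (b)
    if len(candidates) < min_candidates:
        return False  # too few matches to judge; treat validation as genuine
    observed = len(validated) / len(candidates)            # statistical measure
    # Steps (d) and (e): compare the measure with the known probability.
    return abs(observed - LUHN_PASS_PROBABILITY) <= tolerance
```

In this sketch, ten matches of which one validates are flagged as a likely false positive, while ten matches that all validate are not. A real deployment would presumably replace the fixed tolerance with a proper statistical test and combine several criteria, as the abstract's "at least one" and "all" embodiments describe.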