Title of invention: Methods and Systems for Improved Semantic Meshing
Abstract: In at least one embodiment, the present invention provides methods and systems for improved semantic meshing, comprising: receiving an input data stream consisting of a plurality of characters; generating a normalized stream having an initial value based on said input data stream; applying a plural character rolling window to a subset of the normalized stream to select at least one stream subset; applying a first uniform hash function to the at least one stream subset to create at least one digest; identifying a cut if the modulus of the digest is zero, such that identifying a cut includes applying a second uniform hash function to the remainder values of the normalized stream; generating at least one shingle; resetting the plural character rolling window with a plurality of zeros; and aggregating the at least one shingle into a semantic hash.
Publication No.: US2016371310(A1) Publication date: 2016.12.22
Application No.: US201615189961 Filing date: 2016.06.22
Applicant: Carcema Inc.; 6899005 Canada Inc. Inventor: Turner Joshua
Classification: G06F17/30 Main classification: G06F17/30
Agency: Agent:
Independent claim: 1. A method for improved semantic meshing, comprising the steps of:
receiving at least one first input data stream consisting of a plurality of characters and corresponding to at least a first document retrieved from a database and at least one second input data stream consisting of a plurality of characters and corresponding to at least a second document retrieved from a database;
generating at least one first normalized stream having an initial value based on the at least one first input data stream and at least one second normalized stream having an initial value based on the at least one second input data stream;
applying a plural character rolling window to a subset of the at least one first normalized stream and the at least one second normalized stream to select at least one stream subset from each of the at least one first normalized stream and the at least one second normalized stream, each at least one stream subset having a plurality of characters and an initial at least one stream subset value and a last at least one stream subset value;
applying a first uniform hash function to each at least one stream subset to create at least one first digest and at least one second digest;
determining a modulus of the at least one first digest and the at least one second digest using a scaling factor, the scaling factor selected based on a size of at least one of the at least one first normalized stream and the at least one second normalized stream;
identifying a cut when the modulus is determined to be equal to zero;
applying a second uniform hash function to at least one set of remainder values of the at least one first normalized stream and the at least one second normalized stream, the at least one set of remainder values including the characters of at least one of the at least one first normalized stream and the at least one second normalized stream extending from the initial value of the at least one first normalized stream or the at least one second normalized stream to the respective initial of the at least one stream subset value;
generating at least one shingle from the output of the second uniform hash function;
resetting the plural character rolling window with a plurality of zeros;
appending the at least one shingle into at least one semantic hash; and
storing the semantic hash in the database.
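The steps recited in the claim can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The normalization rule, the window width `WINDOW`, the choice of BLAKE2b and SHA-256 as the two uniform hash functions, and all function names are assumptions made for illustration. The claim selects the scaling factor from the size of the normalized stream; here `scale` is simply passed in as an argument. The sketch also assumes that no cut is tested while the window still holds zero padding, that each new remainder begins at the start of the window that triggered the cut, and that any trailing characters are flushed into a final shingle. Database retrieval and storage are omitted.

```python
import hashlib

WINDOW = 4  # hypothetical rolling-window width, in characters


def normalize(text: str) -> str:
    """Assumed normalization: lowercase and keep only letters and digits."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def first_hash(window: str) -> int:
    """Stand-in for the first uniform hash, applied to the window contents."""
    digest = hashlib.blake2b(window.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def second_hash(remainder: str) -> str:
    """Stand-in for the second uniform hash; the short hex string is the shingle."""
    return hashlib.sha256(remainder.encode()).hexdigest()[:8]


def semantic_hash(text: str, scale: int = 8) -> list[str]:
    stream = normalize(text)
    shingles = []
    start = 0                   # first character not yet covered by a shingle
    window = "\0" * WINDOW      # window starts out filled with zeros
    filled = 0                  # real characters seen since the last reset
    for i, ch in enumerate(stream):
        window = window[1:] + ch
        filled += 1
        if filled < WINDOW:
            continue            # assumption: no cut test while padding remains
        if first_hash(window) % scale == 0:       # cut: modulus equals zero
            window_start = i - WINDOW + 1
            if window_start > start:              # skip an empty remainder
                shingles.append(second_hash(stream[start:window_start]))
            start = window_start                  # assumption: next remainder starts here
            window, filled = "\0" * WINDOW, 0     # reset the window with zeros
    if start < len(stream):
        shingles.append(second_hash(stream[start:]))  # assumption: flush the tail
    return shingles
```

Because cut positions depend only on local window contents, two documents that share a long run of normalized text will tend to produce some identical shingles, which is one way the resulting semantic hashes of a first and a second document could be compared, for example by counting shingles common to both lists.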
Address: Gatineau CA