Title of invention: Normalizing electronic communications using a vector having a repeating substring as input for a neural network
Abstract: Electronic communications can be normalized using a neural network. For example, a noncanonical communication that includes multiple terms can be received. The noncanonical communication can be preprocessed by (I) generating a vector including multiple characters from a term of the multiple terms; and (II) repeating a substring of the term in the vector such that a last character of the substring is positioned in a last position in the vector. The vector can be transmitted to a neural network configured to receive the vector and generate multiple probabilities based on the vector. A normalized version of the noncanonical communication can be determined using one or more of the multiple probabilities generated by the neural network. Whether the normalized version of the noncanonical communication should be outputted can also be determined using at least one of the multiple probabilities generated by the neural network.
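The preprocessing step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `build_vector`, the padding symbol `PAD`, and the layout (term left-aligned, padding in the middle, the longest fitting suffix of the term right-aligned) are all hypothetical choices. The record only requires that the vector hold characters from the term and that a repeated substring end in the vector's last position.

```python
PAD = "_"  # hypothetical padding symbol; the record does not name one


def build_vector(term: str, length: int) -> list[str]:
    """Place `term` at the start of a fixed-length vector and repeat a
    suffix of it at the end, so the term's last character lands in the
    vector's last position."""
    if not term:
        raise ValueError("term must not be empty")
    if length <= len(term):
        raise ValueError("vector length must exceed the term length")

    # Left-align the term and fill the remaining positions with padding.
    free = length - len(term)
    vector = list(term) + [PAD] * free

    # Repeat the longest suffix of the term that fits in the free space.
    # A slice such as term[-5:] on a 3-character term returns the whole term.
    suffix = term[-free:]
    vector[length - len(suffix):] = list(suffix)
    return vector


if __name__ == "__main__":
    print("".join(build_vector("cat", 8)))      # cat__cat
    print("".join(build_vector("2morrow", 9)))  # 2morrowow
```

In this sketch the repeated substring is always a suffix of the term, so its last character matches the term's last character. A real pipeline would presumably map each character to an integer or embedding index before passing the vector to the network; that encoding is left out here.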
Publication number: US9595002(B2)  Publication date: 2017.03.14
Application number: US201615175503  Filing date: 2016.06.07
Applicant: SAS INSTITUTE INC.  Inventors: Leeman-Munk Samuel Paul; Cox James Allen
Classification: G06N3/04; H04W4/00  Main classification: G06N3/04
Agency: Kilpatrick Townsend & Stockton LLP  Agent: Kilpatrick Townsend & Stockton LLP
Principal claim: 1. A non-transitory computer readable medium comprising program code executable by a processor for causing the processor to: receive an electronic representation of a noncanonical communication, the noncanonical communication including multiple terms; preprocess the noncanonical communication by: generating a vector comprising a plurality of characters from a term of the multiple terms, the vector having a predetermined length greater than a length of the term; and repeating a substring of the term in the vector such that a last character of the substring is positioned in a last position in the vector, wherein the last character of the substring is the same as the last character in the term; transmit the vector to a neural network comprising at least two bidirectional gated recurrent neural network (BGRNN) layers, the neural network being configured to receive the vector and generate multiple probabilities based on the vector; receive from the neural network the multiple probabilities generated based on the transmitted vector; determine a normalized version of the noncanonical communication using one or more of the multiple probabilities received from the neural network; and determine that the normalized version of the noncanonical communication should be outputted or should not be outputted using at least one of the multiple probabilities received from the neural network.
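The final two steps of the claim, choosing a normalized version and deciding whether to output it, might look like the sketch below. Everything specific here is an assumption made for illustration: the claim does not say how the probabilities are structured or compared, so the split into per-candidate probabilities plus a separate output probability, the fixed `threshold`, and the name `select_normalization` are hypothetical.

```python
from typing import Optional


def select_normalization(
    candidate_probs: dict[str, float],
    output_prob: float,
    threshold: float = 0.5,  # hypothetical cutoff; not stated in the claim
) -> Optional[str]:
    """Pick the most probable candidate, then decide whether to output it.

    candidate_probs: assumed mapping from candidate normalized text to the
        probability the network assigned to it.
    output_prob: assumed separate probability that the normalization
        should be output at all.
    Returns the chosen candidate, or None when it should not be output.
    """
    if not candidate_probs:
        return None

    # Determine the normalized version from the candidate probabilities.
    best = max(candidate_probs, key=candidate_probs.get)

    # Determine whether the normalized version should be output.
    return best if output_prob >= threshold else None


if __name__ == "__main__":
    probs = {"tomorrow": 0.9, "tomato": 0.1}
    print(select_normalization(probs, output_prob=0.8))  # tomorrow
    print(select_normalization(probs, output_prob=0.2))  # None
```

Returning `None` leaves the caller free to keep the original noncanonical term, which is one plausible reading of "should not be outputted"; the record itself does not specify the fallback.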
Address: Cary, NC, US