Title |
Document classification with weighted supervised n-gram embedding |
Abstract |
Methods and systems for document classification include embedding n-grams from an input text in a latent space, embedding the input text in the latent space based on the embedded n-grams and weighting said n-grams according to spatial evidence of the respective n-grams in the input text, classifying the document along one or more axes, and adjusting weights used to weight the n-grams based on the output of the classifying step. |
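The four steps in the abstract can be sketched as follows. This is a minimal illustration under assumptions the patent does not state: bigrams over whitespace tokens, a randomly initialized lookup table (the hypothetical `NgramEmbedder`) standing in for learned n-gram embeddings, a single binary logistic classifier for one axis, and per-n-gram weights `q_j` updated directly by one gradient step on log loss. All function names are illustrative.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Ngram = Tuple[str, ...]


def extract_ngrams(tokens: Sequence[str], n: int) -> List[Ngram]:
    """Contiguous n-grams of a token sequence, in positional order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


class NgramEmbedder:
    """Hypothetical lookup table mapping each n-gram to a latent vector.

    Vectors are random here; in a trained system they would be learned.
    """

    def __init__(self, dim: int, seed: int = 0) -> None:
        self.dim = dim
        self.rng = random.Random(seed)
        self.table: Dict[Ngram, List[float]] = {}

    def embed(self, ngram: Ngram) -> List[float]:
        if ngram not in self.table:
            self.table[ngram] = [self.rng.gauss(0.0, 0.1) for _ in range(self.dim)]
        return self.table[ngram]


def embed_document(vectors: List[List[float]], weights: List[float]) -> List[float]:
    """Document embedding d = sum_j q_j * e_j."""
    dim = len(vectors[0])
    return [sum(q * v[i] for q, v in zip(weights, vectors)) for i in range(dim)]


def classify(doc: List[float], w: List[float], bias: float) -> float:
    """Positive-class probability along one axis (logistic regression)."""
    z = sum(wi * di for wi, di in zip(w, doc)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def adjust_weights(vectors: List[List[float]], weights: List[float],
                   w: List[float], bias: float, label: int, lr: float) -> List[float]:
    """One gradient step on the n-gram weights q_j under log loss.

    Since z = w . d + bias and d = sum_j q_j e_j, dL/dq_j = (p - y) * (w . e_j).
    """
    p = classify(embed_document(vectors, weights), w, bias)
    grads = [(p - label) * sum(wi * vi for wi, vi in zip(w, v)) for v in vectors]
    return [q - lr * g for q, g in zip(weights, grads)]


if __name__ == "__main__":
    tokens = "the movie was not good at all".split()
    embedder = NgramEmbedder(dim=4)
    vectors = [embedder.embed(g) for g in extract_ngrams(tokens, 2)]
    weights = [1.0 / len(vectors)] * len(vectors)  # start from uniform weights
    w, bias = [0.5, -0.3, 0.8, 0.1], 0.0
    before = classify(embed_document(vectors, weights), w, bias)
    weights = adjust_weights(vectors, weights, w, bias, label=1, lr=0.5)
    after = classify(embed_document(vectors, weights), w, bias)
    print(f"p(positive): {before:.4f} -> {after:.4f}")
```

Because the classifier score is linear in each `q_j`, one gradient step toward a positive label can only raise the positive-class probability in this sketch; a full system would also train the embeddings and classifier parameters jointly.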
Publication number |
US8892488(B2) |
Publication date |
2014.11.18 |
Application number |
US201213483868 |
Application date |
2012.05.30 |
Applicant |
NEC Laboratories America, Inc. |
Inventors |
Qi Yanjun; Bai Bing |
Classification |
G06F17/00; G06N5/00; G06F17/27 |
Main classification |
G06F17/00 |
Agency |
|
Agent |
Kolodka Joseph |
Main claim |
1. A method for document classification, comprising:
embedding n-grams from an input text in a latent space; embedding the input text in the latent space based on the embedded n-grams and weighting the n-grams according to a non-linear function $q_j = \frac{1}{Q}\sum_{k=1}^{K}\mathrm{sigmoid}\left(a_k \cdot \frac{j}{N} + b_k\right)$, using a mixture model on a relative position of the n-grams in the input text, where $a_k$ and $b_k$ are parameters to be learned, $Q = \sum_{j=1}^{N} q_j$, $K$ specifies the number of mixture quantities, $\mathrm{sigmoid}(\cdot)$ is a non-linear transfer function, $q_j$ is the weight associated with the $j$th n-gram, $j$ signifies the position of an n-gram in the input text, and $N$ is the position of the final n-gram in the input text;
classifying the document along one or more axes using a processor; and adjusting weights used to weight the n-grams based on the output of the classifying step. |
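The position-weighting function of claim 1 can be computed directly, together with its gradients for the final "adjusting weights" step. This is a sketch under stated assumptions: $Q$ is read as the sum of the unnormalized mixture values $\sum_k \mathrm{sigmoid}(a_k j/N + b_k)$ (the claim writes $Q = \sum_j q_j$, which is self-consistent only under this reading, and makes the $q_j$ sum to 1), the gradients are derived by hand via the chain rule, and `upstream` is a hypothetical input holding $\partial L / \partial q_j$ from whatever classifier loss is used. Function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def _mixture(t: float, a: Sequence[float], b: Sequence[float]) -> float:
    """Unnormalized weight s(t) = sum_k sigmoid(a_k * t + b_k) at relative position t."""
    return sum(sigmoid(ak * t + bk) for ak, bk in zip(a, b))


def position_weights(n: int, a: Sequence[float], b: Sequence[float]) -> List[float]:
    """q_j = (1/Q) * sum_k sigmoid(a_k * j/N + b_k) for j = 1..N.

    Assumption: Q is the sum of the unnormalized values, so the q_j sum to 1.
    """
    raw = [_mixture(j / n, a, b) for j in range(1, n + 1)]
    total = sum(raw)
    return [s / total for s in raw]


def position_weight_grads(
    n: int, a: Sequence[float], b: Sequence[float], upstream: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Gradients of a loss L w.r.t. a_k and b_k, given upstream[j-1] = dL/dq_j.

    With q_j = s_j / S and S = sum_j s_j, the normalization gives
    dL/ds_j = (g_j - sum_i g_i q_i) / S.
    """
    t = [j / n for j in range(1, n + 1)]
    raw = [_mixture(tj, a, b) for tj in t]
    total = sum(raw)
    mean_g = sum(g * s / total for g, s in zip(upstream, raw))
    ds = [(g - mean_g) / total for g in upstream]
    grad_a, grad_b = [], []
    for ak, bk in zip(a, b):
        # sigmoid'(u) = sigmoid(u) * (1 - sigmoid(u)), with u = a_k * t_j + b_k
        dsig = [sigmoid(ak * tj + bk) * (1.0 - sigmoid(ak * tj + bk)) for tj in t]
        grad_a.append(sum(d * s * tj for d, s, tj in zip(ds, dsig, t)))
        grad_b.append(sum(d * s for d, s in zip(ds, dsig)))
    return grad_a, grad_b


if __name__ == "__main__":
    a, b = [6.0, -4.0], [-3.0, 1.0]  # K = 2 mixture quantities (illustrative values)
    print([round(q, 3) for q in position_weights(6, a, b)])
```

A gradient step such as `a[k] -= lr * grad_a[k]` (and likewise for `b[k]`) then adjusts the n-gram weights from the classifier's output. Using relative position $j/N$ rather than absolute position lets the same $a_k$, $b_k$ apply to documents of any length.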
Address |
Princeton NJ US |