Title: Document classification with weighted supervised n-gram embedding
Abstract: Methods and systems for document classification include embedding n-grams from an input text in a latent space, embedding the input text in the latent space based on the embedded n-grams and weighting said n-grams according to spatial evidence of the respective n-grams in the input text, classifying the document along one or more axes, and adjusting weights used to weight the n-grams based on the output of the classifying step.
Publication number: US8892488(B2) Publication date: 2014.11.18
Application number: US201213483868 Filing date: 2012.05.30
Applicant: NEC Laboratories America, Inc. Inventors: Qi Yanjun; Bai Bing
Classification: G06F17/00; G06N5/00; G06F17/27 Main classification: G06F17/00
Agency: Agent: Kolodka Joseph
Main claim: 1. A method for document classification, comprising: embedding n-grams from an input text in a latent space; embedding the input text in the latent space based on the embedded n-grams and weighting the n-grams according to a non-linear function $q_j = \frac{1}{Q} \sum_{k=1}^{K} \mathrm{sigmoid}\left(a_k \cdot \frac{j}{N} + b_k\right)$, using a mixture model on a relative position of the n-grams in the input text, where $a_k$ and $b_k$ are parameters to be learned, $Q = \sum_{j=1}^{N} q_j$ and $K$ specify a number of mixture quantities, $\mathrm{sigmoid}(\cdot)$ is a non-linear transfer function, $q_j$ is the weight associated with a $j$th n-gram, $j$ signifies the position of an n-gram in the input text, and $N$ is the position of a final n-gram in the input text; classifying the document along one or more axes using a processor; and adjusting weights used to weight the n-grams based on the output of the classifying step.
Address: Princeton, NJ, US
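
The position-weighting function in the main claim can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names `position_weights` and `embed_document` are hypothetical, the n-gram embeddings are assumed to be given as plain lists of floats, and the normalizer Q is assumed to be the sum of the unnormalized mixture values, so that the weights sum to 1. Learning of the parameters a_k and b_k and the classification step are not shown.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def position_weights(n, a, b):
    """Return weights q_1..q_n for n n-gram positions.

    a and b hold the K mixture parameters a_k and b_k (fixed here,
    learned in the claimed method).
    """
    if len(a) != len(b):
        raise ValueError("a and b must both have length K")
    # Unnormalized mixture value at each relative position j/N, j = 1..N.
    raw = [
        sum(sigmoid(ak * (j / n) + bk) for ak, bk in zip(a, b))
        for j in range(1, n + 1)
    ]
    total = sum(raw)  # assumed interpretation of the normalizer Q
    return [r / total for r in raw]


def embed_document(ngram_vectors, a, b):
    """Embed a text as the position-weighted sum of its n-gram embeddings."""
    q = position_weights(len(ngram_vectors), a, b)
    dim = len(ngram_vectors[0])
    return [
        sum(qj * vec[d] for qj, vec in zip(q, ngram_vectors))
        for d in range(dim)
    ]


if __name__ == "__main__":
    # One mixture component with a positive slope favors later n-grams.
    print(position_weights(5, a=[3.0], b=[0.0]))
    # Two toy 2-dimensional n-gram embeddings, uniform weighting (a = b = 0).
    print(embed_document([[1.0, 0.0], [0.0, 1.0]], a=[0.0], b=[0.0]))
```

With all a_k and b_k set to zero, every position receives the same weight, and the document embedding reduces to a plain average of the n-gram embeddings. Nonzero parameters let the weighting emphasize particular regions of the text, such as its beginning or end.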