Title of invention: Identifying cultural background from text
Abstract: The diaculture of a text can be determined or analyzed by tokenizing the words of the text according to a rule set to generate tokenized text. The rule set defines a first set of grammatical types, whose words are replaced with tokens indicating the grammatical type of the respective word, and a second set of grammatical types, whose words are passed through as tokens unchanged. N-grams can then be constructed from the tokenized text, each n-gram including one or more consecutive tokens from the tokenized text. The n-grams can be compared to a training data set that corresponds to a known diaculture to obtain a comparison result indicating how well the text matches the training data set for that diaculture.
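The pipeline described in the abstract (rule-based tokenization, n-gram construction, comparison against training data) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the two sets `REPLACED_TYPES` and `PASSED_TYPES`, the toy `LEXICON` standing in for a real part-of-speech tagger, the `<UNK>` token for words of unknown type, and the overlap-fraction score in `match_score`. The abstract does not specify which grammatical types belong to which set or how the comparison result is computed.

```python
from collections import Counter

# Hypothetical rule set (assumption): which grammatical types are replaced
# by a type token and which are passed through unchanged.
REPLACED_TYPES = {"NOUN", "VERB", "ADJ"}
PASSED_TYPES = {"PRON", "DET", "PREP", "CONJ"}

# Toy lexicon standing in for a real part-of-speech tagger (assumption).
LEXICON = {
    "the": "DET", "a": "DET", "i": "PRON", "we": "PRON",
    "of": "PREP", "in": "PREP", "and": "CONJ",
    "dog": "NOUN", "house": "NOUN", "saw": "VERB",
    "built": "VERB", "big": "ADJ",
}


def tokenize(text):
    """Apply the rule set: replace or pass each word based on its type."""
    tokens = []
    for raw in text.lower().split():
        word = raw.strip(".,;:!?")
        if not word:
            continue
        pos = LEXICON.get(word)
        if pos in REPLACED_TYPES:
            tokens.append(f"<{pos}>")   # first set: word becomes its type
        elif pos in PASSED_TYPES:
            tokens.append(word)         # second set: word passes unchanged
        else:
            tokens.append("<UNK>")      # assumption: placeholder for unknown types
    return tokens


def ngrams(tokens, n_max=2):
    """Count every run of 1..n_max consecutive tokens."""
    grams = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams[tuple(tokens[i:i + n])] += 1
    return grams


def match_score(text_grams, training_grams):
    """Fraction of the text's n-gram occurrences also seen in training data."""
    total = sum(text_grams.values())
    if total == 0:
        return 0.0
    hits = sum(c for g, c in text_grams.items() if g in training_grams)
    return hits / total


# Usage: compare one sentence to a one-sentence "training set".
training = ngrams(tokenize("I built a big house"))
sample = ngrams(tokenize("We saw the big dog."))
score = match_score(sample, training)
```

In this sketch the content words collapse to type tokens while function words survive verbatim, so the score reflects how closely the sample's function-word and structural patterns match those of the training text. A real system would compare against training sets for several known diacultures and would likely use a more robust statistical comparison than simple overlap.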
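Publication number: EP2645272(A1)  Publication date: 2013.10.02
Application number: EP20130161708  Filing date: 2013.03.28
Applicant: LOCKHEED MARTIN CORPORATION  Inventors: TAYLOR, SARAH M.; DAVENPORT, DANIEL; MENAKER, DAVID M.; PARADIS, ROSEMARY D.
Classification: G06F17/27  Main classification: G06F17/27
Agency:  Agent:
Principal claim:
Address: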