Title of invention IDENTIFYING CULTURAL BACKGROUND FROM TEXT
Abstract The diaculture of a text can be determined or analyzed by tokenizing the words of the text according to a rule set to generate tokenized text. The rule set defines a first set of grammatical types of words, whose words are replaced with tokens that each indicate the grammatical type of the respective word, and a second set of grammatical types of words, whose words are passed through as tokens unchanged. Grams can be constructed from the tokenized text, each gram including one or more consecutive tokens from the tokenized text. The grams can be compared to a training data set that corresponds to a known diaculture to obtain a comparison result that indicates how well the text matches the training data set for the known diaculture.
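The pipeline described in the abstract (rule-based tokenization, gram construction, comparison against training data) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the patent: the toy `LEXICON` stands in for a real part-of-speech tagger, the choice of which grammatical types go in `REPLACE_TYPES` versus `PASS_TYPES` is arbitrary, and `match_score` is one simple overlap measure among many possible comparison methods.

```python
from collections import Counter

# Hypothetical rule set: words of these grammatical types are replaced
# by a token naming the type; all other words pass through unchanged.
REPLACE_TYPES = {"NOUN", "VERB", "ADJ"}
PASS_TYPES = {"PRON", "DET", "PREP", "CONJ"}

# Toy lexicon standing in for a real part-of-speech tagger (assumption).
LEXICON = {
    "the": "DET", "a": "DET", "my": "PRON", "and": "CONJ", "on": "PREP",
    "dog": "NOUN", "cat": "NOUN", "chased": "VERB", "big": "ADJ",
}

def tokenize(text):
    """Apply the rule set: replace some words by type tokens, pass others."""
    tokens = []
    for raw in text.lower().split():
        word = raw.strip(".,!?;:")
        if not word:
            continue
        # Unknown words are treated as nouns; this is a simplification.
        pos = LEXICON.get(word, "NOUN")
        tokens.append(f"<{pos}>" if pos in REPLACE_TYPES else word)
    return tokens

def build_grams(tokens, max_n=3):
    """Count every run of 1..max_n consecutive tokens."""
    grams = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams[tuple(tokens[i:i + n])] += 1
    return grams

def match_score(text_grams, training_grams):
    """Fraction of the text's gram occurrences also seen in training data."""
    total = sum(text_grams.values())
    if total == 0:
        return 0.0
    hits = sum(c for g, c in text_grams.items() if g in training_grams)
    return hits / total

# Usage: score a new text against training text for one known diaculture.
training = build_grams(tokenize("The dog chased a cat"), max_n=2)
sample = build_grams(tokenize("My cat chased the bird"), max_n=2)
print(tokenize("The dog chased a cat"))
print(match_score(sample, training))
```

Because content words collapse into type tokens while function words survive as-is, the grams in this sketch capture phrasing habits rather than topic, which is the intuition the abstract points to. A realistic system would presumably compare against several training sets and pick the best match.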
Publication number US2013282362(A1) Publication date 2013.10.24
Application number US201313852620 Filing date 2013.03.28
Applicant LOCKHEED MARTIN CORPORATION Inventors TAYLOR SARAH M.;DAVENPORT DANIEL M.;MENAKER DAVID;PARADIS ROSEMARY D.
Classification G06F17/28 Main classification G06F17/28
Agency Agent
Principal claim
Address