Title of invention |
Identifying cultural background from text |
Abstract |
The diaculture of a text can be determined or analyzed by tokenizing the words of the text according to a rule set to generate tokenized text. The rule set defines a first set of grammatical types, whose words are replaced with tokens indicating each word's grammatical type, and a second set of grammatical types, whose words are passed through as tokens unchanged. N-grams can be constructed from the tokenized text, each n-gram including one or more consecutive tokens from the tokenized text. The n-grams can be compared to a training data set that corresponds to a known diaculture to obtain a comparison result that indicates how well the text matches the training data set for that diaculture. |
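The three steps in the abstract (rule-based tokenization, n-gram construction, comparison against training data) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the grammatical types in `REPLACE_TYPES` and `PASS_TYPES`, the toy lexicon standing in for a real part-of-speech tagger, the `<UNK>` handling of words in neither set, and the overlap fraction used as the comparison result.

```python
# Hypothetical rule set (illustrative only, not from the patent).
REPLACE_TYPES = {"NOUN", "VERB", "ADJ"}  # first set: word replaced by a type token
PASS_TYPES = {"PRON", "DET", "PREP"}     # second set: word passed through unchanged

# Toy lexicon standing in for a real part-of-speech tagger (assumption).
TOY_LEXICON = {
    "i": "PRON", "you": "PRON", "we": "PRON",
    "the": "DET", "a": "DET",
    "to": "PREP", "in": "PREP",
    "went": "VERB", "saw": "VERB",
    "store": "NOUN", "dog": "NOUN",
    "big": "ADJ",
}


def tokenize(text):
    """Apply the rule set: replace, pass through, or mark as unknown."""
    tokens = []
    for raw in text.lower().split():
        word = raw.strip(".,!?;:")
        if not word:
            continue
        pos = TOY_LEXICON.get(word, "UNK")
        if pos in REPLACE_TYPES:
            tokens.append("<" + pos + ">")
        elif pos in PASS_TYPES:
            tokens.append(word)
        else:
            # The abstract does not say what happens to other words;
            # mapping them to <UNK> is an assumption of this sketch.
            tokens.append("<UNK>")
    return tokens


def ngrams(tokens, max_n=2):
    """Return all n-grams of consecutive tokens for n = 1..max_n."""
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(tuple(tokens[i:i + n]))
    return grams


def match_score(text, training_texts, max_n=2):
    """Fraction of the text's n-grams that also occur in the training data."""
    profile = set()
    for sample in training_texts:
        profile.update(ngrams(tokenize(sample), max_n))
    grams = ngrams(tokenize(text), max_n)
    if not grams:
        return 0.0
    hits = sum(1 for g in grams if g in profile)
    return hits / len(grams)
```

For example, `match_score("We saw a dog", ["I went to the store"])` compares the n-grams of one sentence against a one-sentence training set for a hypothetical diaculture; a higher fraction would indicate a closer match. A practical system would likely use a real tagger and a frequency-weighted comparison rather than simple set membership.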
Publication number |
EP2645272(A1) |
Publication date |
2013.10.02 |
Application number |
EP20130161708 |
Filing date |
2013.03.28 |
Applicant |
LOCKHEED MARTIN CORPORATION |
Inventors |
TAYLOR, SARAH M.;DAVENPORT, DANIEL;MENAKER, DAVID M.;PARADIS, ROSEMARY D. |
Classification |
G06F17/27 |
Main classification |
G06F17/27 |
Agency |
|
Agent |
|
Main claim |
|
Address |
|