Title of Invention |
METHODS, APPARATUS, SYSTEMS AND COMPUTER READABLE MEDIA FOR USE IN KEYWORD EXTRACTION |
Abstract |
In one embodiment, a method includes: receiving data representing a plurality of corpora, each of the plurality of corpora including a set of documents; receiving data representing terms that appear in the corpora; for each one of the terms, determining a plurality of inverse document frequency values each associated with a respective one of the plurality of corpora; receiving data representing a subset of the terms that also appear in a document; for each term in the subset, determining a term frequency for the term in the document; and for each term in the subset, determining an augmented term frequency-inverse document frequency value based on: (i) the term frequency, and (ii) the plurality of inverse document frequency values that were determined for the term in the subset. |
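The abstract does not specify the IDF formula or how the per-corpus IDF values are combined with the term frequency. The equations below show one possible formalization, offered only as a sketch. The smoothed logarithmic IDF, the length-normalized term frequency, and the weighted average over corpora are all assumptions. The symbols $n_{t,d}$, $N_k$, $\mathrm{df}_k(t)$, $w_k$ and $\mathrm{atfidf}$ are hypothetical notation and do not come from the patent.

$$
\mathrm{tf}(t,d) = \frac{n_{t,d}}{\sum_{t'} n_{t',d}}, \qquad
\mathrm{idf}_k(t) = \ln\frac{1 + N_k}{1 + \mathrm{df}_k(t)} + 1, \quad k = 1, \dots, K,
$$

$$
\mathrm{atfidf}(t,d) = \mathrm{tf}(t,d) \sum_{k=1}^{K} w_k\, \mathrm{idf}_k(t), \qquad w_k \ge 0,\; \sum_{k=1}^{K} w_k = 1.
$$

In these equations, $N_k$ is the number of documents in corpus $k$, $\mathrm{df}_k(t)$ is the number of those documents that contain $t$, and $n_{t,d}$ is the number of occurrences of $t$ in document $d$. The $+1$ smoothing keeps $\mathrm{idf}_k(t)$ finite for a term that is absent from corpus $k$. With equal weights $w_k = 1/K$, the combination reduces to a plain mean of the $K$ IDF values.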
Application Publication Number |
US2015199438(A1) |
Application Publication Date |
2015.07.16 |
Application Number |
US201414156093 |
Application Date |
2014.01.15 |
Applicants |
Talyansky Roman; Vainer Vitaly; Nathan Eyal; Kossoy Oleg; Khalatov Dmitry |
Inventors |
Talyansky Roman; Vainer Vitaly; Nathan Eyal; Kossoy Oleg; Khalatov Dmitry |
Classification |
G06F17/30 |
Main Classification |
G06F17/30 |
Agency |
|
Agent |
|
Main Claim |
1. A method comprising:
receiving, by a processing device, data representing a plurality of corpora, each of the plurality of corpora including a set of documents;
receiving, by a processing device, data representing terms that appear in the corpora;
for each one of the terms, determining, by a processing device, a plurality of inverse document frequency values each associated with a respective one of the plurality of corpora;
receiving, by a processing device, data representing a subset of the terms that also appear in a document;
for each term in the subset of the terms, determining, by a processing device, a term frequency for the term in the document; and
for each term in the subset of the terms, determining, by a processing device, an augmented term frequency-inverse document frequency value based on: (i) the term frequency, and (ii) the plurality of inverse document frequency values that were determined for the term in the subset of the terms. |
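The following is a minimal Python sketch of the steps in claim 1. It assumes tokenized documents represented as lists of strings. The claim does not fix the IDF formula or the way the per-corpus values are combined, so several choices here are assumptions: the smoothed IDF `ln((1 + N) / (1 + df)) + 1`, the relative term frequency, and the weighted-average combination (a plain mean when no weights are given). The function names `idf_values` and `augmented_tfidf` and the `weights` parameter are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Sequence

Document = List[str]     # a tokenized document
Corpus = List[Document]  # a corpus is a set of documents


def idf_values(corpora: Sequence[Corpus], terms: Sequence[str]) -> Dict[str, List[float]]:
    """For each term, compute one IDF value per corpus (assumed smoothed formula)."""
    doc_sets = [[set(doc) for doc in corpus] for corpus in corpora]
    table: Dict[str, List[float]] = {}
    for term in terms:
        values = []
        for docs in doc_sets:
            df = sum(1 for doc in docs if term in doc)  # documents containing the term
            values.append(math.log((1 + len(docs)) / (1 + df)) + 1.0)
        table[term] = values
    return table


def augmented_tfidf(
    document: Document,
    terms: Sequence[str],
    idf_table: Dict[str, List[float]],
    weights: Optional[Sequence[float]] = None,
) -> Dict[str, float]:
    """Score every corpus term that also appears in `document`."""
    counts = Counter(document)
    subset = [t for t in terms if counts[t] > 0]  # terms that also appear in the document
    scores: Dict[str, float] = {}
    for term in subset:
        tf = counts[term] / len(document)  # relative term frequency (assumption)
        values = idf_table[term]
        if weights is None:
            combined = sum(values) / len(values)  # hypothetical default: plain mean
        else:
            combined = sum(w * v for w, v in zip(weights, values))  # hypothetical weighting
        scores[term] = tf * combined
    return scores


# Toy example: a "general" corpus and a "domain" corpus.
general = [["the", "cat", "sat"], ["the", "dog", "ran"], ["a", "cat", "ran"]]
domain = [["kernel", "cache", "the"], ["cache", "miss", "latency"]]
terms = sorted({t for corpus in (general, domain) for doc in corpus for t in doc})
table = idf_values([general, domain], terms)

doc = ["the", "cache", "miss", "the", "cat"]
scores = augmented_tfidf(doc, terms, table)
domain_only = augmented_tfidf(doc, terms, table, weights=[0.0, 1.0])
for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{term}: {score:.4f}")
```

In this toy run, `cache` and `miss` never occur in the general corpus, so their first-corpus IDF takes the maximum value, ln(4) + 1. Passing `weights=[0.0, 1.0]` scores terms against the domain corpus alone.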
Address |
Haifa IL |