Title of invention METHODS, APPARATUS, SYSTEMS AND COMPUTER READABLE MEDIA FOR USE IN KEYWORD EXTRACTION
Abstract In one embodiment, a method includes: receiving data representing a plurality of corpora, each of the plurality of corpora including a set of documents; receiving data representing terms that appear in the corpora; for each one of the terms, determining a plurality of inverse document frequency values each associated with a respective one of the plurality of corpora; receiving data representing a subset of the terms that also appear in a document; for each term in the subset, determining a term frequency for the term in the document; and for each term in the subset, determining an augmented term frequency-inverse document frequency value based on: (i) the term frequency, and (ii) the plurality of inverse document frequency values that were determined for the term in the subset.
Publication number US2015199438(A1) Publication date 2015.07.16
Application number US201414156093 Filing date 2014.01.15
Applicant Talyansky Roman;Vainer Vitaly;Nathan Eyal;Kossoy Oleg;Khalatov Dmitry Inventor Talyansky Roman;Vainer Vitaly;Nathan Eyal;Kossoy Oleg;Khalatov Dmitry
Classification G06F17/30 Main classification G06F17/30
Agency Agent
Principal claim 1. A method comprising: receiving, by a processing device, data representing a plurality of corpora, each of the plurality of corpora including a set of documents; receiving, by a processing device, data representing terms that appear in the corpora; for each one of the terms, determining, by a processing device, a plurality of inverse document frequency values each associated with a respective one of the plurality of corpora; receiving, by a processing device, data representing a subset of the terms that also appear in a document; for each term in the subset of the terms, determining, by a processing device, a term frequency for the term in the document; and for each term in the subset of the terms, determining, by a processing device, an augmented term frequency-inverse document frequency value based on: (i) the term frequency, and (ii) the plurality of inverse document frequency values that were determined for the term in the subset of the terms.
Address Haifa IL
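Illustrative sketch The steps recited in the principal claim (one inverse document frequency value per corpus for each term, a term frequency in the target document, and a combined score) might be sketched in Python as follows. This record does not state how the several inverse document frequency values are combined, so the weighted average used here is an assumption, as are the smoothed IDF formula, the token-list document representation, and the hypothetical function names `idf_per_corpus` and `augmented_tfidf`. It is a minimal sketch under those assumptions, not the applicants' implementation.

```python
import math
from collections import Counter


def idf_per_corpus(corpora, terms):
    """Return {term: [idf in corpus 0, idf in corpus 1, ...]}.

    Each corpus is a list of documents; each document is a list of tokens.
    Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1.
    """
    idfs = {term: [] for term in terms}
    for corpus in corpora:
        n_docs = len(corpus)
        doc_sets = [set(doc) for doc in corpus]
        for term in terms:
            # Document frequency: number of documents containing the term.
            df = sum(1 for tokens in doc_sets if term in tokens)
            idfs[term].append(math.log((1 + n_docs) / (1 + df)) + 1.0)
    return idfs


def augmented_tfidf(document, idfs, weights=None):
    """Score each known term in `document` from its TF and its per-corpus IDFs.

    Assumption: the per-corpus IDF values are combined by a weighted
    average (uniform weights when `weights` is None).
    """
    counts = Counter(document)
    total = len(document)
    scores = {}
    for term, count in counts.items():
        if term not in idfs:
            continue  # only terms that also appear in the corpora are scored
        values = idfs[term]
        w = weights if weights is not None else [1.0 / len(values)] * len(values)
        combined_idf = sum(wi * v for wi, v in zip(w, values))
        scores[term] = (count / total) * combined_idf
    return scores


# Usage: two small corpora and one target document.
corpora = [
    [["a", "b"], ["a", "c"]],
    [["b", "c"], ["c", "d"]],
]
terms = {"a", "b", "c", "d"}
idfs = idf_per_corpus(corpora, terms)
scores = augmented_tfidf(["a", "a", "b"], idfs)
```

Passing non-uniform `weights` (for example, a larger weight for a domain-specific corpus than for a general one) changes how much each corpus influences the final score; that choice is also an assumption of this sketch.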