Title of invention A process for extracting keywords
Abstract <p>A database of texts is provided. Lexical elements (Wi) of the texts are listed in a dictionary, each together with a number (NWi) representative of the frequency of that element in the texts, and sequences (Sj) of lexical elements in the texts are listed in a collection. The sequences of the collection are then segmented into elements of the dictionary. Concatenations (Ck) of the dictionary elements forming the segmented sequences are listed, and a number (NCk) representative of the frequency of each concatenation in the database of texts is computed. From the number (NCk) associated with a concatenation (Ck) in the collection of concatenations, and from the numbers (NWi) associated in the dictionary with the elements forming that concatenation, a prediction is computed of the variation in the sum of the costs of all elements of the dictionary if the concatenation were added to the dictionary; the cost of an element is computed as a decreasing and entropic function of the number (NWi) associated with that element. If the sum of costs decreases, the concatenation is added to the dictionary; otherwise it is ignored. The process may be iterated to add more elements to the dictionary; each iteration adds entries made of existing entries of the dictionary, thus forming hierarchical or structured keywords. The process thereby extracts from the database of texts combinations of lexical elements that are meaningful in the texts.</p>
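The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the cost formula, so the code below assumes one: the sum of costs is taken to be the Shannon code length of the segmented texts, in which each occurrence of an element with count NWi costs -log2(NWi / N) bits, a quantity that decreases as NWi grows. The function names (`total_cost`, `count_concatenations`, `cost_variation`, `select_concatenations`) are hypothetical, concatenations are limited to adjacent pairs, and each candidate is evaluated independently against the current dictionary rather than in a full iterated process.

```python
import math
from collections import Counter


def total_cost(counts):
    """Sum of element costs, ASSUMED here to be the Shannon code length in bits.

    Each occurrence of an element with count n costs -log2(n / total), which
    decreases as n grows. The abstract does not give the exact formula.
    """
    total = sum(counts.values())
    return sum(-n * math.log2(n / total) for n in counts.values() if n > 0)


def count_concatenations(sequences):
    """Count adjacent pairs (Ck) in segmented sequences, without self-overlap."""
    pairs = Counter()
    for seq in sequences:
        last_start = {}  # pair -> start index of its last counted occurrence
        for i, pair in enumerate(zip(seq, seq[1:])):
            # In a run like "a a a", the two (a, a) pairs share an element,
            # so only one of them is counted.
            if pair[0] == pair[1] and last_start.get(pair) == i - 1:
                continue
            pairs[pair] += 1
            last_start[pair] = i
    return pairs


def cost_variation(counts, pair, pair_count):
    """Predicted change in total cost if `pair` were added to the dictionary."""
    new_counts = Counter(counts)
    for element in pair:
        new_counts[element] -= pair_count  # occurrences absorbed by the pair
    new_counts[pair] = pair_count  # the pair becomes an entry of its own
    return total_cost(new_counts) - total_cost(counts)


def select_concatenations(sequences):
    """Return the pairs whose addition is predicted to lower the total cost."""
    counts = Counter(element for seq in sequences for element in seq)
    selected = []
    for pair, pair_count in count_concatenations(sequences).items():
        if cost_variation(counts, pair, pair_count) < 0:
            selected.append(pair)
    return selected
```

Under this assumed cost, two elements that each occur 100 times yield a positive variation when their concatenation occurs only once, so it would be ignored, and a negative variation when it occurs 90 times, so it would be added. Because accepted pairs are stored as tuples of existing entries, repeating the step on re-segmented texts would produce nested entries, which is one way to picture the hierarchical keywords the abstract mentions.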
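Publication number EP1258815(A1) Publication date 2002.11.20
Application number EP20010401273 Filing date 2001.05.16
Applicant EXENTIS Inventor BOURDONCLE, FRANCOIS;LAGUNAS, FRANCOIS
Classification G06F17/27;(IPC1-7):G06F17/30 Main classification G06F17/27
Agency Agent
Principal claim
Address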