Title of invention: Creating taxonomies and training data in multiple languages
Abstract: The problem of creating taxonomies of objects, particularly objects that can be represented as text in various languages, and of categorizing such objects is addressed by a method for taking training documents generated in a first language, translating them to a target language, and then generating from a plurality of training documents one or more sets of features representing one or more categories in the target language. The method includes the steps of: forming a first list of items such that each item in the first list represents a particular training document having an association with one or more elements related to a particular category; developing a second list from the first list by deleting one or more candidate documents which satisfy at least one deletion criterion; translating the documents in the second list from the source language to the target language; and extracting the one or more sets of features from the translated second list using one or more feature selection criteria.
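The four steps named in the abstract (form a first list, delete candidates, translate, extract features) can be sketched in Python as follows. This is only an illustrative sketch, not the claimed implementation: the abstract does not specify the deletion criterion, the translation mechanism, or the feature selection criterion, so the short-document rule, the word-by-word toy dictionary, and the top-k term-frequency selection below are all assumptions, as are every function and variable name.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, NamedTuple


class TrainingDoc(NamedTuple):
    """One item of the first list: a document tied to a category."""
    text: str
    category: str


# Hypothetical stand-in for a real translation system (English -> French).
TOY_DICTIONARY = {"dog": "chien", "cat": "chat", "bank": "banque", "money": "argent"}


def toy_translate(text: str) -> str:
    """Translate word by word; unknown words pass through unchanged."""
    return " ".join(TOY_DICTIONARY.get(w, w) for w in text.lower().split())


def too_short(doc: TrainingDoc) -> bool:
    """Assumed deletion criterion: drop documents with fewer than two words."""
    return len(doc.text.split()) < 2


def build_category_features(
    first_list: List[TrainingDoc],
    should_delete: Callable[[TrainingDoc], bool],
    translate: Callable[[str], str],
    top_k: int,
) -> Dict[str, List[str]]:
    # Second list: the first list minus documents meeting the deletion criterion.
    second_list = [d for d in first_list if not should_delete(d)]

    # Translate the surviving documents into the target language.
    translated = [TrainingDoc(translate(d.text), d.category) for d in second_list]

    # Assumed feature selection criterion: the top_k most frequent terms
    # per category, with ties broken alphabetically for determinism.
    counts: Dict[str, Counter] = defaultdict(Counter)
    for d in translated:
        counts[d.category].update(d.text.split())
    return {
        cat: [t for t, _ in sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]]
        for cat, c in counts.items()
    }


if __name__ == "__main__":
    docs = [
        TrainingDoc("dog cat dog", "animals"),
        TrainingDoc("cat", "animals"),  # removed by the assumed criterion
        TrainingDoc("bank money bank", "finance"),
    ]
    print(build_category_features(docs, too_short, toy_translate, top_k=2))
```

Passing the deletion criterion and the translator in as callables keeps the sketch neutral about which criteria or translation system are actually used; either can be swapped without touching the pipeline.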
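Publication number: US2004122660(A1) Publication date: 2004.06.24
Application number: US20020324919 Filing date: 2002.12.20
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION Inventors: CHENG KEH-SHIN FU; GATES STEPHEN C.
Classification: G06F17/27; G06F17/28; (IPC1-7): G06F17/28 Main classification: G06F17/27
Agency: Agent:
Principal claim:
Address: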