Title of invention: DOMAIN-SPECIFIC COMPUTATIONAL LEXICON FORMATION
Abstract: According to an aspect, a candidate token sequence including one or more word tokens is extracted from an unstructured domain glossary that includes entries associated with a domain. A look-up operation retrieves language data for each word token in the candidate token sequence, and each word token found by the look-up operation is annotated with its corresponding retrieved language data to form an annotated sequence. The annotated sequence is pattern matched against a repository of patterns, and a best matching pattern for the annotated sequence is identified based on matching criteria. The annotated sequence is refined with lexical information associated with the best matching pattern to form a refined annotated sequence. The candidate token sequence and the refined annotated sequence are output to a domain-specific computational lexicon file.
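The steps in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the method claimed in the application: the language data, the pattern repository, the part-of-speech tags, the scoring rule used as the "matching criteria", and every function name (`annotate`, `score`, `best_pattern`, `refine`, `build_entry`, `write_lexicon`) are hypothetical. Whitespace splitting stands in for candidate extraction, and JSON lines stand in for the lexicon file format.

```python
import json

# Hypothetical language data, keyed by lowercase word token (illustrative only).
LANGUAGE_DATA = {
    "acute": {"pos": "ADJ"},
    "renal": {"pos": "ADJ"},
    "failure": {"pos": "NOUN"},
}

# Hypothetical pattern repository: a POS sequence plus associated lexical information.
PATTERNS = [
    {"pos": ["ADJ", "NOUN"], "info": {"category": "NP", "head_index": 1}},
    {"pos": ["ADJ", "ADJ", "NOUN"], "info": {"category": "NP", "head_index": 2}},
    {"pos": ["NOUN"], "info": {"category": "N", "head_index": 0}},
]


def annotate(tokens):
    """Look up each token; tokens missing from LANGUAGE_DATA get pos=None."""
    return [{"token": t, "pos": LANGUAGE_DATA.get(t.lower(), {}).get("pos")}
            for t in tokens]


def score(pattern, annotated):
    """Assumed matching criterion: equal length, then count of agreeing POS tags."""
    if len(pattern["pos"]) != len(annotated):
        return -1
    return sum(p == a["pos"] for p, a in zip(pattern["pos"], annotated))


def best_pattern(annotated):
    """Return the highest-scoring pattern, or None if nothing agrees at all."""
    best = max(PATTERNS, key=lambda p: score(p, annotated))
    return best if score(best, annotated) > 0 else None


def refine(annotated, pattern):
    """Attach the pattern's lexical info and fill POS tags the look-up missed."""
    if pattern is None:
        return {"tokens": annotated, "lexical_info": None}
    tokens = [{"token": a["token"], "pos": a["pos"] or p}
              for a, p in zip(annotated, pattern["pos"])]
    return {"tokens": tokens, "lexical_info": pattern["info"]}


def build_entry(glossary_entry):
    """Run one glossary entry through extraction, look-up, matching, refinement."""
    candidate = glossary_entry.split()  # stand-in for candidate extraction
    annotated = annotate(candidate)
    return {"candidate": candidate,
            "refined": refine(annotated, best_pattern(annotated))}


def write_lexicon(entries, handle):
    """Write one JSON object per line to an open text file handle."""
    for entry in entries:
        handle.write(json.dumps(entry) + "\n")
```

In this sketch, a token absent from the look-up table (for example "chronic" in "chronic renal failure") still receives a part-of-speech tag during refinement, because the best matching pattern supplies it. That is one possible reading of "refined with lexical information associated with the best matching pattern"; the application itself may define refinement differently.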
Publication number: US2016179782(A1) Publication date: 2016.06.23
Application number: US201414580583 Filing date: 2014.12.23
Applicant: International Business Machines Corporation Inventors: Boguraev Branimir K.; Manandise Esme; Segal Benjamin P.
Classification: G06F17/27; G06F17/24; G06F17/28 Main classification: G06F17/27
Agency: Agent:
Principal claim:
Address: Armonk NY US