Title of invention Domain specific natural language normalization
Abstract <p>Embodiments of the present invention provide a method, system and computer program product for the domain specific normalization of a corpus of text. In an embodiment of the invention, a method is provided for normalizing a corpus of text for a specific domain, where the domain may be an industrial, organizational, demographic or geographic domain. The method includes loading a corpus of text into memory of a computer and determining a domain for the corpus of text. The method also includes retrieving a lexicon of replacement words for the determined domain. Finally, the method includes simplifying the corpus of text using the retrieved lexicon. In one aspect of the embodiment, the domain is determined through inference based upon words already present in the corpus of text. In another aspect of the embodiment, the domain is determined based upon meta-data provided with the corpus of text.</p>
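The steps named in the abstract (load the corpus, determine its domain from meta-data or by inference from the words present, retrieve a lexicon of replacement words, simplify the text) can be illustrated with a minimal Python sketch. This is not the claimed implementation. The `LEXICONS` table, its entries, the `determine_domain` and `normalize` function names, the `"domain"` meta-data key, and the term-counting heuristic used for inference are all assumptions made for illustration only.

```python
import re
from collections import Counter

# Hypothetical per-domain lexicons: domain term -> simpler replacement.
# The record does not specify any lexicon contents; these are made up.
LEXICONS = {
    "medical": {
        "myocardial infarction": "heart attack",
        "hypertension": "high blood pressure",
    },
    "finance": {
        "equities": "stocks",
        "fixed income": "bonds",
    },
}


def determine_domain(text, metadata=None):
    """Pick a domain from meta-data if given, else infer it from the text."""
    # Meta-data route: trust an explicit, known domain label (assumed key name).
    if metadata and metadata.get("domain") in LEXICONS:
        return metadata["domain"]

    # Inference route (assumed heuristic): count how many times each
    # domain's lexicon terms occur and choose the highest-scoring domain.
    lowered = text.lower()
    scores = Counter()
    for domain, lexicon in LEXICONS.items():
        for term in lexicon:
            pattern = r"\b" + re.escape(term) + r"\b"
            scores[domain] += len(re.findall(pattern, lowered))

    if not scores or max(scores.values()) == 0:
        return None  # no evidence for any known domain
    return scores.most_common(1)[0][0]


def normalize(text, metadata=None):
    """Replace domain-specific terms using the lexicon of the determined domain."""
    domain = determine_domain(text, metadata)
    if domain is None:
        return text  # leave the text unchanged when no domain is found

    lexicon = LEXICONS[domain]
    # Try longer terms first so multi-word terms are matched before shorter ones.
    terms = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: lexicon[m.group(0).lower()], text)
```

In this sketch, meta-data takes precedence over inference, and text with no recognizable domain is returned unchanged. Both are design choices of the example, not statements from the record.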
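Publication number GB201302916(D0) Publication date 2013.04.03
Application number GB20130002916 Filing date 2013.02.20
Applicant INTERNATIONAL BUSINESS MACHINES CORPORATION Inventor
Classification number Main classification number
Agency Agent
Principal claim
Address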