Title of invention: IDENTIFYING WORD-SENSES BASED ON LINGUISTIC VARIATIONS
Abstract: One or more words are received. A set of frequency of occurrence values of the received word(s) within a set of domain tables is determined. A domain table in the set of domain tables is associated with the received word(s), based on the set of frequency of occurrence values meeting a threshold value. A word-sense of the received word(s) is determined based on a corresponding word-sense in the associated domain table and/or corresponding domain dictionary.
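The flow in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the applicant's implementation: the table contents, the names `DOMAIN_TABLES`, `frequency_values`, and `identify_word_sense`, and the rule of picking the single highest-frequency domain that meets the threshold are all assumptions made for the example.

```python
from typing import Dict, Optional, Tuple

# Hypothetical domain tables: word -> (relative frequency, word-sense).
# The values are invented for illustration only.
DomainTable = Dict[str, Tuple[float, str]]

DOMAIN_TABLES: Dict[str, DomainTable] = {
    "medical": {
        "culture": (0.8, "growth of microorganisms in a nutrient medium"),
    },
    "legal": {
        "culture": (0.1, "shared customs of a social group"),
        "discharge": (0.6, "release from a legal obligation"),
    },
}


def frequency_values(word: str, tables: Dict[str, DomainTable]) -> Dict[str, float]:
    """Return the frequency of occurrence of `word` in each domain table."""
    return {name: table.get(word, (0.0, ""))[0] for name, table in tables.items()}


def identify_word_sense(
    word: str, tables: Dict[str, DomainTable], threshold: float
) -> Optional[Tuple[str, str]]:
    """Associate the word with a domain table and look up its word-sense.

    Assumption: the associated table is the one with the highest frequency,
    provided that frequency meets the threshold. Returns (domain, sense),
    or None when no table qualifies.
    """
    freqs = frequency_values(word, tables)
    domain, freq = max(freqs.items(), key=lambda item: item[1])
    entry = tables[domain].get(word)
    if entry is None or freq < threshold:
        return None
    return domain, entry[1]
```

For example, `identify_word_sense("culture", DOMAIN_TABLES, 0.5)` associates the word with the hypothetical medical table, while a threshold of 0.9 leaves it unassociated.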
Publication No.: US2016371806(A1) Publication date: 2016.12.22
Application No.: US201615263530 Filing date: 2016.09.13
Applicant: International Business Machines Corporation Inventors: Bishop Timothy A.;Boxwell Stephen A.;Brumfield Benjamin L.;Desai Nirav P.;Vernier Stanley J.
Classification: G06Q50/22;G06F17/27 Main classification: G06Q50/22
Agency: Agent:
Main claim: 1. A computer program product for identifying word-senses, the method comprising: a computer-readable storage medium having program code embodied therewith, the program code executable by a processor of a computer to perform a method comprising: generating, by a computer, a plurality of arrays of aggregated statistical information of words, their corresponding word-senses, and temporal properties within different professional fields using an n-gram viewer, wherein the aggregated statistical information comprises frequency of usage of words, frequency of occurrence of words, frequency of co-occurrence of words with other words, and their respective corresponding word-senses; generating, by the computer, a set of domain tables based on the generated plurality of arrays of aggregated statistical information, wherein each of the domain tables within the set of domain tables corresponds to a different professional field comprising medical, veterinary, legal, and engineering; receiving, from a remote server through a network, a digital text stream comprising metadata and one or more words from a doctor, using the computer, the network being an internet connection; selecting, using the metadata, a medical frequency domain table, veterinary frequency domain table, and a word-sense domain table from the set of domain tables; determining a frequency of occurrence value for the received digital text stream within each of the selected domain tables; receiving a threshold from the doctor; associating the medical frequency domain table with the received digital text stream in response to the frequency of occurrence value satisfying the received threshold; determining a word-sense of the received digital text stream, by determining a corresponding word sense to the received digital text stream within the medical frequency domain table; assigning a confidence value to the word-sense based on a degree of frequency of occurrence of the received digital text stream within the medical domain, wherein the word-sense has a higher confidence value, when the frequency of occurrence of the received digital text stream is higher within the medical domain table; and presenting the word-sense and the confidence value to the doctor.
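The confidence step at the end of the claim could be illustrated along these lines. The claim only requires that a higher frequency in the medical domain table gives a higher confidence value. The saturating formula `freq / (freq + half_point)`, the per-word treatment of the text stream, and every name and number below (`MEDICAL_FREQUENCY`, `MEDICAL_SENSES`, `confidence`, `word_senses`, `half_point`) are hypothetical choices for the sketch, not taken from the patent.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SenseResult:
    word: str
    sense: str
    confidence: float


# Hypothetical medical tables (invented values): occurrences per million
# words, and the word-sense recorded for each word.
MEDICAL_FREQUENCY: Dict[str, float] = {"culture": 300.0, "discharge": 100.0}
MEDICAL_SENSES: Dict[str, str] = {
    "culture": "growth of microorganisms in a nutrient medium",
    "discharge": "release of a patient from care",
}


def confidence(freq: float, half_point: float = 100.0) -> float:
    """Map a non-negative frequency to [0, 1), rising monotonically.

    Assumed formula: any monotonically increasing mapping would satisfy
    the "higher frequency, higher confidence" wording of the claim.
    """
    return freq / (freq + half_point)


def word_senses(words: List[str], threshold: float) -> List[SenseResult]:
    """Return a sense and confidence for each word meeting the threshold.

    Simplification: each word of the text stream is scored on its own.
    """
    results = []
    for word in words:
        freq = MEDICAL_FREQUENCY.get(word, 0.0)
        if freq >= threshold and word in MEDICAL_SENSES:
            results.append(SenseResult(word, MEDICAL_SENSES[word], confidence(freq)))
    return results
```

With these invented numbers, `word_senses(["culture", "discharge", "banana"], 100.0)` yields results for the first two words only, and "culture" receives the higher confidence because its frequency in the hypothetical medical table is higher.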
Address: Armonk NY US