Title of invention APPROXIMATE NAMED-ENTITY EXTRACTION
Abstract According to one embodiment, approximate named-entity extraction from a dictionary that includes entries is provided, where each of the entries includes one or more words. Words are read from the entries of the dictionary, and network resources are searched to determine a frequency of occurrence of the words on the network resources. In view of the frequency of occurrence of the words located on the network resources, domain relevancy of the words in the entries of the dictionary is determined. A domain repository is created using top-ranked words as determined by the domain relevancy of the words. In view of the domain repository, signatures for both the entries of the dictionary and strings of an input document are computed. The strings of the input document are filtered by comparing the signatures of the strings against the signatures of the entries to identify approximate-match entity names.
Publication number US2014163958(A1) Publication date 2014.06.12
Application number US201213711746 Filing date 2012.12.12
Applicant INTERNATIONAL BUSINESS MACHINES CORPORATION Inventors Chen Ying;Spangler William S.;Yan Su
Classification G06F17/28 Main classification G06F17/28
Agency Agent
Main claim
Address Armonk NY US
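
The steps named in the abstract (word frequencies, domain relevancy, a domain repository of top-ranked words, signatures, and signature-based filtering) can be illustrated with a minimal sketch. It is not the patented method: the record does not give the relevancy formula, the signature definition, or the comparison rule, so all three are assumptions here. Relevancy is taken as domain hits divided by general hits plus one, a signature is taken as the set of repository words in a string, and two signatures match only when they are equal. The hit-count dictionaries stand in for the network search, and every function and parameter name is hypothetical.

```python
from typing import Dict, FrozenSet, Iterator, List, Tuple


def build_domain_repository(
    entries: List[str],
    domain_hits: Dict[str, int],
    general_hits: Dict[str, int],
    top_k: int,
) -> FrozenSet[str]:
    """Keep the top_k dictionary words ranked by an assumed relevancy score."""
    words = {w.lower() for entry in entries for w in entry.split()}

    def relevancy(word: str) -> float:
        # Assumption: relevancy = domain frequency / (general frequency + 1).
        return domain_hits.get(word, 0) / (general_hits.get(word, 0) + 1)

    # Sort by descending relevancy; ties are broken alphabetically.
    ranked = sorted(words, key=lambda w: (-relevancy(w), w))
    return frozenset(ranked[:top_k])


def signature(text: str, repository: FrozenSet[str]) -> FrozenSet[str]:
    """Assumed signature: the set of repository words that occur in text."""
    return frozenset(w for w in text.lower().split() if w in repository)


def candidate_strings(document: str, max_len: int) -> Iterator[str]:
    """Yield every run of 1..max_len consecutive whitespace-separated tokens."""
    tokens = document.split()
    for start in range(len(tokens)):
        for length in range(1, max_len + 1):
            if start + length <= len(tokens):
                yield " ".join(tokens[start:start + length])


def filter_candidates(
    document: str,
    entries: List[str],
    repository: FrozenSet[str],
    max_len: int = 3,
) -> List[Tuple[str, str]]:
    """Return (candidate, entry) pairs whose non-empty signatures are equal."""
    entries_by_signature: Dict[FrozenSet[str], List[str]] = {}
    for entry in entries:
        sig = signature(entry, repository)
        if sig:  # entries with no repository word cannot be matched
            entries_by_signature.setdefault(sig, []).append(entry)

    matches: List[Tuple[str, str]] = []
    for candidate in candidate_strings(document, max_len):
        sig = signature(candidate, repository)
        for entry in entries_by_signature.get(sig, []):
            matches.append((candidate, entry))
    return matches
```

Because the comparison looks only at repository words, a candidate such as "acetylsalicylic acid tabs" can pass the filter for the entry "acetylsalicylic acid tablets" even though the strings differ. The filter is deliberately permissive, so in this sketch the surviving pairs would still need a separate verification step, such as an edit-distance check.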