摘要 |
According to one embodiment, approximate named-entity extraction from a dictionary that includes entries is provided, where each of the entries includes one or more words. Words are read from the entries of the dictionary, and network resources are searched to determine a frequency of occurrence of the words on the network resources. In view of the frequency of occurrence of the words located on the network resources, domain relevancy of the words in the entries of the dictionary is determined. A domain repository is created using top-ranked words as determined by the domain relevancy of the words. In view of the domain repository, signatures for both the entries of the dictionary and strings of an input document are computed. The strings of the input document are filtered by comparing the signatures of the strings against the signatures of the entries to identify approximate-match entity names. |