Abstract |
Methods for providing in-loop validation of disambiguated features are disclosed. The disclosed methods may include disambiguating features in unstructured text using co-occurring features derived from both the source document and a large document corpus. The disambiguation system may include multiple modules, including a linking on-the-fly module that links the features derived from the source document to the co-occurring features of an existing knowledge base. The system may allow unique entities to be identified from a knowledge base in which each entity has a unique set of co-occurring features, which in turn may increase the precision of knowledge discovery and search results. To do so, the system may apply advanced analytical methods over a massive corpus, using a combination of entities, co-occurring entities, topic IDs, and other derived features. The disclosed method may use validation to provide input to the disambiguation system. |
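As an illustration of disambiguation by co-occurring features, the following minimal Python sketch picks, among knowledge base entities sharing a surface name, the one whose co-occurring features best overlap the features found in the source document. The abstract does not specify a scoring function or a record schema; the Jaccard overlap, the `KnowledgeBaseEntity` class, and all names and sample data below are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeBaseEntity:
    # Hypothetical record shape; the source does not define a schema.
    entity_id: str
    name: str
    cooccurring_features: frozenset


def jaccard(a, b):
    """Overlap between two feature sets, in the range [0, 1]."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def disambiguate(mention, document_features, knowledge_base):
    """Return the best-matching entity for a mention, or None if no candidate exists."""
    candidates = [e for e in knowledge_base if e.name == mention]
    if not candidates:
        return None
    # Assumed scoring rule: highest feature overlap wins.
    return max(
        candidates,
        key=lambda e: jaccard(document_features, e.cooccurring_features),
    )


# Made-up sample data: two entities share the name "Jaguar".
kb = [
    KnowledgeBaseEntity("E1", "Jaguar", frozenset({"car", "engine", "dealer"})),
    KnowledgeBaseEntity("E2", "Jaguar", frozenset({"cat", "rainforest", "predator"})),
]
doc_features = frozenset({"rainforest", "predator", "river"})
best = disambiguate("Jaguar", doc_features, kb)
print(best.entity_id)  # E2
```

Set overlap is used here only because it is simple to inspect; a system of the kind described would presumably also weigh topic IDs and corpus-level statistics.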
Principal claim |
1. A method comprising:
receiving, by a first computer, a first search query result from a search conductor, wherein the first search query result is based on a search query and comprises a record matching a field of the search query;
sending, by the first computer, the first search query result to a second computer such that the second computer is able to disambiguate the first search query result via a determination of a relatedness among an individual record feature and a topic identification associated with each record in the first search query result, wherein the second computer comprises a main memory storing an in-memory database, wherein the second computer is configured to link disambiguation data, in real-time, as the disambiguation data is requested by the first computer from the second computer;
receiving, by the first computer, a second search query result from the second computer, wherein the second search query result has been disambiguated via the second computer;
sending, by the first computer, the second search query result to a third computer such that the third computer is able to receive an input on the second search query result;
generating, by the first computer, a new feature occurrence record in a knowledge base database, wherein the new feature occurrence record includes the input, wherein the in-memory database comprises the knowledge base database; and
placing, by the first computer, a request that the new feature occurrence record be stored in the knowledge base database such that the second computer is able to adjust a parameter of a disambiguation algorithm based on the input, wherein the disambiguation algorithm involves linking via the second computer. |
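The sequence of steps in claim 1 can be sketched as a single-process Python example in which plain objects stand in for the second and third computers. This is a rough sketch under stated assumptions, not the claimed implementation: the `Disambiguator` class, the `relatedness` field, the threshold parameter, and the update rule that nudges it are all hypothetical, since the claim does not say which parameter is adjusted or how.

```python
from dataclasses import dataclass, field


@dataclass
class Disambiguator:
    """Stand-in for the second computer; keeps the knowledge base in memory."""
    knowledge_base: list = field(default_factory=list)
    threshold: float = 0.5  # hypothetical tunable parameter

    def disambiguate(self, records):
        # Keep records whose (assumed) relatedness score clears the threshold.
        return [r for r in records if r["relatedness"] >= self.threshold]

    def store(self, feature_occurrence):
        # Store the new feature occurrence record, then adjust the parameter.
        self.knowledge_base.append(feature_occurrence)
        step = 0.05  # illustrative update rule only
        if feature_occurrence["input"] == "reject":
            self.threshold = min(1.0, self.threshold + step)
        elif feature_occurrence["input"] == "accept":
            self.threshold = max(0.0, self.threshold - step)


def validation_loop(first_result, disambiguator, get_input):
    """Play the role of the first computer for one pass of the loop."""
    second_result = disambiguator.disambiguate(first_result)  # disambiguated result
    user_input = get_input(second_result)                     # third computer's input
    record = {"result": second_result, "input": user_input}   # new feature occurrence
    disambiguator.store(record)                               # storage request
    return record


# Made-up search query result with precomputed relatedness scores.
first = [{"id": "r1", "relatedness": 0.9}, {"id": "r2", "relatedness": 0.2}]
d = Disambiguator()
rec = validation_loop(first, d, lambda result: "reject")
print([r["id"] for r in rec["result"]], d.threshold > 0.5)
```

The validator is passed in as a callable so the same loop could be driven by a human reviewer or an automated check; in the claimed arrangement these roles sit on separate computers rather than in one process.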