摘要 |
A system and method for automatically extracting entity reference data from a data resource, which can incrementally mine new reference data tuples from the existing data sources (e.g. data warehouse, web, etc.) with low cost. The system of the invention includes an_entity data parsing means coupled with the data resource, for parsing the entity data within the data resource, to obtain an internal semantic structure of each entity data and generate a feature set from the internal semantic structure; and data extraction means for extracting the reference entity data according to the feature set generated by the entity data parsing means. Further, a survival component may be provided to optimize candidate reference data seeds output from the data extraction means.
|