摘要 |
A system and method for extracting content from unstructured sources is disclosed. The method includes analyzing web pages of a website, storing text and image data for each web page of the website, extracting a plurality of entities from the web page data, scoring each entity of the plurality of entities to provide an overall score for each entity, and defining a product based on the plurality of entities and the overall score for each entity. |