Title of invention: METHOD AND SYSTEM FOR EXTRACTING A PRODUCT AND CLASSIFYING TEXT-BASED ELECTRONIC DOCUMENTS
Abstract: A system to automatically enhance, tag, classify, categorize, cluster and index products described in unstructured text-based electronic documents. The system and method incorporate the use of text normalization, regular expressions, product number matching rules, text segmentation, entity detection, language models, predictive modeling, hierarchical subspace clustering, formal concept analysis, and a weighted combination of all techniques to detect and infer knowledge extracted from a digital version of raw, unstructured product text. Knowledge extracted and inferred comprises knowledge units including: main conceptual entity, entity text patterns, product language models, and conceptual hierarchies. The extracted knowledge units are utilized to store and index products in a product knowledge database and the products and knowledge units are made available to users via a user interface.
Publication number: US2015331936(A1) Publication date: 2015.11.19
Application number: US201514712683 Filing date: 2015.05.14
Applicant: ALQADAH Faris Inventor: ALQADAH Faris
Classification: G06F17/30;G06F17/28 Main classification: G06F17/30
Agency: Agent:
Principal claim: 1. A method of using a computer system for extraction of information from unstructured product text, comprising: searching an unstructured product text to identify and extract a product identifier; checking for a match of the product identifier in a database of the system's knowledge; enhancing the product text for further processing; tagging tokens in the product text with different entity tags; mining product concepts and computing a hierarchy of product concepts in the product text; retrievably storing the information extracted from the product text into a database; using a feedback loop to provide improved performance over time; and using a mechanism to interface with the database via an interface.
Address: San Jose CA US
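
Illustrative sketch (not part of the patent record): the first steps of the principal claim, namely extracting a product identifier, checking it against a knowledge database, and tagging tokens with entity tags, might look roughly like the Python below. Everything in it is an assumption made for illustration: the identifier rule (a token of at least five characters mixing letters and digits), the in-memory `KNOWN_PRODUCTS` and `BRANDS` tables, the tag names, and the sample text are all hypothetical, and the record does not disclose the actual matching rules or models.

```python
import re

# Tokens are runs of letters/digits, optionally joined by hyphens (assumed rule).
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")
# A number followed by letters, e.g. "20V" (assumed shape of a measurement).
MEASURE_RE = re.compile(r"\d+[A-Za-z]+")

# Hypothetical stand-ins for the "database of the system's knowledge".
KNOWN_PRODUCTS = {"AB-1234X": {"brand": "acme", "concept": "drill"}}
BRANDS = {"acme"}


def looks_like_product_id(token):
    """Assumed heuristic: at least 5 characters, mixing letters and digits."""
    return (
        len(token) >= 5
        and any(c.isdigit() for c in token)
        and any(c.isalpha() for c in token)
    )


def extract_product_id(text):
    """Return the first token that looks like a product identifier, or None."""
    for token in TOKEN_RE.findall(text):
        if looks_like_product_id(token):
            return token.upper()
    return None


def tag_tokens(text, product_id):
    """Attach one coarse entity tag to every token in the text."""
    tags = []
    for token in TOKEN_RE.findall(text):
        if product_id is not None and token.upper() == product_id:
            tag = "PRODUCT_ID"
        elif token.lower() in BRANDS:
            tag = "BRAND"
        elif MEASURE_RE.fullmatch(token):
            tag = "MEASURE"
        else:
            tag = "OTHER"
        tags.append((token, tag))
    return tags


def process(text):
    """Run extraction, knowledge lookup, and tagging on one product text."""
    product_id = extract_product_id(text)
    return {
        "product_id": product_id,
        "known": KNOWN_PRODUCTS.get(product_id),
        "tags": tag_tokens(text, product_id),
    }


result = process("Acme AB-1234X 20V cordless drill kit")
```

Here `result["known"]` holds the stored record when the identifier is already in the table and `None` otherwise. The later claim steps (concept mining, hierarchy computation, the feedback loop) are left out because the record gives too little detail to sketch them.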