Title of invention: System and method for using a combination of semantic and statistical processing of input strings or other data content
Abstract: A system and method for using a combination of semantic and statistical processing of input strings or other data content, such as a web page or an electronic document. In accordance with an embodiment, the system enables the injection of semantics into an otherwise statistically-based environment, by recognizing that, within various topics, certain words, combinations of words, or phrases, herein referred to as keyphrases, have different weights. Some keyphrases may be relatively unique within a particular topic, or have a relatively high weighting towards that topic; whereas other keyphrases may not be unique, or may have a relatively low weighting towards that topic. In accordance with an embodiment, the system allows for characterization of both (a) “sufficient” and (b) “necessary” keyphrases. A keyphrase is considered sufficient for a particular topic when, if that keyphrase is found in the input string or data content, one is likely to be in that topic (but could be in another topic). A keyphrase is considered necessary for a particular topic when, if that keyphrase is found in the input string or data content, one is both very likely to be in that topic, and very unlikely to be in any other topic. This information can be used as part of the input processing.
Publication number: US9110986(B2) Publication date: 2015.08.18
Application number: US201113019213 Filing date: 2011.02.01
Applicant: VEXIGO, LTD. Inventor: Fuchs Gil
Classification: G06F17/30 Main classification: G06F17/30
Agency: Soroker-Agmon Agent: Soroker-Agmon
Principal claim: 1. A system that uses a combination of semantic and statistical processing of input or data content, comprising: a system including a computer having a processor and memory that receives an input in the form of a user-entered search query, a set of text retrieved by an automated robot process, web page, electronic document, or some other form of input; a semantically-enhanced statistical lookup data, which is created by analysis of a plurality of documents on various topics, to determine sufficient and necessary keyphrases, wherein a keyphrase is considered sufficient for a particular topic when if that keyphrase is found in the input, the input is likely to be in that topic, and a keyphrase is considered necessary for a particular topic when, if that keyphrase is found in the input, the input is both very likely to be in that topic, and very unlikely to be in any other topic; a semantically-enhanced comparison logic which uses the information in the semantically-enhanced statistical lookup data to analyze the input, compare search words in the input with keyphrases, determine an appropriate topic, and generate an appropriate output; and wherein keyphrases have an entropy associated therewith, wherein keyphrases that are sufficient for a particular topic have a relatively lower entropy than keyphrases that are sufficient for several topics, and a relatively higher entropy than keyphrases that are necessary for the particular topic; and wherein for a particular topic, the set of necessary keyphrases are a subset of sufficient keyphrases for that topic.
Address: Nes Ziona IL
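
The relationship the claim draws between sufficient keyphrases, necessary keyphrases, and entropy can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes that the probability of a topic given a keyphrase is estimated from the share of documents containing the keyphrase, that entropy means Shannon entropy over that distribution, and that "likely" and "very likely" correspond to the hypothetical thresholds `sufficient_min` and `necessary_min`. The function names, thresholds, and the toy corpus `DOCS_BY_TOPIC` are all invented for illustration.

```python
import math

# Toy corpus, invented for illustration: topic -> list of documents.
DOCS_BY_TOPIC = {
    "cardiology": [
        "heart valve stenosis echocardiogram",
        "echocardiogram shows heart murmur",
        "heart rate and blood pressure",
        "echocardiogram of the left ventricle",
    ],
    "fitness": [
        "heart rate zones for running",
        "running pace and cadence",
    ],
    "finance": [
        "interest rate and bond yield",
        "exchange rate forecast",
    ],
}


def topic_distribution(keyphrase, docs_by_topic):
    """Estimate P(topic | keyphrase) from counts of documents containing the keyphrase."""
    needle = keyphrase.lower()
    counts = {
        topic: sum(needle in doc.lower() for doc in docs)
        for topic, docs in docs_by_topic.items()
    }
    total = sum(counts.values())
    if total == 0:
        return {}  # keyphrase never seen in the corpus
    return {topic: n / total for topic, n in counts.items() if n > 0}


def entropy(dist):
    """Shannon entropy in bits; 0 means the keyphrase occurs in a single topic only."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def classify(keyphrase, topic, docs_by_topic, sufficient_min=0.5, necessary_min=0.9):
    """Return (sufficient, necessary) flags for a keyphrase with respect to a topic.

    The thresholds are assumptions. Necessary implies sufficient, so the
    necessary keyphrases of a topic form a subset of its sufficient ones.
    """
    p = topic_distribution(keyphrase, docs_by_topic).get(topic, 0.0)
    sufficient = p >= sufficient_min
    necessary = sufficient and p >= necessary_min
    return sufficient, necessary


for phrase in ("echocardiogram", "heart", "rate"):
    dist = topic_distribution(phrase, DOCS_BY_TOPIC)
    flags = classify(phrase, "cardiology", DOCS_BY_TOPIC)
    print(phrase, flags, round(entropy(dist), 3))
```

In this toy corpus, "echocardiogram" appears only in cardiology documents, "heart" appears mostly but not exclusively there, and "rate" is spread across all three topics, so their entropies increase in that order, consistent with the ordering stated in the claim.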