Title: System, methods, and data structure for quantitative assessment of symbolic associations
Abstract: A system, methods, and data structure for quantitatively assessing associations between terms or symbols in natural language and non-natural language contents. Some of the terms represent objects, concepts, or semantic attributes; other terms represent properties associated with those objects, concepts, or attributes. The methods include obtaining a first group of text contents, specifying a target term or symbol, and identifying contextual attributes of the target term or symbol. The contextual attributes include grammatical and semantic attributes as well as positional and distance attributes. Association strength values are calculated for related terms or symbols based on the contextual attributes of the target term or symbol, and terms or symbols are selected to represent the properties of an object, concept, or attribute.
Publication number: US9262395(B1)   Publication date: 2016.02.16
Application number: US201313742337   Filing date: 2013.01.15
Applicant:   Inventors: Zhang Guangsheng; Zhang Chizhong
Classification: G06F17/27   Main classification: G06F17/27
Agency:   Agent:
Main claim: 1. A method implemented on a computing device comprising one or more processors, the method comprising: receiving a first term as a name or description representing an object, wherein the object includes a physical or conceptual object, a topic, or an attribute associated with one or more objects, wherein the first term is received from a source including manual input from a user, or automatic input from a computing device, for automatically gathering information or knowledge about the object represented by the first term from unstructured data sources using a machine-based method; receiving a first group of text units comprising at least two words, or one or more phrases or sentences or paragraphs or documents, wherein at least half of the text units contain the first term or are from contents that contain the first term, and at least half of the text units contain one or more unspecified second terms each being different from the first term; for one or more second terms in the first group of text units, producing a cumulative value based at least on the number of text units that contain both the first term and the second term; producing a first score value based at least on dividing the cumulative value by at least half of the total number of the text units that contain the first term or are from contents that contain the first term; selecting one or more of the second terms based on the first score value; assembling the selected terms into a term set; attaching the term set to the first term to form a dataset, wherein the function of the selected terms includes representing terms associated with the first term, or representing properties associated with the object, or representing information about the object with information that is automatically gathered from unstructured text contents by using a machine-based method; and outputting the dataset as a form of information representation or knowledge representation for a specific object represented by the first term.
Address:
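Illustrative sketch: the scoring steps recited in the main claim (count the text units containing both the first term and a second term, divide by the number of text units containing the first term, select by score, attach the selected terms to the first term) might look roughly like the Python below. This is not the patented implementation. The function names `tokenize` and `build_term_dataset`, the `min_score` threshold, the single-word tokenization, and the choice to divide by the full count of units containing the first term are all assumptions made for illustration; the claim only requires a divisor of at least half that count and does not specify a selection rule.

```python
import re
from collections import Counter


def tokenize(text):
    """Return the set of lowercase word tokens in a text unit (assumed tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_term_dataset(first_term, text_units, min_score=0.5):
    """Attach to first_term the second terms whose co-occurrence score passes min_score.

    Score = (number of text units containing both terms)
            / (number of text units containing the first term).
    Both min_score and this exact divisor are illustrative assumptions.
    """
    first = first_term.lower()
    units = [tokenize(unit) for unit in text_units]

    # Keep only the text units that contain the first term.
    units_with_first = [unit for unit in units if first in unit]
    if not units_with_first:
        return {first_term: {}}

    # Cumulative value: per second term, how many units contain both terms.
    cumulative = Counter()
    for unit in units_with_first:
        cumulative.update(unit - {first})

    total = len(units_with_first)
    scores = {term: count / total for term, count in cumulative.items()}

    # Select second terms by score and attach the term set to the first term.
    selected = {term: s for term, s in scores.items() if s >= min_score}
    return {first_term: selected}


if __name__ == "__main__":
    sample_units = [
        "The computer has a fast CPU",
        "A computer needs memory and a CPU",
        "Computer memory is cheap",
        "Bananas are yellow",
    ]
    print(build_term_dataset("computer", sample_units))
```

Because this sketch applies no stop-word filtering or grammatical weighting, function words such as "a" can score as highly as content words; the contextual attributes described in the abstract (grammatical, semantic, positional, distance) are the kind of signal that would be needed to separate them.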