Title of invention: Method and system for analyzing text
Abstract: An apparatus for providing a control input signal for an industrial process or technical system having one or more controllable elements includes elements for generating a semantic space for a text corpus, and elements for generating a norm from one or more reference words or texts, the or each reference word or text being associated with a defined respective value on a scale, and the norm being calculated as a reference point or set of reference points in the semantic space for the or each reference word or text with its associated respective scale value. The apparatus further includes elements for reading at least one target word included in the text corpus, elements for predicting a value of a variable associated with the target word based on the semantic space and the norm, and elements for providing the predicted value in a control input signal to the industrial process or technical system. A method for predicting a value of a variable associated with a target word is also disclosed, together with an associated system and computer readable medium.
Publication number: US9292491(B2) Publication date: 2016.03.22
Application number: US201414303651 Filing date: 2014.06.13
Applicant: STROSSLE INTERNATIONAL AB Inventors: Sikstrom Sverker;Tyrberg Mattias;Hall Anders;Horte Fredrik;Stenberg Joakim
Classification: G06F17/27;G06F17/30;G06Q10/04 Main classification: G06F17/27
Agency: Young & Thompson Agent: Young & Thompson
Principal claim: 1. A method for predicting a value of a variable associated with a target word or set of words, performed by an apparatus comprising at least one computer and comprising the steps of: the apparatus collecting a text corpus comprising a set of words that include the target word, the apparatus generating a representation of the text corpus, the at least one computer creating a semantic space for the set of words, based on the representation of the text corpus, the at least one computer defining, for a location in the semantic space, a value of the variable, the at least one computer estimating, for the target word, a value of the variable, based on the semantic space and the defined variable value of the location in the semantic space, calculating, by the at least one computer, a predicted value of the target word, on basis of the semantic space, the defined variable value of the location in the semantic space and the estimated variable value of the target word, and statistically testing if two sets of words or two sets of documents of the text corpora differ in semantic representation, wherein the step of statistically testing comprises: i) calculating a first vector to represent a mean location in the semantic space for a first of the two sets of words or documents; ii) calculating a second vector to represent a mean location in the semantic space for a second of the two sets of words or documents; iii) calculating a distance between the first and second vectors; iv) repeating the steps i), ii), and iii) above while assigning the words randomly to the first of the two sets of words or documents and to the second of the two sets of words or documents; v) counting a percentage of occasions when the distance for the randomly assigned words is larger than when the distance is based on the non-randomly assigned words; and vi) providing the counted percentage as a probability for whether the two sets of words or documents differ in semantic representation.
Address: Stockholm SE
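
Illustrative sketch: steps i) to vi) of the principal claim describe what is commonly called a permutation test on the distance between two mean vectors. The Python sketch below shows one possible reading of those steps and is not taken from the patent. It assumes that each word or document is already available as a list of coordinates in the semantic space, that the distance is Euclidean (the claim does not name a distance measure), and that the "percentage" is returned as a fraction between 0 and 1. The names `mean_vector`, `euclidean`, `permutation_test`, `n_permutations` and `seed` are hypothetical.

```python
import math
import random


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def euclidean(a, b):
    """Euclidean distance; an assumed choice, the claim names no measure."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def permutation_test(set_a, set_b, n_permutations=1000, seed=0):
    """Return (observed distance, fraction of random splits with a larger one)."""
    rng = random.Random(seed)

    # Steps i)-iii): mean location of each set and the distance between them.
    observed = euclidean(mean_vector(set_a), mean_vector(set_b))

    # Step iv): repeat with items randomly assigned to the two sets,
    # keeping the original set sizes.
    pooled = list(set_a) + list(set_b)
    larger = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled_a = pooled[:len(set_a)]
        shuffled_b = pooled[len(set_a):]
        distance = euclidean(mean_vector(shuffled_a), mean_vector(shuffled_b))
        # Step v): count the occasions where the random split is further apart.
        if distance > observed:
            larger += 1

    # Step vi): report the counted share as the probability estimate.
    return observed, larger / n_permutations


if __name__ == "__main__":
    # Toy two-dimensional "semantic space" with two clearly separated groups.
    group_a = [[0, 0], [0, 1], [1, 0], [1, 1]]
    group_b = [[10, 10], [10, 11], [11, 10], [11, 11]]
    print(permutation_test(group_a, group_b, n_permutations=200, seed=1))
```

A small returned fraction means that random assignments rarely produce a distance as large as the observed one, which under this reading suggests that the two sets differ in semantic representation.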