Title of Invention: METHOD FOR ASSIGNING SEMANTIC INFORMATION TO WORD THROUGH LEARNING USING TEXT CORPUS
Abstract: A method includes: acquiring a first corpus that includes first text of a first sentence, described in a natural language and including a first word, and second text of a second sentence including a second word that differs in meaning from the first word, where a second word distribution of the second word is similar to a first word distribution of the first word; acquiring a second corpus that includes third text of a third sentence including a third word identical to the first word and/or the second word, where a third word distribution of the third word is not similar to the first word distribution; and, based on the arrangement of word strings in the first corpus and the second corpus, assigning to the first word a first vector representing the meaning of the first word and assigning to the second word a second vector representing the meaning of the second word.
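The abstract turns on whether two words have similar word distributions, which the main claim defines as the types and frequencies of words appearing within a predetermined range before and after a word. The Python sketch below is one possible reading, not the applicant's implementation: it assumes whitespace tokenization, a range of two words on each side, and cosine similarity as the similarity measure, none of which the record specifies. `word_distribution`, `cosine`, and the example sentences are hypothetical.

```python
from collections import Counter
from math import sqrt


def word_distribution(sentences, target, window=2):
    """Types and frequencies of words within `window` positions before and
    after each occurrence of `target` (hypothetical helper)."""
    dist = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()  # assumption: whitespace tokenization
        for i, token in enumerate(tokens):
            if token != target:
                continue
            before = tokens[max(0, i - window):i]
            after = tokens[i + 1:i + 1 + window]
            dist.update(before + after)
    return dist


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (assumed measure)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical first corpus: "up" and "down" differ in meaning but share contexts.
first_corpus = ["turn the volume up a little", "turn the volume down a little"]
# Hypothetical second corpus: the same word "up" in a different context.
second_corpus = ["the stock price went up sharply today"]

up_first = word_distribution(first_corpus, "up")
down_first = word_distribution(first_corpus, "down")
up_second = word_distribution(second_corpus, "up")

print(cosine(up_first, down_first))  # 1.0: similar distributions
print(cosine(up_first, up_second))   # 0.0: not similar
```

In this toy case "up" and "down" in the first corpus have identical neighbours (similarity 1.0), while "up" in the second corpus shares none of them (similarity 0.0). Where to set the threshold for "similar" is left open by the record and would be a further assumption.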
Application Publication No.: US2016371254(A1)  Publication Date: 2016.12.22
Application No.: US201615176114  Filing Date: 2016.06.07
Applicant: Panasonic Intellectual Property Management Co., Ltd.  Inventors: YAMAGAMI KATSUYOSHI; USHIO TAKASHI; ISHII YASUNORI
Classification: G06F17/27; G06N3/08; G06N3/04; G06F19/00  Main Classification: G06F17/27
Main Claim: 1. A method for generating semantic information, comprising:
acquiring a first text corpus, including first text data of a first sentence including a first word and described in a natural language, and second text data of a second sentence including a second word different in meaning from the first word, with a second word distribution indicating types and frequencies of words appearing within a predetermined range prior to and subsequent to the second word being similar to a first word distribution within the predetermined range prior to and subsequent to the first word in the first sentence;
acquiring a second text corpus including third text data of a third sentence, including a third word identical to at least one of the first word and the second word, with a third word distribution within the predetermined range prior to and subsequent to the third word being not similar to the first word distribution;
in accordance with an arrangement of a word string in the first text corpus and the second text corpus, performing a learning process by assigning to the first word a first vector representing a meaning of the first word in a vector space of predetermined dimensions and by assigning to the second word a second vector representing a meaning of the second word in the vector space; and
storing the first vector in association with the first word, and the second vector spaced by a predetermined distance or longer from the first vector in the vector space in association with the second word.
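The claim leaves the learning process itself unspecified; the G06N3 classification codes point to neural-network learning. As an assumption, the Python sketch below swaps in skip-gram with negative sampling (the word2vec-style technique), trained on the union of both corpora, and then stores a pair of word vectors only if they are at least a predetermined distance apart. Euclidean distance, uniform negative sampling, the hyperparameters, and the names `train_skipgram`, `euclidean`, and `store_pair` are all hypothetical. On a corpus this small nothing guarantees that "up" and "down" end up far apart, so the distance check may fail.

```python
import math
import random


def train_skipgram(sentences, dim=8, window=2, negatives=3, epochs=50, lr=0.05, seed=0):
    """Minimal skip-gram with negative sampling, an assumed stand-in for the
    unspecified learning process. Returns {word: vector of length dim}."""
    rng = random.Random(seed)
    corpus = [s.lower().split() for s in sentences]
    vocab = sorted({w for tokens in corpus for w in tokens})
    w_in = {w: [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for w in vocab}
    w_out = {w: [0.0] * dim for w in vocab}

    def sigmoid(x):
        x = max(-30.0, min(30.0, x))  # clamp to avoid overflow in math.exp
        return 1.0 / (1.0 + math.exp(-x))

    for _ in range(epochs):
        for tokens in corpus:
            for i, center in enumerate(tokens):
                lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
                for j in range(lo, hi):
                    if j == i:
                        continue
                    # One observed context word (label 1) plus sampled negatives (label 0).
                    # Assumption: negatives drawn uniformly, not from a unigram^0.75 table.
                    targets = [(tokens[j], 1.0)]
                    targets += [(rng.choice(vocab), 0.0) for _ in range(negatives)]
                    v = w_in[center]
                    grad_in = [0.0] * dim
                    for word, label in targets:
                        u = w_out[word]
                        g = lr * (label - sigmoid(sum(a * b for a, b in zip(v, u))))
                        for k in range(dim):
                            grad_in[k] += g * u[k]  # accumulate gradient for the input vector
                            u[k] += g * v[k]        # update the output vector immediately
                    for k in range(dim):
                        v[k] += grad_in[k]
    return w_in


def euclidean(a, b):
    """Euclidean distance between two vectors (assumed distance measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def store_pair(store, first, second, vectors, min_distance):
    """Store both vectors keyed by word, but only if the second vector lies at
    least `min_distance` from the first (hypothetical check of the claimed spacing)."""
    distance = euclidean(vectors[first], vectors[second])
    if distance < min_distance:
        raise ValueError(f"{first!r} and {second!r} are only {distance:.3f} apart")
    store[first] = vectors[first]
    store[second] = vectors[second]
    return distance


# Hypothetical union of a first corpus and a second corpus.
sentences = [
    "turn the volume up a little",
    "turn the volume down a little",
    "the stock price went up sharply today",
]
vectors = train_skipgram(sentences)
print(round(euclidean(vectors["up"], vectors["down"]), 3))
```

Treating the spacing as a check at storage time is one design choice; the claim may equally be read as describing a property the learning process is expected to produce, in which case the check only verifies it.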
Address: Osaka, JP