Title of Invention |
METHOD FOR ASSIGNING SEMANTIC INFORMATION TO WORD THROUGH LEARNING USING TEXT CORPUS |
Abstract |
A method includes: acquiring a first corpus that includes first text of a first sentence, described in a natural language and including a first word, and second text of a second sentence including a second word different in meaning from the first word, a second word distribution of the second word being similar to a first word distribution of the first word; acquiring a second corpus that includes third text of a third sentence including a third word identical to the first word and/or the second word, a third word distribution of the third word being not similar to the first word distribution; and, based on an arrangement of a word string in the first corpus and the second corpus, assigning to the first word a first vector representing a meaning of the first word and assigning to the second word a second vector representing a meaning of the second word. |
Publication Number |
US2016371254(A1) |
Publication Date |
2016.12.22 |
Application Number |
US201615176114 |
Filing Date |
2016.06.07 |
Applicant |
Panasonic Intellectual Property Management Co., Ltd. |
Inventors |
YAMAGAMI KATSUYOSHI;USHIO TAKASHI;ISHII YASUNORI |
Classification |
G06F17/27;G06N3/08;G06N3/04;G06F19/00 |
Main Classification |
G06F17/27 |
Patent Agency |
|
Patent Agent |
|
Independent Claim |
1. A method for generating semantic information, comprising:
acquiring a first text corpus, including first text data of a first sentence including a first word and described in a natural language, and second text data of a second sentence including a second word different in meaning from the first word, with a second word distribution indicating types and frequencies of words appearing within a predetermined range prior to and subsequent to the second word being similar to a first word distribution within the predetermined range prior to and subsequent to the first word in the first sentence;
acquiring a second text corpus including third text data of a third sentence, including a third word identical to at least one of the first word and the second word, with a third word distribution within the predetermined range prior to and subsequent to the third word being not similar to the first word distribution;
in accordance with an arrangement of a word string in the first text corpus and the second text corpus, performing a learning process by assigning to the first word a first vector representing a meaning of the first word in a vector space of predetermined dimensions and by assigning to the second word a second vector representing a meaning of the second word in the vector space; and
storing the first vector in association with the first word, and the second vector spaced by a predetermined distance or longer from the first vector in the vector space in association with the second word. |
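The "word distribution" in the claim (types and frequencies of words within a predetermined range before and after a target word) can be illustrated with a minimal Python sketch. This is not the patented learning process. The function names `word_distribution`, `cosine_similarity`, and `is_spaced`, the window size of 2, the use of cosine similarity as the similarity measure, the sample sentences, and the two-dimensional vectors are all assumptions chosen for illustration.

```python
from collections import Counter
import math


def word_distribution(sentences, target, window=2):
    """Count the words found within `window` positions before and after
    each occurrence of `target` (the window size is an assumed parameter)."""
    dist = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    dist[tokens[j]] += 1
    return dist


def cosine_similarity(a, b):
    """Cosine similarity of two count distributions (an assumed measure;
    the claim does not name one)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def is_spaced(vec_a, vec_b, min_distance):
    """Check that two vectors are at least `min_distance` apart (Euclidean)."""
    return math.dist(vec_a, vec_b) >= min_distance


# Hypothetical first corpus: "up" and "down" differ in meaning
# but appear in the same surrounding words.
first_corpus = [
    "the volume goes up quickly".split(),
    "the volume goes down quickly".split(),
]
# Hypothetical second corpus: "up" appears again, in a different context.
second_corpus = ["please turn up the heat now".split()]

d_first = word_distribution(first_corpus, "up")
d_second = word_distribution(first_corpus, "down")
d_third = word_distribution(second_corpus, "up")

sim_first_second = cosine_similarity(d_first, d_second)  # identical contexts
sim_first_third = cosine_similarity(d_first, d_third)    # no shared context words

# Hypothetical 2-D vectors standing in for learned word vectors.
spaced = is_spaced((0.0, 0.0), (3.0, 4.0), min_distance=5.0)
```

In this toy data the first and second distributions coincide while the third shares no context words with the first, which mirrors the situation the claim describes: the second corpus gives one of the two words a distinguishing context, so a learning process has evidence for placing the two vectors apart.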
Address |
Osaka JP |