Title of Invention |
INTELLIGENT IMAGE CAPTIONING |
Abstract |
Presented herein are embodiments of a multimodal Recurrent Neural Network (m-RNN) model for generating novel image captions. In embodiments, the model directly models the probability distribution of generating a word given the previous word or words and an image, and image captions are generated according to this distribution. In embodiments, the model comprises two sub-networks: a deep recurrent neural network for sentences and a deep convolutional network for images. In embodiments, these two sub-networks interact with each other in a multimodal layer to form the whole m-RNN model. The effectiveness of an embodiment of the model was validated on four benchmark datasets, on which it outperformed the state-of-the-art methods. In embodiments, the m-RNN model may also be applied to retrieval tasks for retrieving images or captions. |
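The abstract states that captions are generated according to the modeled distribution of the next word given the previous words and an image. A minimal sketch of that generation loop follows. It is an illustration only, not the patented implementation: the function name `generate_caption`, the `<s>` and `</s>` boundary tokens, the greedy (highest-probability) word choice, and the toy probability table are all assumptions introduced here. The toy distribution ignores the image entirely and stands in for a trained model.

```python
def generate_caption(next_word_probs, image, start="<s>", end="</s>", max_len=20):
    """Greedy decoding: repeatedly pick the most probable next word.

    next_word_probs(previous_words, image) is a hypothetical callable that
    returns a dict mapping each candidate word to its probability.
    """
    words = [start]
    for _ in range(max_len):
        probs = next_word_probs(words, image)
        best = max(probs, key=probs.get)  # greedy choice (an assumption)
        if best == end:
            break
        words.append(best)
    return words[1:]  # drop the start token


# Toy stand-in for a trained model: depends only on the last word.
TOY_TABLE = {
    "<s>": {"a": 0.9, "dog": 0.1},
    "a": {"dog": 0.8, "</s>": 0.2},
    "dog": {"runs": 0.7, "</s>": 0.3},
    "runs": {"</s>": 1.0},
}


def toy_next_word_probs(previous_words, image):
    # The image argument is unused in this toy example.
    return TOY_TABLE[previous_words[-1]]


caption = generate_caption(toy_next_word_probs, image=None)
print(" ".join(caption))
```

Sampling from the distribution instead of taking the maximum would be an equally valid reading of "generated according to this distribution"; greedy selection is used here only to keep the example deterministic.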
Application Publication Number |
US2017098153(A1) |
Application Publication Date |
2017.04.06 |
Application Number |
US201615166177 |
Filing Date |
2016.05.26 |
Applicant |
Baidu USA, LLC |
Inventors |
Mao Junhua;Xu Wei;Yang Yi;Wang Jiang;Huang Zhiheng |
Classification Numbers |
G06N3/04;G06N3/08 |
Main Classification Number |
G06N3/04 |
Agency |
|
Agent |
|
Main Claim |
1. A computer-implemented method for generating a sentence-level description of an image, the method comprising:
receiving an input image;
inputting the input image into a multimodal recurrent neural network (m-RNN), the m-RNN comprising:
a convolution neural network layer component that generates an image representation of the input image;
at least one word embedding component that encodes syntactic and semantic meaning of a word into a word representation;
a recurrent layer component that maps a recurrent layer activation of a prior time frame into a same vector space as a word representation at a current time frame and combines them;
a multimodal component that receives a first input from the recurrent layer component and a second input from the convolution neural network layer component and combines them; and
a softmax layer component that uses an output of the multimodal component to generate a probability distribution of a next word in the sentence-level description. |
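The components recited in the claim can be sketched as a single time step of a forward pass. This is a rough illustration under stated assumptions, not the claimed implementation: the class name `TinyMRNN`, all weight names, the random untrained weights, the use of element-wise addition to "combine" inputs, and the ReLU and tanh nonlinearities are choices made here for the example. The convolution neural network component is replaced by a fixed feature vector, since the claim does not specify its architecture.

```python
import math
import random


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TinyMRNN:
    """Hypothetical, untrained stand-in for the claimed m-RNN components."""

    def __init__(self, vocab_size, embed_dim, image_dim, multimodal_dim, seed=0):
        rng = random.Random(seed)
        self.embed_dim = embed_dim
        self.embedding = rand_matrix(vocab_size, embed_dim, rng)   # word embedding
        self.U_r = rand_matrix(embed_dim, embed_dim, rng)          # recurrent map
        self.V_r = rand_matrix(multimodal_dim, embed_dim, rng)     # recurrent -> multimodal
        self.V_i = rand_matrix(multimodal_dim, image_dim, rng)     # image -> multimodal
        self.W_out = rand_matrix(vocab_size, multimodal_dim, rng)  # multimodal -> vocabulary

    def initial_state(self):
        return [0.0] * self.embed_dim

    def step(self, word_id, prev_recurrent, image_features):
        # Word embedding component: look up the current word representation.
        w = self.embedding[word_id]
        # Recurrent component: map the prior activation into the word
        # representation's vector space, then combine (addition + ReLU assumed).
        r = [max(0.0, x) for x in add(matvec(self.U_r, prev_recurrent), w)]
        # Multimodal component: combine recurrent and image inputs
        # (projected sum + tanh assumed).
        m = [math.tanh(x) for x in add(matvec(self.V_r, r),
                                       matvec(self.V_i, image_features))]
        # Softmax component: probability distribution over the next word.
        return softmax(matvec(self.W_out, m)), r


model = TinyMRNN(vocab_size=5, embed_dim=4, image_dim=3, multimodal_dim=6)
image_features = [0.2, -0.1, 0.5]  # placeholder for the CNN image representation
probs, state = model.step(word_id=0, prev_recurrent=model.initial_state(),
                          image_features=image_features)
print(len(probs), round(sum(probs), 6))
```

The recurrent state here has the same dimension as the word embedding so that the two can be added directly, which is one simple way to satisfy the "same vector space" language of the claim.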
Address |
Sunnyvale CA US |