INTELLIGENT IMAGE CAPTIONING, Application No. US201615166177

Title of invention	INTELLIGENT IMAGE CAPTIONING
Abstract	Presented herein are embodiments of a multimodal Recurrent Neural Network (m-RNN) model for generating novel image captions. In embodiments, the model directly models the probability distribution of generating a word given a previous word or words and an image, and image captions are generated according to this distribution. In embodiments, the model comprises two sub-networks: a deep recurrent neural network for sentences and a deep convolutional network for images. In embodiments, these two sub-networks interact with each other in a multimodal layer to form the whole m-RNN model. The effectiveness of an embodiment of the model was validated on four benchmark datasets, and it outperformed the state-of-the-art methods. In embodiments, the m-RNN model may also be applied to retrieval tasks for retrieving images or captions.
Publication No.	US2017098153(A1)	Publication date	2017.04.06
Application No.	US201615166177	Filing date	2016.05.26
Applicant	Baidu USA, LLC	Inventors	Mao Junhua;Xu Wei;Yang Yi;Wang Jiang;Huang Zhiheng
Classification	G06N3/04;G06N3/08	Main classification	G06N3/04
Agency		Agent
Main claim	1. A computer-implemented method for generating a sentence-level description of an image, the method comprising: receiving an input image; inputting the input image into a multimodal recurrent neural network (m-RNN), the m-RNN comprising: a convolution neural network layer component that generates an image representation of the input image; at least one word embedding component that encodes syntactic and semantic meaning of a word into a word representation; a recurrent layer component that maps a recurrent layer activation of a prior time frame into a same vector space as a word representation at a current time frame and combines them; a multimodal component that receives a first input from the recurrent layer component and a second input from the convolution neural network layer component and combines them; and a softmax layer component that uses an output of the multimodal component to generate a probability distribution of a next word in the sentence-level description.
Address	Sunnyvale, CA, US
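
The components recited in claim 1 can be illustrated with a small, untrained sketch. This is not the applicant's implementation: the class name `TinyMRNN`, all dimensions, the random weights, the ReLU and tanh nonlinearities, and the greedy decoding loop are assumptions made for illustration, and the image representation is assumed to be a feature vector already produced by some convolutional network.

```python
import math
import random


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TinyMRNN:
    """Hypothetical toy model; weights are random, so outputs are meaningless."""

    def __init__(self, vocab_size, embed_dim, image_dim, multimodal_dim, seed=0):
        rng = random.Random(seed)
        self.embed_dim = embed_dim
        self.embed = rand_matrix(vocab_size, embed_dim, rng)      # word embedding table
        self.u_r = rand_matrix(embed_dim, embed_dim, rng)         # maps r(t-1) into word space
        self.v_r = rand_matrix(multimodal_dim, embed_dim, rng)    # recurrent -> multimodal
        self.v_i = rand_matrix(multimodal_dim, image_dim, rng)    # image -> multimodal
        self.w_out = rand_matrix(vocab_size, multimodal_dim, rng) # multimodal -> vocabulary

    def step(self, word_id, prev_recurrent, image_feat):
        """One time frame: returns (next-word distribution, new recurrent activation)."""
        w = self.embed[word_id]
        # Recurrent layer: map the prior activation into the word space, then combine.
        mapped = matvec(self.u_r, prev_recurrent)
        r = [max(0.0, a + b) for a, b in zip(mapped, w)]  # ReLU is an assumption
        # Multimodal layer: combine recurrent input and image representation.
        from_r = matvec(self.v_r, r)
        from_i = matvec(self.v_i, image_feat)
        m = [math.tanh(a + b) for a, b in zip(from_r, from_i)]  # tanh is an assumption
        # Softmax layer: probability distribution over the next word.
        return softmax(matvec(self.w_out, m)), r

    def caption(self, image_feat, start_id, end_id, max_len=10):
        """Greedy decoding (an assumption): pick the most probable word each step."""
        r = [0.0] * self.embed_dim
        word, out = start_id, []
        for _ in range(max_len):
            probs, r = self.step(word, r, image_feat)
            word = max(range(len(probs)), key=probs.__getitem__)
            if word == end_id:
                break
            out.append(word)
        return out
```

For example, `TinyMRNN(vocab_size=6, embed_dim=4, image_dim=3, multimodal_dim=5).caption([0.5, -0.2, 0.1], start_id=0, end_id=1)` returns a list of word ids. The recurrent activation has the same size as the word embedding here because the claim has the prior activation mapped into the same vector space as the current word representation.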