Title: Privacy-preserving text to image matching
Abstract: A method for text-to-image matching includes generating representations of text images, such as license plate images, by embedding each text image into a first vectorial space with a first embedding function. With a second embedding function, a character string, such as a license plate number to be matched, is embedded into a second vectorial space to generate a character string representation. A compatibility is computed between the character string representation and one or more of the text image representations to identify a matching one. The compatibility is computed with a function that uses a transformation which is learned on a training set of labeled images. The learning uses a loss function that aggregates a text-to-image loss and an image-to-text loss over the training set. The image-to-text loss penalizes the transformation when it incorrectly ranks a pair of character string representations, given an image representation corresponding to one of them.
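The two-part loss described in the abstract can be illustrated with a minimal sketch. The abstract does not give the form of the compatibility function or of the loss, so the bilinear score x^T W y, the hinge penalty with margin 1, the plain sum over triplets, and all function names below are assumptions made for illustration only.

```python
# Sketch only: the bilinear score, the hinge margin, and the plain sum
# are assumptions, not details stated in the abstract.

def compat(x, W, y):
    """Assumed compatibility F(x, y; W) = x^T W y."""
    return sum(x[i] * W[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))

def hinge(score_pos, score_neg, margin=1.0):
    """Zero when the matching pair outscores the non-matching pair by the margin."""
    return max(0.0, margin - score_pos + score_neg)

def text_to_image_loss(W, y, x_pos, x_neg):
    """Given a string representation y, penalize ranking the wrong image x_neg too high."""
    return hinge(compat(x_pos, W, y), compat(x_neg, W, y))

def image_to_text_loss(W, x, y_pos, y_neg):
    """Given an image representation x, penalize ranking the wrong string y_neg too high."""
    return hinge(compat(x, W, y_pos), compat(x, W, y_neg))

def aggregate_loss(W, t2i_triplets, i2t_triplets):
    """Sum both losses over (hypothetical) training triplets."""
    t2i = sum(text_to_image_loss(W, y, xp, xn) for y, xp, xn in t2i_triplets)
    i2t = sum(image_to_text_loss(W, x, yp, yn) for x, yp, yn in i2t_triplets)
    return t2i + i2t
```

Under these assumptions, a transformation W that ranks the matching string above the non-matching one by at least the margin contributes nothing to the image-to-text term, while a misranked pair contributes a positive penalty.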
Publication number: US9367763(B1) Publication date: 2016.06.14
Application number: US201514594321 Filing date: 2015.01.12
Applicant: XEROX CORPORATION Inventors: Gordo Soldevila Albert; Perronnin Florent C.
Classification: G06K9/00;G06K9/62;G06K9/18;G06K9/52;G06K9/32;G06F17/30;G06F21/60;G06F17/11 Main classification: G06K9/00
Agency: Fay Sharpe LLP Agent: Fay Sharpe LLP
Principal claim: 1. A method for text-to-image matching comprising: storing a set of text image representations, each text image representation having been generated by embedding a respective text image into a first vectorial space with a first embedding function; with a second embedding function, embedding a character string into a second vectorial space to generate a character string representation; for each of at least some of the text image representations, computing a compatibility between the character string representation and the text image representation, comprising computing a function of the text image representation, character string representation, and a transformation, the transformation having been derived by minimizing a loss function on a set of labeled training images, the loss function including a text-to-image loss and an image-to-text loss; and identifying a matching text image based on the computed compatibilities, wherein at least one of the embedding and the computing of the compatibility is performed with a processor.
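The matching steps of the claim (stored image representations, string embedding, per-image compatibility, selection of the best match) can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the normalized character-count embedding, the bilinear score, the toy alphabet, and the names `embed_string` and `find_match` are all hypothetical, and the stored image representations are assumed to have been computed beforehand.

```python
import math

def embed_string(s, alphabet):
    """Toy second embedding function (assumed): L2-normalized character counts."""
    counts = [float(s.count(c)) for c in alphabet]
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]

def compat(x, W, y):
    """Assumed compatibility: bilinear form x^T W y with a learned W."""
    return sum(x[i] * W[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))

def find_match(query, image_reps, W, alphabet):
    """Score the query string against each stored representation; return best index and all scores."""
    y = embed_string(query, alphabet)
    scores = [compat(x, W, y) for x in image_reps]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores

# Hypothetical data: three precomputed image representations, identity transformation.
alphabet = "AB1"
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_reps = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.7]]
best_index, scores = find_match("AB1", image_reps, W, alphabet)
```

In this sketch the query is compared directly with the stored representations, so no character recognition is run on the images themselves.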
Address: Norwalk CT US