Title of invention |
Long-query retrieval |
Abstract |
Described herein is a technology that facilitates efficient large-scale similarity-based retrieval. In several embodiments, documents, images, and/or other multimedia files are compactly represented and efficiently indexed to enable robust search using a long query in a large-scale corpus. As described herein, these techniques include performing decomposition of a file, e.g., an image, a document containing an image, or a document-like representation of an image. The techniques use dimension reduction to obtain three parts: low-dimensional representations (major semantics), file-specific terms (minor semantics), and background words. The major semantics are represented in a feature vector and the minor semantics as keywords. Using the techniques described, file vectors are matched in a topic model and the results are ranked based on the keywords. |
Publication number |
US9460122(B2) |
Publication date |
2016.10.04 |
Application number |
US201213692922 |
Filing date |
2012.12.03 |
Applicant |
Microsoft Technology Licensing, LLC |
Inventors |
Li Zhiwei;Zhang Lei;Cai Rui;Ma Wei-Ying;Shum Heung-Yeung |
Classification |
G06F17/30 |
Main classification |
G06F17/30 |
Agency |
Lee and Hayes, PLLC |
Agent |
Swain Sandy;Minhas Micky;Lee and Hayes, PLLC |
Main claim |
1. A computer comprising:
a processor coupled to a computer memory, the computer memory having computer executable instructions encoded thereon, the computer executable instructions when executed by the processor configure the computer to perform operations comprising:
identifying an image file on which to base a query;
performing decomposition comprising utilizing a probabilistic topic model extrinsic to the image file to obtain a decomposed version of the image file, the decomposed version of the image file comprising:
low-dimensional representations; and
image-specific terms;
creating a composite representation of the image file, the composite representation comprising:
a vector of the low-dimensional representations; and
an index of image-specific terms;
based at least on the vector of low-dimensional representations, determining a topic corresponding to the image file;
comparing the vector of the low-dimensional representations corresponding to the image file to vectors of low-dimensional representations corresponding to a plurality of files; and
selecting a candidate set of files from the plurality of files based at least on a ranking of closeness of proximity of the vectors of the low-dimensional representations corresponding to the candidate set of files to the vector of the low-dimensional representations corresponding to the image file. |
Address |
Redmond WA US |
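
Illustrative sketch |
The flow recited in the abstract and the main claim (decompose a file, build a composite representation, select candidates by vector proximity, then rank by keywords) can be sketched roughly as below. This is a toy stand-in, not the patented method: the `TOPICS` word-weight tables, the `BACKGROUND` word set, the token lists standing in for image "visual words", and the function names `decompose`, `cosine`, and `search` are all assumptions made for illustration, and a simple weighted count replaces real probabilistic topic-model inference.

```python
import math
from collections import Counter

# Hypothetical, hand-written "topic model" that exists independently of any
# one file: each topic maps a word to a weight. A real system would learn it.
TOPICS = [
    {"beach": 0.5, "sea": 0.3, "sand": 0.2},
    {"city": 0.5, "street": 0.3, "car": 0.2},
]
# Hypothetical background words that carry no retrieval signal.
BACKGROUND = {"the", "a", "of", "photo"}


def decompose(tokens):
    """Return (topic vector, file-specific terms) for a list of tokens.

    The vector approximates the "major semantics"; the leftover terms that
    no topic covers approximate the "minor semantics". Background words
    are discarded.
    """
    counts = Counter(t for t in tokens if t not in BACKGROUND)
    vector = [sum(w * counts[t] for t, w in topic.items()) for topic in TOPICS]
    norm = math.sqrt(sum(v * v for v in vector)) or 1.0  # avoid divide-by-zero
    vector = [v / norm for v in vector]
    covered = set().union(*TOPICS)  # every word known to some topic
    specific = {t for t in counts if t not in covered}
    return vector, specific


def cosine(u, v):
    """Dot product; inputs are already unit length (or all zeros)."""
    return sum(a * b for a, b in zip(u, v))


def search(query_tokens, corpus, k=2):
    """Pick the k nearest files by topic vector, then re-rank by shared terms."""
    q_vec, q_terms = decompose(query_tokens)
    reps = {name: decompose(toks) for name, toks in corpus.items()}
    candidates = sorted(
        reps, key=lambda n: cosine(q_vec, reps[n][0]), reverse=True
    )[:k]
    return sorted(
        candidates, key=lambda n: len(q_terms & reps[n][1]), reverse=True
    )


corpus = {
    "a": ["beach", "sea", "waikiki"],
    "b": ["beach", "sand", "the", "malibu"],
    "c": ["city", "street", "tokyo"],
}
query = ["photo", "of", "beach", "sand", "malibu"]
print(search(query, corpus))
```

In this toy corpus, files "a" and "b" both fall in the same topic as the query, so the vector comparison alone cannot separate them; the file-specific term "malibu" is what moves "b" ahead of "a" in the final ranking. |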