Title: Long-query retrieval
Abstract: Described herein is a technology that facilitates efficient large-scale similarity-based retrieval. In several embodiments, documents, images, and/or other multimedia files are compactly represented and efficiently indexed to enable robust search using a long query in a large-scale corpus. As described herein, these techniques include performing decomposition of a file, e.g., an image, a document containing an image, or a document-like representation of an image. The techniques use dimension reduction to decompose the file into three parts: low-dimensional representations (major semantics), file-specific terms (minor semantics), and background words. The major semantics are represented as a feature vector and the minor semantics as keywords. File vectors are then matched in a topic model, and the results are ranked based on the keywords.
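The three-way split described in the abstract can be illustrated with a small sketch. This is not the patented method: it assumes pre-trained topic-word distributions (for example from an LDA-style model), a corpus background word distribution, and fixed, hypothetical mixture weights `lam`. Each word is treated as drawn from the background, the topic mixture, or a file-specific distribution; EM estimates the topic weights (the major-semantics vector), and words best explained by the file-specific component become keywords (minor semantics). All function and parameter names are hypothetical.

```python
from collections import Counter


def _normalize(values):
    """Scale a list of non-negative numbers so that it sums to 1."""
    total = sum(values) or 1.0
    return [v / total for v in values]


def decompose(tokens, topics, background, lam=(0.3, 0.5, 0.2), iters=50, n_keywords=5):
    """Hypothetical decomposition of one file's bag of words.

    Assumed model: each token comes from a fixed-weight mixture of
    (a) `background`, (b) a mixture over the pre-trained `topics`, and
    (c) a file-specific distribution `psi`. Returns
    (theta, keywords, background_words).
    """
    lam_b, lam_t, lam_s = lam
    eps = 1e-12  # probability floor for words a distribution does not cover
    counts = Counter(tokens)
    vocab = sorted(counts)
    k = len(topics)
    theta = [1.0 / k] * k
    psi = _normalize([counts[w] for w in vocab])  # start from empirical frequencies

    def posteriors(i, w):
        # Responsibilities of background, each topic, and the file-specific part.
        p_b = lam_b * background.get(w, eps)
        p_t = [lam_t * theta[j] * topics[j].get(w, eps) for j in range(k)]
        p_s = lam_s * psi[i]
        z = p_b + sum(p_t) + p_s
        return p_b / z, [p / z for p in p_t], p_s / z

    for _ in range(iters):
        theta_acc = [0.0] * k
        psi_acc = []
        for i, w in enumerate(vocab):  # E-step uses the previous theta and psi
            _, r_t, r_s = posteriors(i, w)
            for j in range(k):
                theta_acc[j] += counts[w] * r_t[j]
            psi_acc.append(counts[w] * r_s)
        theta = _normalize(theta_acc)  # M-step
        psi = _normalize(psi_acc)

    keywords, background_words = [], []
    for i, w in enumerate(vocab):  # label each word by its dominant component
        r_b, r_t, r_s = posteriors(i, w)
        if r_s >= max(r_b, sum(r_t)):
            keywords.append((psi[i], w))
        elif r_b >= sum(r_t):
            background_words.append(w)
    keywords = [w for _, w in sorted(keywords, reverse=True)[:n_keywords]]
    return theta, keywords, background_words


# Toy inputs (invented for illustration only).
topics = [
    {"image": 0.4, "pixel": 0.3, "color": 0.3},  # a "vision" topic
    {"word": 0.4, "query": 0.3, "index": 0.3},   # a "text" topic
]
background = {"the": 0.5, "of": 0.3, "a": 0.2}
tokens = "the the the of image image pixel color query zhiwei zhiwei".split()
theta, keywords, bg_words = decompose(tokens, topics, background)
```

On these toy inputs the topic weights lean toward the "vision" topic, the rare name `zhiwei` lands among the keywords, and function words such as `the` fall into the background group. Fixing `lam` rather than learning it keeps the file-specific component from simply absorbing every word.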
Publication number: US9460122(B2)
Publication date: 2016.10.04
Application number: US201213692922
Filing date: 2012.12.03
Applicant: Microsoft Technology Licensing, LLC
Inventors: Li Zhiwei; Zhang Lei; Cai Rui; Ma Wei-Ying; Shum Heung-Yeung
Classification: G06F17/30
Main classification: G06F17/30
Agency: Lee and Hayes, PLLC
Agents: Swain Sandy; Minhas Micky; Lee and Hayes, PLLC
Principal claim: 1. A computer comprising: a processor coupled to a computer memory, the computer memory having computer-executable instructions encoded thereon, the computer-executable instructions when executed by the processor configure the computer to perform operations comprising: identifying an image file on which to base a query; performing decomposition comprising utilizing a probabilistic topic model extrinsic to the image file to obtain a decomposed version of the image file, the decomposed version of the image file comprising: low-dimensional representations; and image-specific terms; creating a composite representation of the image file, the composite representation comprising: a vector of the low-dimensional representations; and an index of image-specific terms; based at least on the vector of low-dimensional representations, determining a topic corresponding to the image file; comparing the vector of the low-dimensional representations corresponding to the image file to vectors of low-dimensional representations corresponding to a plurality of files; and selecting a candidate set of files from the plurality of files based at least on a ranking of closeness of proximity of the vectors of the low-dimensional representations corresponding to the candidate set of files to the vector of the low-dimensional representations corresponding to the image file.
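The candidate-selection step of the claim, followed by the keyword-based ranking mentioned in the abstract, can be sketched as a two-stage search. The claim does not name a proximity measure, so cosine similarity is an assumption here, as is scoring the second stage by counting shared keywords. The corpus layout (`vec` for the low-dimensional vector, `terms` for the file's specific-term index) and every function name are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def select_candidates(query_vec, corpus, k=2):
    """Stage 1 (assumed metric): rank files by closeness of their low-dimensional
    vectors to the query vector and keep the top k as the candidate set."""
    ranked = sorted(corpus, key=lambda f: cosine(query_vec, corpus[f]["vec"]), reverse=True)
    return ranked[:k]


def rerank_by_keywords(candidates, query_keywords, corpus):
    """Stage 2 (assumed scoring): order candidates by how many query keywords
    appear in each file's term index; Python's stable sort keeps the stage-1
    order among ties."""
    q = set(query_keywords)
    return sorted(candidates, key=lambda f: len(q & corpus[f]["terms"]), reverse=True)


# Toy corpus with invented 2-topic vectors and term indexes.
corpus = {
    "doc_a": {"vec": [0.9, 0.1], "terms": {"eiffel"}},
    "doc_b": {"vec": [0.7, 0.3], "terms": {"zhiwei", "redmond"}},
    "doc_c": {"vec": [0.1, 0.9], "terms": {"zhiwei"}},
}
query_vec = [0.85, 0.15]
query_keywords = ["zhiwei"]

candidates = select_candidates(query_vec, corpus, k=2)
results = rerank_by_keywords(candidates, query_keywords, corpus)
```

Here `doc_c` shares the keyword but is dropped in stage 1 because its topic vector points elsewhere, while `doc_b` overtakes `doc_a` in stage 2. Filtering on the compact vector first keeps the expensive keyword comparison limited to a small candidate set.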
Address: Redmond, WA, US