Long-query retrieval, Application No. US201213692922 - 传众 Patent Search

Title of invention	Long-query retrieval
Abstract	Described herein is a technology that facilitates efficient large-scale similarity-based retrieval. In several embodiments, documents, images, and/or other multimedia files are compactly represented and efficiently indexed to enable robust search using a long query in a large-scale corpus. As described herein, these techniques include performing decomposition of a file, e.g., an image, a document containing an image, or a document-like representation of an image. The techniques use dimension reduction to obtain three parts: low-dimensional representations (major semantics), file-specific terms (minor semantics), and background words. The major semantics are represented in a feature vector and the minor semantics as keywords. Using the techniques described, file vectors are matched in a topic model and the results are ranked based on the keywords.
Publication No.	US9460122(B2)	Publication date	2016.10.04
Application No.	US201213692922	Filing date	2012.12.03
Applicant	Microsoft Technology Licensing, LLC	Inventor(s)	Li Zhiwei;Zhang Lei;Cai Rui;Ma Wei-Ying;Shum Heung-Yeung
Classification No.	G06F17/30	Main classification No.	G06F17/30
Agency	Lee and Hayes, PLLC	Agent	Swain Sandy;Minhas Micky;Lee and Hayes, PLLC
Independent claim	1. A computer comprising: a processor coupled to a computer memory, the computer memory having computer executable instructions encoded thereon, the computer executable instructions when executed by the processor configure the computer to perform operations comprising: identifying an image file on which to base a query; performing decomposition comprising utilizing a probabilistic topic model extrinsic to the image file to obtain a decomposed version of the image file, the decomposed version of the image file comprising: low-dimensional representations; and image-specific terms; creating a composite representation of the image file, the composite representation comprising: a vector of the low-dimensional representations; and an index of image-specific terms; based at least on the vector of low-dimensional representations, determining a topic corresponding to the image file; comparing the vector of the low-dimensional representations corresponding to the image file to vectors of low-dimensional representations corresponding to a plurality of files; and selecting a candidate set of files from the plurality of files based at least on a ranking of closeness of proximity of the vectors of the low-dimensional representations corresponding to the candidate set of files to the vector of the low-dimensional representations corresponding to the image file.
Address	Redmond, WA, US
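
Illustrative sketch (not part of the patent record). The abstract and claim 1 describe a two-stage flow: decompose each file into a low-dimensional vector plus file-specific terms, select candidates by vector closeness, then rank them by keywords. The Python below is a minimal sketch of that flow under loud assumptions: the toy word-to-topic table, the background word list, the helper names (decompose, cosine, search), and the use of cosine similarity and keyword overlap are all hypothetical choices made for illustration. The decomposition here is a simple table lookup standing in for the probabilistic topic model, which the record does not specify in detail.

```python
import math

NUM_TOPICS = 2

# Hypothetical toy "topic model": each known word maps to per-topic weights.
# A real system would learn these from a corpus; the values are invented.
WORD_TOPICS = {
    "beach": [0.9, 0.1],
    "ocean": [0.8, 0.2],
    "sand": [0.7, 0.3],
    "city": [0.1, 0.9],
    "street": [0.2, 0.8],
    "tower": [0.1, 0.9],
}
# Hypothetical background words, dropped during decomposition.
BACKGROUND = {"the", "a", "of", "and", "in"}


def decompose(words):
    """Split a bag of words into (topic vector, file-specific terms).

    Simplified stand-in for the probabilistic decomposition: words known to
    the toy model contribute to the vector (major semantics), background
    words are discarded, and all other words are kept as keywords
    (minor semantics).
    """
    vector = [0.0] * NUM_TOPICS
    specific = set()
    for word in words:
        if word in BACKGROUND:
            continue
        if word in WORD_TOPICS:
            for i, weight in enumerate(WORD_TOPICS[word]):
                vector[i] += weight
        else:
            specific.add(word)
    total = sum(vector)
    if total > 0:
        vector = [v / total for v in vector]
    return vector, specific


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def search(query_words, corpus, k=2):
    """Return file ids: top-k by vector closeness, re-ranked by keyword overlap."""
    q_vec, q_terms = decompose(query_words)
    decomposed = {fid: decompose(words) for fid, words in corpus.items()}
    # Stage 1: candidate set from closeness of the low-dimensional vectors.
    candidates = sorted(
        decomposed, key=lambda f: cosine(q_vec, decomposed[f][0]), reverse=True
    )[:k]
    # Stage 2: re-rank by shared file-specific terms. sorted() is stable,
    # so ties keep their stage-1 order.
    return sorted(
        candidates, key=lambda f: len(q_terms & decomposed[f][1]), reverse=True
    )


CORPUS = {
    "a": "the beach and the ocean in maui".split(),
    "b": "sand of the beach in waikiki".split(),
    "c": "the city street and a tower in seattle".split(),
}
QUERY = "ocean beach waikiki".split()
print(search(QUERY, CORPUS))
```

In this toy run, file "a" is closest to the query by topic vector, but file "b" shares the file-specific term "waikiki" with the query, so the keyword stage moves "b" ahead of "a". File "c" never enters the candidate set because its vector points at a different topic.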