Title: SYSTEMS AND METHODS FOR MULTIMEDIA IMAGE CLUSTERING
Abstract: Computer image clustering systems and methods for conducting effective media searches by grouping multimedia documents tagged by keywords into a hierarchy of images configured to: (1) maintain a first database, (2) maintain an initial occurrence matrix, (3) maintain an occurrence matrix, (4) maintain a media file activation score for each media file in the first database, (5) generate a log version of the occurrence matrix, (6) maintain an inverse media file frequency value for each descriptive term in the first database, (7) generate a descriptive term frequency matrix and generate a list of document vectors in multidimensional space (list), and (8) organize and process each media file in the list into a high activation score category and a low activation score category.
Publication No.: US2015310010(A1) Publication date: 2015.10.29
Application No.: US201514644301 Filing date: 2015.03.11
Applicant: Shutterstock, Inc. Inventors: Brenner Eliot;Lev-Tov Manor;Hohwald Heath;Xiong Maggie J.
Classification: G06F17/30 Main classification: G06F17/30
Agency: Agent:
Principal claim: 1. A computer image clustering system for conducting an effective media search by grouping multimedia documents tagged by keywords into a hierarchy of images, the system comprising: a programmable data processor operating under the control of a program to convert the display commands into data entries in an array of multi-bit data characters and words, each entry of the array corresponding to a set of descriptions of the media file to be displayed; and a scanned-raster display device for generating illuminated points on a display surface in response to applied data signals causing the programmable data processor to perform the following operations: maintain a first database comprising a first set of records, each record comprising a media file and a set of descriptive terms associated with the media file; maintain an initial occurrence matrix comprising rows of the media files in the first database, and columns of the unique descriptive terms associated with media files in the first database, wherein a value at a given location (x, y) in the occurrence matrix indicates the number of times the unique descriptive term y appeared in the media file x; maintain an occurrence matrix comprising rows of the media files in the first database, and columns of the unique descriptive terms associated with media files in the first database, wherein a descriptive term media file activation (DTMA) value at a given location (x, y) in the occurrence matrix indicates the strength of association of descriptive term y for media file x; maintain a media file activation (MFA) score for a next media file by performing the following operations, until the MFA score is computed for each media file in the first database: identify the next descriptive term, and configure the MFA value of the next media file by adding the DTMA value at the occurrence matrix (the next media file, the next descriptive term) to the MFA value of the media file until each descriptive term in the occurrence matrix is processed; generate a log version of the occurrence matrix by computing a log of each of the DTMA values in the occurrence matrix; maintain an inverse media file frequency (IMFF) value for each descriptive term in the first database by generating a first value by aggregating the number of media files in which the descriptive term occurs in the first database, and taking a reciprocal of the first value; generate a descriptive term frequency matrix and implement dimensional reductions by using principal component analysis (PCA) of the descriptive term frequency matrix to generate a list of document vectors in multidimensional space (list); and organize and process each media file in the list into a high activation score category if the MFA score of the media file exceeds a predefined threshold and in a low activation score category if the MFA score of the media file is less than the predefined threshold.
Address: New York NY US