Title of invention: Video concept detection using multi-layer multi-instance learning
Abstract: Visual concepts contained within a video clip are classified based upon a set of target concepts. The clip is segmented into shots and a multi-layer multi-instance (MLMI) structured metadata representation of each shot is constructed. A set of pre-generated trained models of the target concepts is validated using a set of training shots. An MLMI kernel is recursively generated which models the MLMI structured metadata representation of each shot by comparing prescribed pairs of shots. The MLMI kernel is subsequently utilized to generate a learned objective decision function which learns a classifier for determining if a particular shot (that is not in the set of training shots) contains instances of the target concepts. A regularization framework can also be utilized in conjunction with the MLMI kernel to generate modified learned objective decision functions. The regularization framework introduces explicit constraints which serve to maximize the precision of the classifier.
Publication number: US8804005(B2)  Publication date: 2014.08.12
Application number: US200812111202  Filing date: 2008.04.29
Applicant: Microsoft Corporation  Inventors: Mei Tao; Hua Xian-Sheng; Li Shipeng; Gu Zhiwei
Classification: G06K9/62; G06K9/34  Main classification: G06K9/62
Agency:  Agents: Boelitz Carole; Minhas Micky
Main claim: 1. A computer-implemented process for performing video concept detection on a video clip based upon a prescribed set of target concepts, comprising: using a computer to perform the following process actions:
  segmenting the clip into a plurality of shots, wherein each shot comprises a series of consecutive frames that represent a distinctive coherent visual theme;
  constructing a multi-layer multi-instance (MLMI) structured metadata representation of each shot, comprising,
    a layer indicator l,
    a hierarchy of three layers, said hierarchy comprising,
      an uppermost shot layer, l=1, comprising the plurality of shots segmented from the clip,
      an intermediate key-frame sub-layer, l=2, contiguously beneath the shot layer, comprising one or more key-frames for each shot, wherein each key-frame comprises one or more of the target concepts, and
      a lowermost key-region sub-layer, l=3, contiguously beneath the key-frame sub-layer, comprising a set of filtered key-regions for each key-frame, wherein each filtered key-region comprises a particular target concept, and
    a rooted tree structure, comprising a connected acyclic directed graph of nodes, wherein each node comprises structured metadata of a certain granularity describing a particular visual concept, and the granularity of the metadata increases for each successive layer down the hierarchy;
  validating a set of pre-generated trained models of the target concepts using a set of training shots selected from the plurality of shots;
  recursively generating an MLMI kernel kMLMI( ) which models the MLMI structured metadata representation of each shot by comparing prescribed pairs of shots;
  utilizing a regularization framework in conjunction with kMLMI( ) to generate a modified learned objective decision function f( ) which learns a classifier for determining if a particular shot x, that is not in the set of training shots, comprises instances of the target concepts, wherein the regularization framework introduces explicit constraints which serve to restrict instance classification in the key-frame and key-region sub-layers, thus maximizing the precision of the classifier, wherein the explicit constraints introduced by the regularization framework comprise,
    a constraint A comprising a ground truth for the target concepts and instance classification labels for the plurality of shots in the shot layer, said constraint A serving to minimize instance classification errors for said shots, and
    a constraint B comprising the ground truth, instance classification labels for the key-frames in the key-frame sub-layer, and instance classification labels for the sets of filtered key-regions in the key-region sub-layer, said constraint B serving to minimize instance classification errors for said key-frames and said filtered key-regions.
Address: Redmond WA US
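
Illustrative sketch (not part of the patent record): the main claim describes a three-layer rooted tree per shot (shot, key-frame, key-region) and a kernel that is generated recursively by comparing pairs of shots. The record does not give the kernel formula, so the Python below is only one plausible reading. It assumes each node carries a numeric feature vector, uses an RBF similarity between nodes of the same layer, and averages the kernel over all child pairs. The names `Node`, `rbf`, `k_node` and the parameter `gamma` are hypothetical and do not come from the patent.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Node:
    """One node of the rooted tree: layer 1 = shot, 2 = key-frame, 3 = key-region."""
    layer: int
    features: Sequence[float]  # assumed per-node descriptor (hypothetical)
    children: List["Node"] = field(default_factory=list)


def rbf(u: Sequence[float], v: Sequence[float], gamma: float = 0.5) -> float:
    """RBF similarity between two feature vectors (an assumed choice of base kernel)."""
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * d2)


def k_node(a: Node, b: Node, gamma: float = 0.5) -> float:
    """Recursively compare two nodes: local similarity times the mean child-pair kernel."""
    if a.layer != b.layer:
        return 0.0  # only nodes of the same layer are compared
    local = rbf(a.features, b.features, gamma)
    if not a.children or not b.children:
        return local  # leaf (key-region) level, or nothing below to compare
    total = sum(k_node(ca, cb, gamma) for ca in a.children for cb in b.children)
    return local * total / (len(a.children) * len(b.children))


# Two toy shots, each with one key-frame containing one key-region.
shot_a = Node(1, [0.0], [Node(2, [0.0], [Node(3, [0.0])])])
shot_b = Node(1, [1.0], [Node(2, [1.0], [Node(3, [1.0])])])

print(k_node(shot_a, shot_a), k_node(shot_a, shot_b))
```

A Gram matrix built from such pairwise values over the training shots could then be passed to any kernel classifier. The regularization constraints A and B from the claim are not modeled in this sketch.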