Title: Context-aware tracking of a video object using a sparse representation framework
Abstract: A method, system, and/or computer program product tracks an object in a video. The user defines a bounding box in a first frame to represent the object to be tracked, based on a point of interest. A static dictionary D is populated with densely overlapping patches from a search window. When a new frame of the video is detected, candidate patches in that frame that potentially depict the tracked object are identified. The candidate patches are co-located with the densely overlapping patches to form a dynamic candidate dictionary Y of candidate patches. Candidate patches that best match the densely overlapping patches from the first frame are identified by an L1-norm (sparse) solution, yielding a best-matched patch in the new frame.
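The matching step described in the abstract can be illustrated with a small pure-Python sketch. It assumes grayscale frames stored as nested lists, square patches flattened row-major, unit-norm dictionary atoms, candidates taken from the same window location in the new frame (one reading of "co-located"), and a fixed λ. The patent instead derives λ from a gradient between atoms and does not name a solver, so iterative shrinkage-thresholding (ISTA) is swapped in here. The helper names (`extract_patches`, `unit`, `sparse_code`, `best_match`) and the scoring rule (largest single sparse coefficient) are hypothetical.

```python
import math

def extract_patches(frame, top, left, height, width, size, stride=1):
    """Densely overlapping size x size patches inside a search window, each flattened row-major."""
    return [
        [float(frame[r + i][c + j]) for i in range(size) for j in range(size)]
        for r in range(top, top + height - size + 1, stride)
        for c in range(left, left + width - size + 1, stride)
    ]

def unit(v):
    """Scale a patch vector to unit L2 norm; an all-zero vector is returned unchanged."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)

def sparse_code(D, y, lam, steps=300):
    """ISTA for min_a ||D a - y||_2^2 + lam * ||a||_1, with D given as a list of atoms (columns).

    Assumption: a fixed lam and the ISTA solver stand in for the patent's gradient-derived lambda.
    """
    L = 2.0 * sum(x * x for atom in D for x in atom)  # upper bound on the gradient's Lipschitz constant
    a = [0.0] * len(D)
    if L == 0.0:
        return a
    for _ in range(steps):
        resid = [sum(ai * atom[i] for ai, atom in zip(a, D)) - y[i] for i in range(len(y))]  # D a - y
        grad = [2.0 * sum(d * r for d, r in zip(atom, resid)) for atom in D]
        z = [ai - g / L for ai, g in zip(a, grad)]                         # gradient step
        a = [math.copysign(max(abs(zi) - lam / L, 0.0), zi) for zi in z]  # soft threshold
    return a

def best_match(D, Y, lam=0.2):
    """Hypothetical scoring: index of the candidate whose sparse code has the largest coefficient."""
    scores = [max(abs(c) for c in sparse_code(D, y, lam)) for y in Y]
    return max(range(len(Y)), key=scores.__getitem__)

# Usage on toy frames: D from the first-frame search window, Y from the same window in the new frame.
frame0 = [[(3 * r + 5 * c) % 7 for c in range(6)] for r in range(6)]
frame1 = [row[1:] + row[:1] for row in frame0]      # toy "new frame": content shifted left by one
D = [unit(p) for p in extract_patches(frame0, 1, 1, 4, 4, size=2)]
Y = [unit(p) for p in extract_patches(frame1, 1, 1, 4, 4, size=2)]
print(len(D), best_match(D, Y))
```

At stride 1, neighboring patches share many of their pixels, so the atoms of D are strongly correlated; the L1 term tends to concentrate each code on a few atoms instead of spreading it across many.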
Publication number: US9213899 (B2)  Publication date: 2015.12.15
Application number: US201414223557  Application date: 2014.03.24
Applicant: International Business Machines Corporation  Inventor: Mj Ashwini
Classification: G06K9/00; G06T11/60; G06K9/62  Main classification: G06K9/00
Agency: Law Office of Jim Boice  Agent: Pivnichny John R.; Law Office of Jim Boice
Main claim: 1. A method to track an object in a video, the method comprising:
initializing, by one or more processors, a first frame in a video by detecting a search window over an object to be tracked, wherein initializing the first frame comprises defining multiple densely overlapping patches within the search window;
populating, by one or more processors, a static dictionary D with the densely overlapping patches from the search window;
detecting, by one or more processors, a new frame in the video, wherein the new frame includes the object being tracked;
identifying, by one or more processors, candidate patches in the new frame that potentially depict the object being tracked;
co-locating, by one or more processors, the candidate patches with the multiple densely overlapping patches to form a dynamic candidate dictionary Y of candidate patches;
identifying, by one or more processors, candidate patches that best match the densely overlapping patches from the first frame to generate selected candidate patches by minimizing
$\min_{\alpha_k} \|D\alpha_k - y_k\|_2^2 + \lambda \|\alpha_k\|_1$,
where the solution minimizes the square of the L2-norm of the distance between $D\alpha_k$ (the atoms of the dictionary D of densely overlapping patches times an n-dimensional coefficient vector $\alpha_k$) and each candidate patch $y_k$ in dictionary Y, plus a Lagrange multiplier $\lambda$ times the L1-norm of $\alpha_k$, wherein the Lagrange multiplier is determined by the gradient between an initial atom $d_k$ from D and the candidate atom $y_k$ from Y;
weighting, by one or more processors, the selected candidate patches based on a sparse coefficient of confidence of the selected candidate patches belonging to the object being tracked;
identifying, by one or more processors, a highest weighted candidate patch, from the selected candidate patches, as a patch that depicts the object being tracked in the new frame of the video; and
constructing, by one or more processors, a confidence map for candidate patches in the dictionary Y, wherein the confidence map is a 2-D matrix that depicts a level of confidence that a patch from the new frame matches a densely overlapping patch from the first frame, wherein the confidence map is based on
$y_{x,y} = D'\alpha_{xy}, \quad [x, y] \in O_w$,
where $\alpha_{xy}$ is an n-dimensional coefficient vector for each candidate patch at location (x, y), and [x, y] ranges over the locations in the object window $O_w$ in the new frame, wherein $D' = [D_o \; D_b]$, wherein $D_o$ is a dictionary of object patches from the object being tracked, wherein $D_b$ is a dictionary of background patches outside of the object being tracked, and wherein $D_o$ and $D_b$ are both used to discriminate the object patches from the background patches.
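The confidence-map element of the claim can be sketched in the same hedged way. This version assumes the candidate patches of the object window are supplied row by row as flattened vectors, codes each one over $D' = [D_o \; D_b]$ with ISTA and a fixed λ (a stand-in solver, not the patent's), and scores location (x, y) as the share of the L1 mass of $\alpha_{xy}$ that falls on the object atoms $D_o$. That scoring rule and the names `sparse_code`, `confidence_map`, and `best_location` are assumptions for illustration; `best_location` plays the role of picking the highest weighted candidate patch.

```python
import math

def sparse_code(D, y, lam, steps=300):
    """ISTA for min_a ||D a - y||_2^2 + lam * ||a||_1, with D given as a list of atoms (columns)."""
    L = 2.0 * sum(x * x for atom in D for x in atom)  # upper bound on the gradient's Lipschitz constant
    a = [0.0] * len(D)
    if L == 0.0:
        return a
    for _ in range(steps):
        resid = [sum(ai * atom[i] for ai, atom in zip(a, D)) - y[i] for i in range(len(y))]  # D a - y
        grad = [2.0 * sum(d * r for d, r in zip(atom, resid)) for atom in D]
        z = [ai - g / L for ai, g in zip(a, grad)]                         # gradient step
        a = [math.copysign(max(abs(zi) - lam / L, 0.0), zi) for zi in z]  # soft threshold
    return a

def confidence_map(Do, Db, patches, rows, cols, lam=0.2):
    """Hypothetical scoring: share of |alpha| carried by object atoms at each window location."""
    D_prime = Do + Db          # D' = [Do Db]: object atoms first, then background atoms
    k = len(Do)
    cmap = []
    for r in range(rows):
        row = []
        for c in range(cols):
            alpha = sparse_code(D_prime, patches[r * cols + c], lam)  # y_{x,y} ~ D' alpha_{xy}
            obj = sum(abs(v) for v in alpha[:k])
            total = obj + sum(abs(v) for v in alpha[k:])
            row.append(obj / total if total > 0 else 0.0)             # empty code -> zero confidence
        cmap.append(row)
    return cmap

def best_location(cmap):
    """(row, col) of the highest-confidence candidate patch in the map."""
    return max(((r, c) for r in range(len(cmap)) for c in range(len(cmap[0]))),
               key=lambda rc: cmap[rc[0]][rc[1]])

# Usage: one object atom, one background atom, and a 2 x 2 object window of candidates.
s = 1 / math.sqrt(2)
cmap = confidence_map([[1, 0, 0, 0]], [[0, 1, 0, 0]],
                      [[1, 0, 0, 0], [0, 1, 0, 0], [s, s, 0, 0], [0, 0, 0, 0]], rows=2, cols=2)
print(cmap, best_location(cmap))
```

A patch coded purely by object atoms scores 1, a purely background patch scores 0, and a patch split evenly between the two scores 0.5, which is one simple way the $D_o$ / $D_b$ split can discriminate object from background.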
Address: Armonk, NY, US