Title of invention: SEGMENTING OBJECTS IN MULTIMEDIA DATA
Abstract: Disclosed is a method for segmenting a plurality of objects from a two-dimensional (2D) video captured through a depth camera and an RGB/G camera. The method comprises detecting camera motion in each 2D frame of a plurality of 2D frames of the 2D video and generating a first set of 2D frames without any camera motion. The method further comprises generating a plurality of cloud points for the first set of 2D frames, corresponding to each pixel of each 2D frame in the first set of 2D frames. The method further comprises generating a 3D grid comprising a plurality of voxels. The method further comprises determining valid voxels and invalid voxels in the 3D grid. Further, a 3D connected component labeling technique is applied to the set of valid voxels to segment the plurality of objects in the 2D video.
Publication number: US2016035124(A1) Publication date: 2016.02.04
Application number: US201414767161 Filing date: 2014.01.22
Applicant: TATA CONSULTANCY SERVICES LIMITED Inventors: SINHA Aniruddha; CHATTOPADHYAY Tanushyam; ROY Sangheeta; MALLIK Apurbaa
Classification: G06T15/08; G06T7/20; G06T7/00 Main classification: G06T15/08
Agency: Agent:
Main claim: 1. A method for segmenting a plurality of objects present in a two-dimensional (2D) video having a plurality of 2D frames and depth information, the method comprising: receiving, by a processor, the 2D video and the depth information corresponding to pixels of the 2D frames in the 2D video; detecting, by the processor, camera motion in each 2D frame of the plurality of 2D frames of the 2D video; segregating, by the processor, the plurality of 2D frames into a first set of 2D frames and a second set of 2D frames based upon the detection of the camera motion in each frame, wherein the first set of 2D frames is detected to be void of the camera motion, and wherein the second set of 2D frames is detected to have the camera motion therein; determining, by the processor, a plurality of cloud points in each 2D frame of the first set of 2D frames and depth data, wherein each cloud point of the plurality of cloud points stores x, y, z co-ordinates data and color data associated with each pixel of each 2D frame of the first set of 2D frames; converting, by the processor, each 2D frame and depth data of the first set of 2D frames into a 3D grid, wherein the 3D grid comprises a plurality of voxels, and wherein the 3D grid is indicative of a division of a 3D space, associated with each frame, by a plurality of equidistant planes perpendicular to an x-axis, a y-axis and a z-axis, and wherein each voxel has a definite volume in the 3D grid, and wherein each voxel accommodates one or more cloud points, and wherein each voxel, being indicative of the definite volume, is surrounded by three pairs of consecutive planes along the x-axis, the y-axis and the z-axis; determining, by the processor, valid voxels and invalid voxels from the plurality of voxels based upon a number of cloud points present in each voxel of the plurality of voxels; classifying, by the processor, each voxel of the plurality of voxels into a first set of voxels and a second set of voxels, wherein the first set of voxels are valid voxels, and wherein the second set of voxels are invalid voxels; and labeling, by the processor, each voxel in the first set of voxels using a 3D connected component labeling technique in order to segment the plurality of objects present in the 2D video and depth data.
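The voxel steps of the claim (dividing the 3D space with equidistant planes, classifying voxels by the number of cloud points they contain, and labeling the valid voxels with 3D connected components) can be illustrated with the minimal Python sketch below. It is not the implementation of the patent: the function names, the cubic voxel edge `voxel_size`, the threshold `min_points`, and the choice of 26-connectivity are all assumptions made for illustration, since the record does not state them. Color data and the camera-motion filtering step are omitted.

```python
from collections import Counter, deque
from itertools import product


def voxelize(points, voxel_size):
    """Count cloud points per voxel.

    Each voxel is identified by integer indices (i, j, k), i.e. the cell
    bounded by consecutive equidistant planes along x, y and z.
    voxel_size is an assumed cubic edge length.
    """
    counts = Counter()
    for x, y, z in points:
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        counts[key] += 1
    return counts


def select_valid(counts, min_points):
    """Keep voxels holding at least min_points cloud points (assumed rule)."""
    return {voxel for voxel, n in counts.items() if n >= min_points}


# Offsets to the 26 neighbours of a voxel (assumed connectivity).
NEIGHBORS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def label_components(valid):
    """Assign one integer label per connected group of valid voxels (BFS)."""
    labels = {}
    next_label = 0
    for seed in sorted(valid):
        if seed in labels:
            continue
        next_label += 1
        labels[seed] = next_label
        queue = deque([seed])
        while queue:
            i, j, k = queue.popleft()
            for di, dj, dk in NEIGHBORS:
                nb = (i + di, j + dj, k + dk)
                if nb in valid and nb not in labels:
                    labels[nb] = next_label
                    queue.append(nb)
    return labels


# Toy cloud: two separated clusters plus one isolated noise point.
points = [
    (0.1, 0.1, 0.1), (0.2, 0.2, 0.2),   # voxel (0, 0, 0)
    (1.1, 0.1, 0.1), (1.2, 0.3, 0.1),   # voxel (1, 0, 0), adjacent to the first
    (5.1, 5.1, 5.1), (5.2, 5.2, 5.2),   # voxel (5, 5, 5), a separate object
    (9.5, 9.5, 9.5),                    # single point, voxel becomes invalid
]
counts = voxelize(points, voxel_size=1.0)
valid = select_valid(counts, min_points=2)
labels = label_components(valid)
print(labels)
```

In this toy input the two adjacent voxels receive the same label, the distant voxel receives a second label, and the voxel holding only one point is discarded as invalid before labeling.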
Address: Mumbai, Maharashtra IN