Title: Entity based temporal segmentation of video streams
Abstract: A solution is provided for temporally segmenting a video based on analysis of entities identified in the video frames of the video. The video is decoded into multiple video frames, and multiple video frames are selected for annotation. The annotation process identifies entities present in a sample video frame; each identified entity has a timestamp and a confidence score indicating the likelihood that the entity is accurately identified. For each identified entity, a time series comprising timestamps and corresponding confidence scores is generated and smoothed to reduce annotation noise. One or more segments containing an entity over the length of the video are obtained by detecting boundaries of the segments in the time series of the entity. From the individual temporal segmentation for each identified entity in the video, an overall temporal segmentation for the video is generated, where the overall temporal segmentation reflects the semantics of the video.
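The per-entity smoothing and boundary detection described in the abstract might be sketched as follows. This is a minimal illustration only: the `smooth` and `segments_for_entity` helpers, the moving-average window, and the 0.5 confidence threshold are hypothetical choices, not the patent's actual implementation.

```python
def smooth(scores, window=3):
    """Moving-average smoothing of a confidence time series
    to reduce annotation noise (hypothetical smoothing choice)."""
    half = window // 2
    out = []
    for i in range(len(scores)):
        lo, hi = max(0, i - half), min(len(scores), i + half + 1)
        out.append(sum(scores[lo:hi]) / (hi - lo))
    return out

def segments_for_entity(timestamps, confidences, threshold=0.5, window=3):
    """Detect segment boundaries where the smoothed confidence
    crosses the threshold; returns (start, end) timestamp pairs."""
    smoothed = smooth(confidences, window)
    segments, start = [], None
    for t, c in zip(timestamps, smoothed):
        if c >= threshold and start is None:
            start = t                      # segment opens on an upward crossing
        elif c < threshold and start is not None:
            segments.append((start, t))    # segment closes on a downward crossing
            start = None
    if start is not None:                  # entity persists to the last sample
        segments.append((start, timestamps[-1]))
    return segments
```

Any smoothing kernel (e.g. Gaussian) and any boundary detector (e.g. derivative-based) could be substituted; the threshold crossing above is just the simplest instance of detecting boundaries in the entity's time series.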
Publication Number: US9607224(B2)  Publication Date: 2017.03.28
Application Number: US201514712071  Filing Date: 2015.05.14
Applicant: Google Inc.  Inventors: Tsai Min-hsuan; Vijayanarasimhan Sudheendra; Izo Tomas; Shetty Sanketh; Varadarajan Balakrishnan
Classification: G06K9/34; G06K9/00; H04N5/91; G06K9/66; G06K9/62  Main Classification: G06K9/34
Agency: Fenwick & West LLP  Agent: Fenwick & West LLP
Principal Claim: 1. A method for temporally segmenting a video, the method comprising:
selecting sample video frames from a plurality of decoded video frames of the video;
training an annotation model on a corpus of training images with a neural network model;
annotating each of the selected sample video frames with the trained annotation model, wherein annotating a selected sample video frame comprises:
applying the trained annotation model to each selected sample video frame;
identifying one or more entities present in the selected sample video frame based on the application of the trained annotation model, an identified entity of the video representing an object of interest in the selected sample video frame;
representing each identified entity by a set of annotation parameters;
segmenting the selected sample video frames into a plurality of segments for each entity of the video based on the annotation of the selected sample video frames, a segment for an entity of the video representing a semantically meaningful spatial-temporal region of the video; and
generating an overall temporal segmentation of the video based on the plurality of segments of each entity of the video.
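The claim's final step, generating an overall temporal segmentation from the per-entity segments, could be sketched as below. The `overall_segmentation` helper and its boundary-union merge rule are assumptions for illustration; the patent does not specify this particular combination strategy.

```python
def overall_segmentation(entity_segments, video_length):
    """Merge per-entity (start, end) segments into one overall
    temporal segmentation by taking the union of all boundaries.

    entity_segments: dict mapping entity name -> list of (start, end) pairs
    video_length: total duration, so the result covers the whole video
    """
    cuts = {0, video_length}
    for segments in entity_segments.values():
        for start, end in segments:
            cuts.update((start, end))      # every per-entity boundary becomes a cut
    ordered = sorted(cuts)
    # consecutive cuts define the overall segments
    return list(zip(ordered, ordered[1:]))
```

Under this rule, a frame range where entity sets change (an entity appears or disappears) always starts a new overall segment, which is one way the overall segmentation can reflect the semantics of the video.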
Address: Mountain View, CA, US