摘要 |
A system and method are provided for analyzing a video. The method comprises: sampling the video to generate a plurality of spatio-temporal video volumes; clustering similar ones of the plurality of spatio-temporal video volumes to generate a low-level codebook of video volumes; analyzing the low-level codebook of video volumes to generate a plurality of ensembles of volumes surrounding pixels in the video; and clustering the plurality of ensembles of volumes by determining similarities between the ensembles of volumes, to generate at least one high-level codebook. Multiple high-level codebooks can be generated by repeating steps of the method. The method can further include performing visual event retrieval by using the at least one high- level codebook to make an inference from the video, for example comparing the video to a dataset and retrieving at least one similar video, activity and event labeling, and performing abnormal and normal event detection. |