Abstract |
A technique and system are disclosed for counting the number of repetitions of approximately the same action in an input video sequence using 3D convolutional neural networks. The proposed system runs online rather than on the complete video: it sequentially analyzes blocks of 20 non-consecutive frames. The cycle length within each block is evaluated using a deep network architecture, and this information is then integrated over time. A unique property of the disclosed method is that it has been shown to train successfully on entirely synthetic data, created by synthesizing moving random patches; it thereby exploits the high generalization capability of deep neural networks. Coupled with a region-of-interest detection mechanism and a suitable mechanism for identifying the time scale of the video, the system is robust enough to handle real-world videos collected from YouTube and elsewhere, as well as non-video signals such as sensor data revealing repetitive physical movement. |
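
The online, block-wise analysis described above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the function name `iter_blocks`, the `stride` parameter, and the choice of a sliding (overlapping) window are all assumptions made for illustration. Only the block size of 20 non-consecutive frames comes from the abstract.

```python
from collections import deque
from typing import Iterable, Iterator, List, TypeVar

Frame = TypeVar("Frame")

BLOCK_SIZE = 20  # frames per block, as stated in the abstract


def iter_blocks(
    frames: Iterable[Frame],
    stride: int = 2,               # assumed sampling step; not given in the source
    block_size: int = BLOCK_SIZE,
) -> Iterator[List[Frame]]:
    """Yield blocks of non-consecutive frames while the stream is still arriving.

    Every `stride`-th frame is kept. Once `block_size` sampled frames are
    buffered, a block is emitted after each newly sampled frame (a sliding
    window, which is an assumption of this sketch).
    """
    if stride < 1 or block_size < 1:
        raise ValueError("stride and block_size must be positive")
    window: deque = deque(maxlen=block_size)
    for index, frame in enumerate(frames):
        if index % stride != 0:
            continue  # skip frames between sampling points
        window.append(frame)  # the oldest sample drops out automatically
        if len(window) == block_size:
            yield list(window)


if __name__ == "__main__":
    # Integers stand in for video frames in this toy run.
    for block in iter_blocks(range(60), stride=2):
        print(block[0], block[-1])
```

In a full system each emitted block would be passed to the network that evaluates cycle length, and the per-block outputs would be integrated over time. Those stages are omitted here because the abstract does not specify them.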