Title of invention |
Visual speech detection using facial landmarks |
Abstract |
A data processing apparatus for detecting a probability of speech based on video data is disclosed. The data processing apparatus may include at least one processor, and a non-transitory computer-readable storage medium including instructions executable by the at least one processor, where execution of the instructions by the at least one processor causes the data processing apparatus to execute a visual speech detector. The visual speech detector may be configured to receive a coordinate-based signal. The coordinate-based signal may represent movement or lack of movement of at least one facial landmark of a person in a video signal. The visual speech detector may be configured to compute a probability of speech of the person based on the coordinate-based signal. |
Publication number |
US9190061(B1) |
Publication date |
2015.11.17 |
Application number |
US201313839655 |
Filing date |
2013.03.15 |
Applicant |
Google Inc. |
Inventor |
Shemer Mikhal |
Classification number |
G10L15/25;G10L25/78;G06K9/78 |
Main classification number |
G10L15/25 |
Agency |
Brake Hughes Bellermann LLP |
Agent |
Brake Hughes Bellermann LLP |
Independent claim |
1. A data processing apparatus for detecting a probability of speech based on video data, the data processing apparatus comprising:
at least one processor; a non-transitory computer-readable storage medium including instructions executable by the at least one processor, wherein execution of the instructions by the at least one processor causes the data processing apparatus to execute: a visual speech detector configured to receive a coordinate-based signal, the coordinate-based signal representing movement or lack of movement of at least one facial landmark of a person in a video signal; the visual speech detector configured to calculate a short-term value representing short-term characteristics of the coordinate-based signal and a long-term value representing long-term characteristics of the coordinate-based signal, the visual speech detector configured to compute a probability of speech of the person based on a comparison of the short-term value and the long-term value, wherein, when the short-term value is greater than the long-term value, the visual speech detector computes the probability of speech as a value indicating that speech has occurred. |
Address |
Mountain View CA US |
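
The comparison recited in the independent claim (a short-term value against a long-term value of the coordinate-based signal) can be illustrated with a minimal Python sketch. The record does not say how either value is calculated, so everything specific here is an assumption made for illustration: the class name `VisualSpeechDetector`, the window sizes, the use of a single scalar coordinate per frame (for example, a hypothetical lip-landmark y-position), the choice of mean absolute frame-to-frame change as the "characteristic", and the use of 1.0 and 0.0 as the probability values. It is not the patented implementation.

```python
from collections import deque
from statistics import fmean


class VisualSpeechDetector:
    """Illustrative sketch only; names and formulas are assumptions."""

    def __init__(self, short_window=5, long_window=30):
        if not 0 < short_window < long_window:
            raise ValueError("need 0 < short_window < long_window")
        self.short_window = short_window
        # Keep only the most recent `long_window` frame-to-frame changes.
        self._deltas = deque(maxlen=long_window)
        self._previous = None

    def update(self, coordinate):
        """Feed one landmark coordinate; return a speech probability."""
        if self._previous is not None:
            # Absolute change since the previous frame (assumed measure
            # of "movement or lack of movement" of the landmark).
            self._deltas.append(abs(coordinate - self._previous))
        self._previous = coordinate
        if not self._deltas:
            return 0.0  # No movement history yet.

        # Short-term value: mean change over the most recent frames.
        short_term = fmean(list(self._deltas)[-self.short_window:])
        # Long-term value: mean change over the whole retained history.
        long_term = fmean(self._deltas)

        # Per the claim: when the short-term value is greater than the
        # long-term value, output a value indicating speech has occurred.
        return 1.0 if short_term > long_term else 0.0
```

A caller would invoke `update` once per video frame with the tracked landmark coordinate. A real detector would likely smooth the output or produce graded probabilities rather than a hard 1.0 or 0.0; the claim only requires that the result indicate speech when the short-term value exceeds the long-term value.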