Title: Visual speech detection using facial landmarks
Abstract: A data processing apparatus for detecting a probability of speech based on video data is disclosed. The data processing apparatus may include at least one processor, and a non-transitory computer-readable storage medium including instructions executable by the at least one processor, where execution of the instructions by the at least one processor causes the data processing apparatus to execute a visual speech detector. The visual speech detector may be configured to receive a coordinate-based signal. The coordinate-based signal may represent movement or lack of movement of at least one facial landmark of a person in a video signal. The visual speech detector may be configured to compute a probability of speech of the person based on the coordinate-based signal.
Publication No.: US9190061(B1) Publication Date: 2015.11.17
Application No.: US201313839655 Filing Date: 2013.03.15
Applicant: Google Inc. Inventor: Shemer Mikhal
Classification: G10L15/25;G10L25/78;G06K9/78 Main Classification: G10L15/25
Agency: Brake Hughes Bellermann LLP Agent: Brake Hughes Bellermann LLP
Principal Claim: 1. A data processing apparatus for detecting a probability of speech based on video data, the data processing apparatus comprising: at least one processor; a non-transitory computer-readable storage medium including instructions executable by the at least one processor, wherein execution of the instructions by the at least one processor causes the data processing apparatus to execute: a visual speech detector configured to receive a coordinate-based signal, the coordinate-based signal representing movement or lack of movement of at least one facial landmark of a person in a video signal; the visual speech detector configured to calculate a short-term value representing short-term characteristics of the coordinate-based signal and a long-term value representing long-term characteristics of the coordinate-based signal, the visual speech detector configured to compute a probability of speech of the person based on a comparison of the short-term value and the long-term value, wherein, when the short-term value is greater than the long-term value, the visual speech detector computes the probability of speech as a value indicating that speech has occurred.
Address: Mountain View CA US
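
The short-term versus long-term comparison recited in claim 1 can be illustrated with a minimal Python sketch. This is not the patented implementation: the claim does not say which statistic is used, so the sketch assumes the "value" is the mean absolute frame-to-frame change of a one-dimensional landmark coordinate (for example, a lip-opening distance). The function names, window lengths, and the hard 1.0/0.0 output are all hypothetical choices made for illustration.

```python
def windowed_activity(signal, window):
    """Mean absolute frame-to-frame change over the last `window` samples.

    Assumption: this statistic stands in for the claim's unspecified
    "short-term value" and "long-term value".
    """
    recent = signal[-window:]
    if len(recent) < 2:
        return 0.0
    diffs = [abs(b - a) for a, b in zip(recent, recent[1:])]
    return sum(diffs) / len(diffs)


def speech_probability(signal, short_window=5, long_window=30):
    """Return 1.0 when recent movement exceeds the longer-term baseline.

    `signal` is a list of per-frame landmark coordinates (hypothetical
    input format). Window lengths are illustrative, not from the patent.
    """
    short_term = windowed_activity(signal, short_window)
    long_term = windowed_activity(signal, long_window)
    # Claim 1: short-term value greater than long-term value
    # yields a value indicating that speech has occurred.
    return 1.0 if short_term > long_term else 0.0


if __name__ == "__main__":
    still_then_moving = [10.0] * 30 + [10.0, 12.0, 9.0, 13.0, 8.0]
    print(speech_probability(still_then_moving))
```

A real detector would likely emit a graded probability and smooth it over time; the binary output here only mirrors the single condition stated in the claim.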