Title of invention: Delay in video for language translation
Abstract: Disclosed are various embodiments for translation of speech in a video messaging application. A segment of streaming video is decoded to separate the visual component from the audio component. The audio component is then converted to text, which may then be translated and converted to a translation output comprising a new language. In response, the translation output may be encoded with the previously separated visual component. A delay is imposed on the visual component to account for any delays that may arise in translation. The translated video may then be streamed to participants giving the appearance of real-time video conferencing.
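The flow described in the abstract (separate audio from video, convert speech to text, translate, synthesize new audio, recombine with the untouched video) can be sketched as follows. This is a minimal illustration only, not the patented implementation: the `MediaSegment` and `TranslatedSegment` types and the `translate_segment` function are hypothetical names, and the three stage functions are passed in as parameters because the record does not name any particular speech recognition, translation, or speech synthesis engine.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MediaSegment:
    """Hypothetical container for one decoded segment of streaming video."""
    video: bytes  # visual component, already separated from the audio
    audio: bytes  # audio component in the first language


@dataclass
class TranslatedSegment:
    """Hypothetical container for the re-combined output."""
    video: bytes  # unchanged visual component
    audio: bytes  # synthesized audio in the second language
    text: str     # translated text, kept for inspection


def translate_segment(
    segment: MediaSegment,
    speech_to_text: Callable[[bytes], str],
    translate_text: Callable[[str], str],
    text_to_speech: Callable[[str], bytes],
) -> TranslatedSegment:
    """Run the audio through the three stages and pair it with the video."""
    source_text = speech_to_text(segment.audio)   # audio -> text
    target_text = translate_text(source_text)     # first -> second language
    new_audio = text_to_speech(target_text)       # text -> translation output
    # The visual component is passed through untouched; only the audio changes.
    return TranslatedSegment(video=segment.video, audio=new_audio, text=target_text)
```

With toy stand-ins for the three stages (for example, decoding bytes as text and looking words up in a small dictionary), the function returns the original video bytes paired with the new audio, which mirrors the "encoded with the previously separated visual component" step.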
Publication number: US8874429(B1) Publication date: 2014.10.28
Application number: US201213475139 Filing date: 2012.05.18
Applicant: Amazon Technologies, Inc. Inventor: Crosley Jay A.
Classification: G06F17/20;H04N7/14 Main classification: G06F17/20
Agency: Thomas | Horstemeyer, LLP Agent: Thomas | Horstemeyer, LLP
Principal claim: 1. A non-transitory computer-readable medium embodying a program executable in a computing device, the program comprising: code that generates a data stream that comprises an audio signal embodying a first language and a video signal; code that transmits a first signal to a receiving computing device to turn on a first indicator that indicates an impending data stream; code that translates the audio signal embodying the first language into a translation output embodying a second language; code that adjusts a play speed of the translation output according to a duration of a video component of the video signal; code that imposes a delay in the video signal, the delay depending at least in part upon a time needed to perform a translation; code that associates the translation output with the video signal; code that determines whether a predefined amount of the audio signal embodying the first language has been translated to the translation output embodying the second language before a transmission to avoid a discontinuity in a plurality of speech segments; code that transmits a second signal to the receiving computing device to update a second indicator that displays an estimated accuracy of the translation; and code that transmits the video signal with the translation output to the receiving computing device responsive to the predefined amount of the audio signal embodying the first language being translated.
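Three of the claimed elements are numeric in nature: adjusting the play speed of the translation output to the duration of the video component, imposing a video delay that depends on translation time, and holding transmission until a predefined amount of audio has been translated. The sketch below shows one plausible reading of each. The claim does not give formulas, so the ratio used for play speed, the "longest translation time plus margin" rule for the delay, and the 0.8 default threshold are all assumptions, as are the names `SpeechSegment`, `play_speed`, `video_delay`, and `ready_to_transmit`.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class SpeechSegment:
    """Hypothetical timing record for one speech segment (all values in seconds)."""
    video_duration_s: float     # duration of the matching video component
    translated_audio_s: float   # length of the translation output at normal speed
    translation_time_s: float   # time spent producing the translation


def play_speed(segment: SpeechSegment) -> float:
    """Speed factor that makes the translated audio last as long as the video.

    Assumed rule: factor = translated audio length / video length, so a factor
    above 1.0 means the translation must be played faster than normal.
    """
    if segment.video_duration_s <= 0:
        raise ValueError("video duration must be positive")
    return segment.translated_audio_s / segment.video_duration_s


def video_delay(segments: Iterable[SpeechSegment], margin_s: float = 0.0) -> float:
    """Delay to impose on the video signal.

    Assumed rule: the longest observed translation time plus a safety margin.
    """
    longest = max((s.translation_time_s for s in segments), default=0.0)
    return longest + margin_s


def ready_to_transmit(translated_s: float, total_s: float, threshold: float = 0.8) -> bool:
    """True once the predefined fraction of the source audio has been translated.

    The 0.8 default is an illustrative assumption, not a value from the claim.
    """
    if total_s <= 0:
        return False
    return translated_s / total_s >= threshold
```

For example, a 5-second translation that must fit a 4-second video component yields `play_speed(...) == 1.25`, and `ready_to_transmit(7.0, 10.0)` stays `False` until at least 8 of the 10 seconds have been translated, which is how a sender could avoid a discontinuity between speech segments.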
Address: Seattle WA US