Title: Catch-up video buffering
Abstract: A system determines if someone watching a live video feed looks or moves away from a display screen, and when their attention is back on the display, provides an accelerated recap of the content that they missed. The video component of the feed may be shown as a series of selected still images or clips from the original feed, while audio and/or text captioning is output at an accelerated rate. The rate may be adaptively adjusted to maintain a consistent speed, and superfluous content may be omitted. When the recap catches up to the live feed, output returns to regular speed.
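As a rough illustration of the catch-up behavior described in the abstract, the sketch below estimates how long an accelerated recap takes to reach the live feed. It assumes a constant playback rate and no omitted content; the record does not give a formula, and the function name and parameters are hypothetical.

```python
def catch_up_seconds(missed_seconds: float, rate: float) -> float:
    """Estimate how long accelerated playback needs to reach the live feed.

    Assumption (not from the patent record): the rate is constant and no
    content is skipped. While the recap plays for t seconds it consumes
    rate * t seconds of content, and the backlog grows by t seconds of
    newly received live content, so rate * t = missed_seconds + t.
    """
    if rate <= 1.0:
        raise ValueError("rate must exceed 1.0 for playback to catch up")
    return missed_seconds / (rate - 1.0)
```

Under these assumptions, a viewer who missed 60 seconds and watches the recap at 1.5x is back on the live feed after 120 seconds. Omitting superfluous content, as the abstract mentions, would shorten that time.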
Publication number: US9462230(B1) Publication date: 2016.10.04
Application number: US201414230047 Filing date: 2014.03.31
Applicant: Amazon Technologies Inventors: Agrawal Amit Kumar; Gray Timothy Thomas; Tyagi Ambrish
Classification: H04N7/14; H04N7/15; H04N5/783 Main classification: H04N7/14
Agency: Seyfarth Shaw LLP Agents: Seyfarth Shaw LLP; Barzilay Ilan N.; Klein David A.
Main claim: 1. A method, comprising: receiving audio-visual (AV) data as part of a video conference call; detecting a face of a participant of the video conference call using an imaging device associated with a device of the participant; determining that the participant's face is oriented in a direction toward a display associated with the device of the participant by applying image processing to a first image captured by a camera; outputting live content from the video conference call to the display at substantially a same time as the first content is received based on determining that the participant's face is oriented in the direction toward the display; determining, at a first time, that the participant is no longer observing the conference call based on image processing of a second image captured by the camera failing to detect that the participant is facing the display; determining, at a second time, that the participant is again observing the conference call by determining that the participant is facing the display, based on image processing of a third image captured by the camera, wherein the second time is after the first time; storing content from the AV data after the first time; performing speech-recognition processing on an audio portion of the AV data; identifying one or more sections of the audio portion of the AV data, the one or more sections comprising one or more of: silences, pauses, spoken filler words, non-lexical utterances, and false starts; outputting stored content to the display, wherein the outputting occurs after the second time and wherein the stored content is output at an accelerated rate until the stored content reaches the live content at a third time, and wherein the one or more sections are omitted when the stored content is output at the accelerated rate; and outputting the live content at a normal rate after the third time.
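The omission step in the claim can be sketched as a filter over labeled audio sections. This is a minimal illustration, not the patented implementation: it assumes the speech-recognition stage has already labeled each stored section, and the `Segment` type, the label strings, and `recap_plan` are hypothetical names introduced here.

```python
from dataclasses import dataclass

# Hypothetical labels for the section types the claim lists as omittable.
OMITTED_KINDS = {"silence", "pause", "filler", "non_lexical", "false_start"}


@dataclass(frozen=True)
class Segment:
    start: float  # seconds from the start of the stored content
    end: float    # seconds from the start of the stored content
    kind: str     # "speech" or one of OMITTED_KINDS


def recap_plan(segments, rate):
    """Drop omittable sections and estimate playback time at `rate`.

    Returns (kept_segments, playback_seconds). Assumes a constant rate
    and ignores content that arrives while the recap is playing.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    kept = [s for s in segments if s.kind not in OMITTED_KINDS]
    content_seconds = sum(s.end - s.start for s in kept)
    return kept, content_seconds / rate


stored = [
    Segment(0.0, 4.0, "speech"),
    Segment(4.0, 6.0, "silence"),
    Segment(6.0, 7.0, "filler"),
    Segment(7.0, 12.0, "speech"),
]
kept, playback_seconds = recap_plan(stored, rate=1.5)
```

In this example, 12 seconds of stored content reduce to 9 seconds of speech, which play back in 6 seconds at 1.5x. A fuller model would also account for live content received during the recap, which the claim handles by continuing accelerated output until the stored content reaches the live content.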
Address: Seattle WA US