Title of invention: Detection of cast members in video content
Abstract: Disclosed are various embodiments for detection of cast members in video content such as movies, television shows, and other programs. Data indicating cast members who appear in a video program is obtained. Each cast member is associated with a reference image depicting a face of the cast member. A frame is obtained from the video program, and a face is detected in the frame. The detected face in the frame is recognized as being a particular cast member based at least in part on the reference image depicting the cast member. An association between the cast member and the frame is generated in response to the detected face in the frame being recognized as the cast member.
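The flow described in the abstract (match each detected face against the reference images of the known cast, then record a cast-member-to-frame association on a match) might be sketched as follows. This is an illustrative sketch only, not the patented method: it assumes faces and reference images have already been reduced to numeric feature vectors by some face detector and encoder, and it assumes cosine similarity with a fixed threshold as the matching rule. The names `cosine`, `recognize`, `associate`, and the `threshold` parameter are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recognize(face_vec, references, threshold=0.8):
    """Return the best-matching cast member, or None if no match clears the threshold.

    references: {cast_member_name: reference_feature_vector}
    Only members of the known cast list are considered (assumed matching rule).
    """
    best_name, best_score = None, threshold
    for name, ref_vec in references.items():
        score = cosine(face_vec, ref_vec)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


def associate(frame_faces, references, threshold=0.8):
    """Build {frame_index: set of recognized cast members}.

    frame_faces: {frame_index: [feature vector of each detected face]}
    Frames with no recognized face get no entry.
    """
    associations = {}
    for frame_index, faces in frame_faces.items():
        for face_vec in faces:
            name = recognize(face_vec, references, threshold)
            if name is not None:
                associations.setdefault(frame_index, set()).add(name)
    return associations
```

Restricting the search to the cast list, as the abstract describes, keeps the candidate set small compared with matching against an open-ended gallery of faces.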
Publication number: US9449216(B1) Publication date: 2016.09.20
Application number: US201313860347 Filing date: 2013.04.10
Applicant: Amazon Technologies, Inc. Inventors: Dhua Arnab Sanat Kumar;Bhargava Gautam;Gray Douglas Ryan;Ramesh Sunil;Taylor Colin Jon
Classification: G06K9/00;G06F17/30 Main classification: G06K9/00
Agency: Thomas | Horstemeyer, LLP Agent: Thomas | Horstemeyer, LLP
Principal claim: 1. A non-transitory computer-readable medium embodying a program that, when executed by at least one computing device, causes the at least one computing device to at least: obtain data indicating a plurality of cast members known to appear in a video program; subsequent to obtaining the data indicating the plurality of cast members known to appear in the video program, sample a first frame from the video program; detect a face in the first frame; recognize the detected face in the first frame as being one of the plurality of cast members known to appear in the video program based at least in part on a plurality of reference images corresponding to the plurality of cast members, wherein recognizing the detected face in the first frame is restricted to a set of faces recognized within a predefined time period in response to determining that a quantity of detected faces in the first frame meets a maximum threshold; generate a first association between the one of the plurality of cast members and the first frame when the detected face is recognized as being the one of the plurality of cast members; and generate a second association between the one of the plurality of cast members and a second frame of the video program based at least in part on the first association, a third association between the one of the plurality of cast members and a third frame of the video program, and a temporal smoothing factor.
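The final clause of the claim derives an association for a second frame from associations on two other frames plus a temporal smoothing factor. One possible reading, offered here as an assumption rather than as the claimed implementation, is gap filling: if a cast member is recognized in two sampled frames that lie no more than a fixed number of frames apart, the frames between them are also associated with that member. The function name `smooth_associations` and the parameter `max_gap` (standing in for the temporal smoothing factor) are hypothetical. The claim's restriction of recognition when the face count meets a maximum threshold is not modeled here.

```python
def smooth_associations(recognized, max_gap):
    """Fill short gaps between frames where a cast member was recognized.

    recognized: {cast_member_name: iterable of frame indices with a recognition}
    max_gap:    assumed stand-in for the temporal smoothing factor; two
                recognitions at most this many frames apart are bridged.
    Returns {cast_member_name: sorted list of associated frame indices}.
    """
    smoothed = {}
    for member, frames in recognized.items():
        ordered = sorted(set(frames))
        filled = set(ordered)
        # Walk consecutive pairs of recognized frames and bridge small gaps.
        for earlier, later in zip(ordered, ordered[1:]):
            if later - earlier <= max_gap:
                filled.update(range(earlier + 1, later))
        smoothed[member] = sorted(filled)
    return smoothed
```

For example, with `max_gap=5`, recognitions at frames 10 and 13 would also yield associations for frames 11 and 12, while an isolated recognition at frame 30 would be left unchanged.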
Address: Seattle, WA, US