Title: Text detection in video
Abstract: Techniques of detecting text in video are disclosed. In some embodiments, a portion of video content can be identified as having text. Text within the identified portion of the video content can be identified. A category for the identified text can be determined. In some embodiments, a determination is made as to whether the video content satisfies at least one predetermined condition, and the portion of video content is identified as having text in response to a determination that the video content satisfies the predetermined condition(s). In some embodiments, the predetermined condition(s) comprises at least one of a minimum level of clarity, a minimum level of contrast, and a minimum level of content stability across multiple frames. In some embodiments, additional information corresponding to the video content is determined based on the identified text and the determined category.
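The predetermined conditions named in the abstract (clarity, contrast, and content stability across frames) could be checked along the following lines. This is only a minimal sketch: the record does not say how any of these quantities is measured, so the metrics, the function names, and every threshold below are illustrative assumptions. The abstract requires "at least one of" the conditions; this sketch happens to require all three.

```python
def contrast(gray):
    """Spread between the brightest and darkest pixel (assumed contrast measure)."""
    flat = [v for row in gray for v in row]
    return max(flat) - min(flat)

def clarity(gray):
    """Mean absolute difference between horizontal neighbours (assumed sharpness measure)."""
    total = count = 0
    for row in gray:
        for a, b in zip(row, row[1:]):
            total += abs(a - b)
            count += 1
    return total / count if count else 0.0

def stability(frames, tol=10):
    """Fraction of pixels that change by at most tol between consecutive frames."""
    stable = total = 0
    for prev, cur in zip(frames, frames[1:]):
        for row_p, row_c in zip(prev, cur):
            for a, b in zip(row_p, row_c):
                stable += abs(a - b) <= tol
                total += 1
    # With a single frame there is nothing to compare, so treat it as stable.
    return stable / total if total else 1.0

def satisfies_conditions(frames, min_contrast=50, min_clarity=10.0, min_stability=0.9):
    """Gate text detection on the last frame of a short window of grayscale frames.

    All three thresholds are hypothetical values chosen for illustration.
    """
    last = frames[-1]
    return (contrast(last) >= min_contrast
            and clarity(last) >= min_clarity
            and stability(frames) >= min_stability)
```

Each frame here is a list of rows of grayscale integers in the range 0 to 255; a caller would run the text detector only on windows for which `satisfies_conditions` returns `True`.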
Application publication number: US9036083(B1) Application publication date: 2015.05.19
Application number: US201414289142 Filing date: 2014.05.28
Applicant: Gracenote, Inc. Inventors: Zhu Irene; Harron Wilson; Cremer Markus K.
Classification numbers: H04N7/00; H04N11/00; G06K9/72; H04N5/445 Main classification number: H04N7/00
Agency: Schwegman Lundberg & Woessner, P.A. Agent: Schwegman Lundberg & Woessner, P.A.
Main claim: 1. A computer-implemented method comprising: identifying, by a machine having a memory and at least one processor, a portion of video content as having text, the identifying comprising: converting a frame of the video content to grayscale; performing edge detection on the frame; performing dilation on the frame to connect vertical edges within the frame; binarizing the frame; performing a connected component analysis on the frame to detect connected components within the frame; merging the connected components into a plurality of text lines; refining the plurality of text lines using horizontal and vertical projections; filtering out at least one of the plurality of text lines based on a size of the at least one of the plurality of text lines to form a filtered set of text lines; binarizing the filtered set of text lines; and filtering out at least one of the text lines from the binarized filtered set of text lines based on at least one of a shape of components in the at least one of the text lines and a position of components in the at least one of the text lines to form the portion of the video content having text; identifying text within the identified portion of the video content; and determining a category for the identified text.
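The localization steps recited in the claim can be illustrated with a simplified pipeline. This is a sketch under stated assumptions, not the patented implementation: the claim does not name an edge detector, a structuring element, thresholds, or a merging rule, so the central-difference gradient, the horizontal max filter, the greedy merge, and all parameter values below are hypothetical choices. The projection-based refinement, the second binarization, and the shape/position filter are omitted for brevity.

```python
from collections import deque

def to_gray(frame):
    """Convert rows of (r, g, b) tuples to luma values (Rec. 601 weights)."""
    return [[int(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row] for row in frame]

def horizontal_gradient(gray):
    """Absolute central difference along x; responds to vertical strokes."""
    w = len(gray[0])
    return [[abs(row[min(x + 1, w - 1)] - row[max(x - 1, 0)]) for x in range(w)]
            for row in gray]

def dilate_horizontal(img, radius):
    """Max filter along each row, bridging gaps between neighbouring vertical edges."""
    w = len(img[0])
    return [[max(row[max(0, x - radius):min(w, x + radius + 1)]) for x in range(w)]
            for row in img]

def binarize(img, threshold):
    return [[1 if v >= threshold else 0 for v in row] for row in img]

def connected_components(mask):
    """4-connected labelling; returns inclusive bounding boxes (x0, y0, x1, y1)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(x, y)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes

def merge_into_lines(boxes, max_gap):
    """Greedily merge boxes that overlap vertically and lie within max_gap horizontally."""
    lines = []
    for box in sorted(boxes):
        for i, line in enumerate(lines):
            overlaps = min(line[3], box[3]) >= max(line[1], box[1])
            if overlaps and box[0] - line[2] <= max_gap:
                lines[i] = (min(line[0], box[0]), min(line[1], box[1]),
                            max(line[2], box[2]), max(line[3], box[3]))
                break
        else:
            lines.append(box)
    return lines

def filter_by_size(lines, min_w, min_h):
    return [l for l in lines if l[2] - l[0] + 1 >= min_w and l[3] - l[1] + 1 >= min_h]

def detect_text_lines(frame, threshold=60, radius=2, max_gap=3, min_w=8, min_h=3):
    """Run the simplified pipeline; every default here is an illustrative assumption."""
    edges = horizontal_gradient(to_gray(frame))
    mask = binarize(dilate_horizontal(edges, radius), threshold)
    return filter_by_size(merge_into_lines(connected_components(mask), max_gap), min_w, min_h)
```

As a usage example, a synthetic frame with a row of thin white vertical strokes and one isolated bright pixel yields a single candidate text line, while the isolated pixel is discarded by the size filter. Recognizing the text inside the returned boxes and assigning it a category, the final two steps of the claim, would be handled by separate components not sketched here.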
Address: Emeryville, CA, US