摘要 |
Systems and methods for generating alerts and enhanced viewing experience features using on-screen data are disclosed. Textual data corresponding to on-screen text is determined from the visual content of video data. The textual data is associated with corresponding regions and frames of the video data in which the corresponding on-screen text was detected. Users can select regions in the frames of the visual content to monitor for a particular triggering item (e.g., a triggering word, name, or phrase). During play back of the video data, the textual data associated with the selected regions in the frames can be monitored for the triggering item. When the triggering item is detected in the textual data, an alert can be generated. Alternatively, the textual data for the selected region can be extracted to compile supplemental information that can be rendered over the playback of the video data or over other video data. |