Title: Identifying key differences between related content from different mediums
Abstract: System, method, and computer program product to identify differences between different media formats of a media title, by identifying at least one component of each of the different media formats of the media title, the at least one component comprising a unit of the media title, annotating a respective text transcription of each of the different media formats of the media title to include at least one attribute of the respective at least one component, computing a difference score for a first component of a first media format of the media title relative to each of the remaining different media formats of the media title, and upon determining that the difference score for the first component relative to a second media format of the media title exceeds a predefined threshold, creating an indication that the first component of the first media format is different from the second media format.
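As an illustration of the annotation step in the abstract, the following is a minimal sketch and not the patented implementation. It assumes each media format has a text transcription split into components (units such as scenes or chapters), and that attributes are attached to each component as key/value annotations. The names `Component`, `Transcription`, `annotate`, and the sample data are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """One unit of a media title (for example a scene or a chapter)."""
    unit_id: str
    text: str
    annotations: dict = field(default_factory=dict)


@dataclass
class Transcription:
    """Text transcription of one media format, split into components."""
    media_format: str
    components: list


def annotate(transcription, attributes_by_unit):
    """Attach attributes (keyed by unit_id) to the matching components."""
    for component in transcription.components:
        component.annotations.update(attributes_by_unit.get(component.unit_id, {}))
    return transcription


# Hypothetical sample: a one-scene book transcription annotated with a time of day.
book = Transcription("book", [Component("scene-1", "The ship left the harbor.")])
annotate(book, {"scene-1": {"time_of_day": "night"}})
```

In this sketch the attribute values are supplied by the caller; how they are derived (for example from video data) is outside its scope.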
Publication number: US9495365(B2) Publication date: 2016.11.15
Application number: US201313841838 Filing date: 2013.03.15
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION Inventors: Clark Adam T.; Kalmbach Michael T.; Petri John E.; Wendzel Kevin
Classification: G06F3/00; G06F3/048; G06F17/24; G06F17/30 Main classification: G06F3/00
Agency: Patterson + Sheridan, LLP Agent: Patterson + Sheridan, LLP
Principal claim: 1. A system, comprising: one or more computer processors; and a memory containing a program, which, when executed by the one or more computer processors, performs an operation to identify differences between a plurality of different media formats of a single media title, the operation comprising:
identifying at least one component of each of the plurality of different media formats of the media title, the plurality of different media formats of the media title including a video format;
generating annotations for a respective text transcription of each of the plurality of different media formats of the media title, wherein the generated annotations comprise annotations that are not based on a dialogue of the respective media format of the media title, wherein the annotations that are not based on the dialogue comprises annotations describing a time of day of a component of each media format, wherein each annotation describes an attribute of the respective text transcription, wherein the annotation describing the time of day of a scene depicted in the component of the video format is generated based on a video data of the video format;
generating a set of features describing each annotated text transcription of each media format, wherein the set of features comprises a respective concept present in each respective annotation, wherein a first concept present in each annotation comprises the time of day of the respective component of the media format;
identifying a set of differences between a first component of a first media format of the media title relative to each of the remaining plurality of different media formats of the media title based on a comparison of the sets of features, and the respective concepts, of each media format, wherein the set of differences comprises a difference between the time of day of the first component and the time of day for at least one of the remaining plurality of different media formats;
computing a difference score for the first component of the first media format of the media title relative to each of the remaining plurality of different media formats of the media title, wherein each difference score is based on: (i) the identified sets of differences between the first component of the media format and the respective different media formats, (ii) an English Slot Grammar (ESG) parser applied to each text transcription, and (iii) a Logical Form Answer Candidate Scorer (LFACS) applied to each text transcription; and
upon determining that the difference score for the first component relative to a second media format of the media title exceeds a predefined threshold, creating an indication that the first component of the first media format is different from the second media format.
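The comparison and thresholding steps of the claim can be sketched as follows. This is a simplified illustration under stated assumptions, not the claimed scoring method: the score here is only the fraction of concepts whose values differ, and the ESG parser and LFACS inputs named in the claim are omitted. The function names, the feature dictionaries, and the threshold value 0.3 are all hypothetical.

```python
def feature_differences(first, other):
    """Return {concept: (first_value, other_value)} for concepts that differ."""
    concepts = set(first) | set(other)
    return {
        c: (first.get(c), other.get(c))
        for c in concepts
        if first.get(c) != other.get(c)
    }


def difference_score(first, other):
    """Assumed score: share of concepts (union of both sets) that differ."""
    concepts = set(first) | set(other)
    if not concepts:
        return 0.0
    return len(feature_differences(first, other)) / len(concepts)


def flag_differences(first, others, threshold=0.3):
    """Create an indication for each format whose score exceeds the threshold."""
    indications = []
    for media_format, features in others.items():
        score = difference_score(first, features)
        if score > threshold:
            diffs = feature_differences(first, features)
            indications.append((media_format, score, diffs))
    return indications


# Hypothetical feature sets for the same scene in three media formats.
book_scene = {"time_of_day": "night", "location": "harbor", "speaker": "Ann"}
other_formats = {
    "film": {"time_of_day": "day", "location": "harbor", "speaker": "Ann"},
    "audiobook": {"time_of_day": "night", "location": "harbor", "speaker": "Ann"},
}
flagged = flag_differences(book_scene, other_formats)
```

With this sample data only the film is flagged, because its time of day differs from the book while the audiobook matches on every concept.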
Address: Armonk NY US