Abstract |
<p>Aspects relate to machine recognition of human voices in live or recorded audio content, and to delivering text derived from that content as real-time text (RTT), together with contextual information derived from characteristics of the audio. For example, volume information can be encoded as larger or smaller font sizes. Speaker changes can be detected and indicated through text additions or through changes in font color. A variety of other contextual information can be detected and encoded either in the graphical rendition commands available through RTT, or by extending the information carried in RTT packets and processing that extended information to modify how the RTT text content is displayed.</p> |
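The mapping from audio characteristics to display attributes described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the source: the `Segment` record, the dBFS loudness scale and its thresholds, the point-size range, the color palette, and the bracketed speaker label. The sketch also assumes that speech recognition and speaker detection have already produced the segments; it covers only the step that turns volume into font size and speaker changes into a text addition plus a color change.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One recognized chunk of speech (hypothetical structure)."""
    text: str
    speaker: str
    volume_db: float  # assumed loudness in dBFS; 0.0 is full scale


# Assumed palette; a real renderer would use whatever colors it supports.
PALETTE = ["black", "blue", "green", "red"]


def font_size(volume_db, quiet_db=-40.0, loud_db=-10.0, min_pt=10, max_pt=24):
    """Linearly map loudness to a font size, clamped to [min_pt, max_pt]."""
    clamped = max(quiet_db, min(loud_db, volume_db))
    fraction = (clamped - quiet_db) / (loud_db - quiet_db)
    return round(min_pt + fraction * (max_pt - min_pt))


def render(segments):
    """Attach display attributes to each segment.

    A speaker change is marked twice: by a bracketed label prepended to
    the text, and by a color assigned per speaker.
    """
    colors = {}
    rendered = []
    previous_speaker = None
    for seg in segments:
        if seg.speaker not in colors:
            colors[seg.speaker] = PALETTE[len(colors) % len(PALETTE)]
        changed = seg.speaker != previous_speaker
        text = f"[{seg.speaker}] {seg.text}" if changed else seg.text
        rendered.append({
            "text": text,
            "size_pt": font_size(seg.volume_db),
            "color": colors[seg.speaker],
            "speaker_changed": changed,
        })
        previous_speaker = seg.speaker
    return rendered
```

In a deployment along the lines the abstract describes, the resulting attributes would be carried either as rendition commands in the text stream or as extended per-packet information; this sketch stops short of any wire encoding.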