Abstract |
<p>In some embodiments, the text-to-speech audio HIP technique described herein uses different correlated or uncorrelated words or sentences, generated via a text-to-speech engine, as audio HIP challenges. The technique can apply different effects when the text-to-speech synthesizer speaks a sentence to be used as a HIP challenge string. The different effects can include, for example, spectral frequency warping; vowel duration warping; background addition; echo addition; and varying the time duration between words, among others. In some embodiments the technique varies the set of parameters to prevent Automated Speech Recognition tools from using previously used audio HIP challenges to learn a model that can then be used to recognize future audio HIP challenges generated by the technique. Additionally, in some embodiments the technique introduces the requirement of semantic understanding in HIP challenges.</p> |
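The idea of varying the effect parameters from one challenge to the next can be illustrated with a minimal Python sketch. It is not taken from the source: the names `EffectParams`, `sample_params`, and `add_echo`, as well as every numeric range, are hypothetical and chosen only for illustration. The sketch draws a fresh parameter set per challenge and shows one of the listed effects (echo addition) applied to raw audio samples.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class EffectParams:
    """One randomly drawn set of effect parameters (all ranges are assumptions)."""
    spectral_warp: float     # frequency warping factor, 1.0 = no warping
    vowel_scale: float       # multiplier on vowel duration
    echo_delay: int          # echo delay in samples
    echo_decay: float        # echo amplitude relative to the original
    noise_level: float       # relative level of added background sound
    word_gaps_ms: List[int]  # silence between consecutive words


def sample_params(num_words: int, rng: random.Random) -> EffectParams:
    """Draw a new parameter set for a challenge with `num_words` words."""
    return EffectParams(
        spectral_warp=rng.uniform(0.9, 1.1),
        vowel_scale=rng.uniform(0.8, 1.5),
        echo_delay=rng.randint(800, 4000),
        echo_decay=rng.uniform(0.2, 0.6),
        noise_level=rng.uniform(0.01, 0.1),
        word_gaps_ms=[rng.randint(50, 400) for _ in range(max(num_words - 1, 0))],
    )


def add_echo(samples: List[float], delay: int, decay: float) -> List[float]:
    """Return samples plus one delayed, attenuated copy: y[n] = x[n] + decay * x[n - delay]."""
    out = list(samples) + [0.0] * delay
    for i, s in enumerate(samples):
        out[i + delay] += decay * s
    return out
```

A caller would draw a new `EffectParams` for every challenge, for example `sample_params(5, random.Random())`, so that no two challenges share the same distortion settings. How the remaining effects are realized inside the synthesizer is not specified here and is left out of the sketch.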