<p>A method and apparatus for generating a voice tag (140) includes a means (110) for combining (205) a plurality of utterances (106, 107, 108) into a combined utterance (111) and a means (120) for extraction (210) of the voice tag as a sequence of phonemes having a high likelihood of representing the combined utterance, using a set of stored phonemes (115) and the combined utterance.</p>