Weirdly, I just tried it again and it seems to understand the difference between record (the noun) and record (the verb) just fine. Perhaps when there's heavy demand for voice chat, like after a new release, they shed load by falling back to TTS on a smaller model.
However, it still doesn't seem capable of producing any of the sounds, like laughter, that I would expect from a native voice model.