I am interested in audio artifacts in ChatGPT's native TTS.

I had the Ember voice read a long series of A's, and a lot of artifacts popped up, including musical notes, background noise, and almost chant-like polyphonies.

I figure they are related to the training data, but I am wondering if anyone here can point me to a source that explains the mechanism behind these artifacts, mostly because I am interested in replicating them.
