



Suno AI's Bark generative AI audio model can generate sounds in addition to voices in many languages. The generation of sounds within a speech output is flexible and is controlled by instructions embedded directly in the text prompt to the voice model. Suno AI lists a number of such sound instructions, but says it finds new ones every day. In my initial tests, the instructions were not entirely reliable. The AI voice quality of Bark isn't the best, but you can enter funny sound effects or even ♪ singing a Song about AGI ♪. But it can't!

Bark currently supports 13 languages, including English, German, Spanish, French, Japanese, and Hindi. Suno AI says that the English voice output sounds the best, but that voices in other languages should sound better with further scaling. One untrained feature: similar to the impressive ElevenLabs voice AI, an English voice speaks German text with an English accent.
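To make the prompt-driven sound control more concrete, here is a minimal sketch of how such a prompt could be passed to Bark. It assumes the Python interface published in the suno-ai/bark repository on GitHub (preload_models, generate_audio, SAMPLE_RATE) at the time of writing; the bracketed [laughs] cue and the ♪ markers are illustrative stand-ins for the sound instructions Suno AI lists, not a definitive reference.

```python
# Illustrative sketch, not official documentation: assumes the Python API
# from the suno-ai/bark repository (preload_models, generate_audio, SAMPLE_RATE).
from bark import SAMPLE_RATE, generate_audio, preload_models
from scipy.io.wavfile import write as write_wav

# Download and cache the model weights on first use.
preload_models()

# Sound instructions are written directly into the text prompt.
# "[laughs]" and the ♪ markers are example cues; the current list of
# supported instructions comes from Suno AI, which says it finds new ones regularly.
text_prompt = "Hello, this is Bark. [laughs] ♪ And now a little song about AGI ♪"

# generate_audio returns a waveform as a numpy array at Bark's native sample rate.
audio_array = generate_audio(text_prompt)
write_wav("bark_demo.wav", SAMPLE_RATE, audio_array)
```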

Unlike Microsoft's VALL-E, which the Bark team cites as an inspiration along with AudioLM, Bark avoids the use of abstracted speech sounds, known as phonemes, and instead embeds text prompts directly into higher-level semantic tokens. A second model then converts these semantic tokens into audio codec tokens to generate the full waveform. This allows Bark to generalize beyond spoken language to other sounds or music that appear in the training data. For compression, the team uses Meta's powerful AI audio compression method Encodec.

The Bark team is making a demo version of their software available for free on GitHub. The demo cannot be used commercially, and Bark also requires Transformer language models with more than 100 billion parameters. Suno AI plans to offer its own generative audio AI models in the future and has started a waiting list.

More Emotional AI Voices: Meta and Google led the way

Meta itself also unveiled a large, unsupervised generative AI model for voice generation.
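To illustrate the two-stage design described above (text prompt → semantic tokens → audio codec tokens → waveform), here is a rough conceptual sketch. It is not Bark's actual implementation: the two stand-in functions and the toy decoder are hypothetical placeholders that only show how data would flow through such a pipeline; in the real system each stage is a learned transformer model and the decoding step would use a neural codec such as Meta's Encodec.

```python
# Conceptual sketch of a two-stage text-to-audio pipeline (not Bark's real code).
# Both stages are hypothetical placeholders standing in for learned transformers.
import numpy as np

def text_to_semantic_tokens(prompt: str) -> np.ndarray:
    # Stage 1 (placeholder): map the raw text prompt to "semantic tokens".
    # Bark skips phonemes and goes straight from text to such tokens.
    return np.array([ord(c) % 256 for c in prompt], dtype=np.int64)

def semantic_to_codec_tokens(semantic: np.ndarray) -> np.ndarray:
    # Stage 2 (placeholder): map semantic tokens to audio-codec tokens,
    # i.e. the discrete codes a neural codec like Encodec operates on.
    return (semantic * 7 + 13) % 1024

def codec_tokens_to_waveform(codec_tokens: np.ndarray, sample_rate: int = 24_000) -> np.ndarray:
    # Placeholder decoder: in the real pipeline this is the codec's decoder,
    # which turns codec tokens back into audio. Here we synthesize a short
    # sine tone per token so the script produces an audible waveform.
    frames = []
    for tok in codec_tokens:
        freq = 200 + float(tok)  # toy token-to-frequency mapping
        t = np.linspace(0, 0.02, int(sample_rate * 0.02), endpoint=False)
        frames.append(0.1 * np.sin(2 * np.pi * freq * t))
    return np.concatenate(frames)

prompt = "Hello from a toy two-stage pipeline"
semantic = text_to_semantic_tokens(prompt)
codec = semantic_to_codec_tokens(semantic)
waveform = codec_tokens_to_waveform(codec)
print(semantic.shape, codec.shape, waveform.shape)
```

Roughly, the split means the first stage only has to model what should be heard, while the codec tokens and their decoder handle how it sounds, which is what lets the same pipeline cover speech, noises, and music.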
