{"id":1126022,"date":"2024-06-13T16:37:43","date_gmt":"2024-06-13T20:37:43","guid":{"rendered":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/uncategorized\/exclusive-camb-takes-on-elevenlabs-with-open-voice-cloning-ai-model-mars5-offering-higher-realism-support-for-140-venturebeat\/"},"modified":"2024-06-13T16:37:43","modified_gmt":"2024-06-13T20:37:43","slug":"exclusive-camb-takes-on-elevenlabs-with-open-voice-cloning-ai-model-mars5-offering-higher-realism-support-for-140-venturebeat","status":"publish","type":"post","link":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/cloning\/exclusive-camb-takes-on-elevenlabs-with-open-voice-cloning-ai-model-mars5-offering-higher-realism-support-for-140-venturebeat\/","title":{"rendered":"Exclusive: Camb takes on ElevenLabs with open voice cloning AI model Mars5 offering higher realism, support for 140 &#8230; &#8211; VentureBeat"},"content":{"rendered":"<p><p>Today, Dubai-based Camb AI, a startup researching AI-driven content localization technologies, announced the release of Mars5, a powerful AI model for voice cloning.<\/p>\n<p>While there are plenty of models that can create digital voice replicas, including those from ElevenLabs, Camb claims to differentiate itself by offering a much higher level of realism with Mars5's outputs.<\/p>\n<p>According to early samples shared by the company, the model not only emulates the original voice but also its complex prosodic parameters, including rhythm, emotion and intonation.<\/p>\n<p>Camb also supports nearly four times as many languages as ElevenLabs: more than 140 languages compared to ElevenLabs' 36, including low-resource ones like Icelandic and Swahili. 
However, the open-sourced technology, which can be accessed on GitHub starting today, is only the English-specific version. The version with expanded language support is available on the company's paid Studio.<\/p>\n<p>&#8220;The level of prosody and realism that Mars5 is able to capture, even with just a few seconds of input, is unprecedented. This is a Mistral moment in speech,&#8221; Akshat Prakash, the co-founder and CTO of the company, said in a statement.<\/p>\n<p>Normally, voice cloning and text-to-speech conversion are two separate offerings. The former captures parameters from a given voice sample to create a voice clone while the latter uses that clone to convert any given text into synthetic speech. The technology, as we have seen in the past, has the potential to portray anyone as saying anything.<\/p>\n<p>With Mars5, Camb AI goes a step further by combining both capabilities into a unified platform. All a user has to do is upload an audio file, ranging from a few seconds to a minute, and provide the text content. The model will then use the speaker's voice in the audio file as a reference, capture the relevant details (including the original voice, speaking style, emotion, enunciation and meaning) and synthesize the provided text as speech using them.<\/p>\n<p>The company claims Mars5 can capture diverse emotional tones and pitches, covering all sorts of complex speech scenarios such as when a person is frustrated, commanding, calm or even spirited. 
This, Prakash noted, makes it suitable for content that has traditionally been difficult to convert into speech, such as sports commentary, movies, and anime.<\/p>\n<p>To achieve this level of prosody, Mars5 combines a Mistral-style ~750M parameter autoregressive model with a novel ~450M parameter non-autoregressive multinomial diffusion model, operating on 6kbps encodec tokens.<\/p>\n<p>&#8220;The AR model iteratively predicts the most coarse (lowest level) codebook value for the encodec features, while the NAR model takes the AR output and infers the remaining codebook values in a discrete denoising diffusion task. Specifically, the NAR model is trained as a DDPM using a multinomial distribution on encodec features, effectively inpainting the remaining codebook entries after the AR model has predicted the coarse codebook values,&#8221; Prakash explained.<\/p>\n<p>While specific benchmark stats are yet to be seen, early samples and tests (with a few seconds of reference audio) run by VentureBeat show that the model mostly performs better than popular open and closed-source speech synthesis models, including those from Metavoice and ElevenLabs. The competing offerings synthesized speech clearly, but the results didn't sound as similar to the original voice as they did in the case of Mars5.<\/p>\n<p>&#8220;ElevenLabs is closed source so it's hard to specifically say why they aren't able to capture nuances that we can, but given that they report training on 500K+ hours (almost 5 times the dataset we have in English), it is clear to us that we have a superior model design that learns speech and its nuances better than theirs. Of course, as our datasets continue to grow and Mars5 trains even more, which we will release in successive checkpoints in GitHub, we expect it to only get better and better and better, especially considering support from the open-source community,&#8221; the CTO added. 
 <\/p>\n<p>As the company continues to bolster the voice cloning and text-to-speech performance of Mars5, it is also planning the open-source release of another model called Boli. This one has been designed to enable translation with contextual understanding, correct grammar and apt colloquialism.<\/p>\n<p>&#8220;Boli is our proprietary translation model, which surpasses traditional engines such as Google Translate and DeepL in capturing the nuances and colloquial aspects of language. Unlike large-scale parallel corpus-based systems, Boli offers a more consistent and natural translation experience, particularly in low- to medium-resource languages. Feedback from clients indicates that Boli's translations outperform those produced by mainstream tools, including the latest generative models like ChatGPT,&#8221; Prakash said.<\/p>\n<p>Currently, both Mars5 and Boli work with 140 languages on Camb's proprietary platform, Camb Studio. The company is also providing these capabilities as APIs to enterprises, SMEs and developers. Prakash did not share the exact number of customers but he did point out the company is working with Major League Soccer, Tennis Australia, Maple Leaf Sports & Entertainment as well as leading movie and music studios and several government agencies.<\/p>\n<p>For Major League Soccer, Camb AI live-dubbed a game into four languages in parallel for over 2 hours, uninterrupted, becoming the first company to do so. It also translated the Australian Open's post-match conference into multiple languages and translated the psychological thriller &#8220;Three&#8221; from Arabic to Mandarin. 
<\/p>\n<p><!-- Auto Generated --><\/p>\n<p>Go here to read the rest: <\/p>\n<p><a target=\"_blank\" rel=\"nofollow noopener\" href=\"https:\/\/venturebeat.com\/ai\/exclusive-camb-takes-on-elevenlabs-with-open-voice-cloning-ai-model-mars5-offering-higher-realism-support-for-140-languages\/\" title=\"Exclusive: Camb takes on ElevenLabs with open voice cloning AI model Mars5 offering higher realism, support for 140 ... - VentureBeat\">Exclusive: Camb takes on ElevenLabs with open voice cloning AI model Mars5 offering higher realism, support for 140 ... - VentureBeat<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p> Today, Dubai-based Camb AI, a startup researching AI-driven content localization technologies, announced the release of Mars5, a powerful AI model for voice cloning. 
While there are plenty of models that can create digital voice replicas, including those from ElevenLabs, Camb claims to differentiate itself by offering a much higher level of realism with Mars5's outputs <a href=\"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/cloning\/exclusive-camb-takes-on-elevenlabs-with-open-voice-cloning-ai-model-mars5-offering-higher-realism-support-for-140-venturebeat\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[187749],"tags":[],"class_list":["post-1126022","post","type-post","status-publish","format-standard","hentry","category-cloning"],"_links":{"self":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/posts\/1126022"}],"collection":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/comments?post=1126022"}],"version-history":[{"count":0,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/posts\/1126022\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/media?parent=1126022"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/categories?post=1126022"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.euvolution.com
\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/tags?post=1126022"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}