{"id":190714,"date":"2017-05-02T23:03:58","date_gmt":"2017-05-03T03:03:58","guid":{"rendered":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/new-ai-tech-can-mimic-any-voice-scientific-american\/"},"modified":"2017-05-02T23:03:58","modified_gmt":"2017-05-03T03:03:58","slug":"new-ai-tech-can-mimic-any-voice-scientific-american","status":"publish","type":"post","link":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/ai\/new-ai-tech-can-mimic-any-voice-scientific-american\/","title":{"rendered":"New AI Tech Can Mimic Any Voice &#8211; Scientific American"},"content":{"rendered":"<p><p>Even the most natural-sounding computerized voices, whether it's Apple's Siri or Amazon's Alexa, still sound like, well, computers. Montreal-based start-up Lyrebird is looking to change that with an artificially intelligent system that learns to mimic a person's voice by analyzing speech recordings and the corresponding text transcripts as well as identifying the relationships between them. Introduced last week, Lyrebird's speech synthesis can generate thousands of sentences per second (significantly faster than existing methods) and mimic just about any voice, an advancement that raises ethical questions about how the technology might be used and misused.<\/p>\n<p>The ability to generate natural-sounding speech has long been a core challenge for computer programs that transform text into spoken words. Artificial intelligence (AI) personal assistants such as Siri, Alexa, Microsoft's Cortana and the Google Assistant all use text-to-speech software to create a more convenient interface with their users. Those systems work by cobbling together words and phrases from prerecorded files of one particular voice. Switching to a different voice, such as having Alexa sound like a man, requires a new audio file containing every possible word the device might need to communicate with users.
<\/p>\n<p>Lyrebird's system can learn the pronunciations of characters, phonemes and words in any voice by listening to hours of spoken audio. From there it can extrapolate to generate completely new sentences and even add different intonations and emotions. Key to Lyrebird's approach are artificial neural networks (which use algorithms designed to help them function like a human brain) that rely on deep-learning techniques to transform bits of sound into speech. A neural network takes in data and learns patterns by strengthening connections between layered neuronlike units.<\/p>\n<p>After learning how to generate speech the system can then adapt to any voice based on only a one-minute sample of someone's speech. &#8220;Different voices share a lot of information,&#8221; says Lyrebird co-founder Alexandre de Brébisson, a PhD student at the Montreal Institute for Learning Algorithms laboratory at the University of Montreal. &#8220;After having learned several speakers' voices, learning a whole new speaker's voice is much faster. That's why we don't need so much data to learn a completely new voice. More data will still definitely help, yet one minute is enough to capture a lot of the voice DNA.&#8221;<\/p>\n<p>Lyrebird showcased its system using the voices of U.S. political figures Donald Trump, Barack Obama and Hillary Clinton in a synthesized conversation about the start-up itself. The company plans to sell the system to developers for use in a wide range of applications, including personal AI assistants, audio book narration and speech synthesis for people with disabilities.<\/p>\n<\/p>\n<p>Last year Google-owned company DeepMind revealed its own speech-synthesis system, called WaveNet, which learns from listening to hours of raw audio to generate sound waves similar to a human voice. It then can read a text out loud with a humanlike voice. 
Both Lyrebird and WaveNet use deep learning, but the underlying models are different, de Brébisson says. Lyrebird is significantly faster than WaveNet at generation time, he says. &#8220;We can generate thousands of sentences in one second, which is crucial for real-time applications.&#8221; Lyrebird also adds the possibility of copying a voice very fast and is language-agnostic. Scientific American reached out to DeepMind but was told WaveNet team members were not available for comment.<\/p>\n<p>Lyrebird's speed comes with a trade-off, however. Timo Baumann, a researcher who works on speech processing at the Language Technologies Institute at Carnegie Mellon University and is not involved in the start-up, noted Lyrebird's generated voice carries a buzzing noise and a faint but noticeable robotic sheen. Moreover, it does not generate breathing or mouth movement sounds, which are common in natural speaking. &#8220;Sounds like lip smack and inbreathe are important in conversation. They actually carry meaning and are observable to the listener,&#8221; Baumann says. These flaws make it possible to distinguish the computer-generated speech from genuine speech, he adds. &#8220;We still have a few years before technology can get to a point that it could copy a voice convincingly in real-time,&#8221; he adds.<\/p>\n<p>Still, to untrained ears and unsuspecting minds, an AI-generated audio clip could seem genuine, creating ethical and security concerns about impersonation. Such a technology might also confuse and undermine voice-based verification systems. Another concern is that it could render unusable voice and video recordings used as evidence in court. A technology that can be used to quickly manipulate audio will even call into question the veracity of real-time video in live streams. And in an era of fake news it can only compound existing problems with identifying sources of information. 
&#8220;It will probably be still possible to find out when audio has been tampered with,&#8221; Baumann says, &#8220;but I'm not saying that everybody will check.&#8221;<\/p>\n<p>Systems equipped with a humanlike voice may also pose less obvious but equally problematic risks. For example, users may trust these systems more than they should, giving out personal information or accepting purchasing advice from a device, treating it like a friend rather than a product that belongs to a company and serves its interests. &#8220;Compared to text, voice is just much more natural and intimate to us,&#8221; Baumann says.<\/p>\n<p>Lyrebird acknowledges these concerns and essentially issues a warning in the brief ethics statement on the company's Web site. Lyrebird cautions the public that the software could be used to manipulate audio recordings used as evidence in court or to assume someone else's identity. &#8220;We hope that everyone will soon be aware that such technology exists and that copying the voice of someone else is possible,&#8221; according to the site.<\/p>\n<p>Just as people have learned photographs cannot be fully trusted in the age of Photoshop, they may need to get used to the idea that speech can be faked. There is currently no way to prevent the technology from being used to make fraudulent audio, says Bruce Schneier, a security technologist and lecturer in public policy at the Kennedy School of Government at Harvard University. The risk of encountering a fake audio clip has now become the new reality, he says.
<\/p>\n<p>View original post here:<\/p>\n<p><a target=\"_blank\" rel=\"nofollow\" href=\"https:\/\/www.scientificamerican.com\/article\/new-ai-tech-can-mimic-any-voice\/\" title=\"New AI Tech Can Mimic Any Voice - Scientific American\">New AI Tech Can Mimic Any Voice - Scientific American<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Even the most natural-sounding computerized voices, whether it's Apple's Siri or Amazon's Alexa, still sound like, well, computers. <a href=\"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/ai\/new-ai-tech-can-mimic-any-voice-scientific-american\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":7,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[187743],"tags":[],"class_list":["post-190714","post","type-post","status-publish","format-standard","hentry","category-ai"],"_links":{"self":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/posts\/190714"}],"collection":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/users\/7"}],"replies":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/comments?post=190714"}],"version-history":[{"count":0,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/posts\/190714\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/media?parent=190714"}],"wp:term":[{"taxonomy":"categ
ory","embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/categories?post=190714"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/tags?post=190714"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}