Today's voice assistants are still a far cry from the hyper-intelligent thinking machines we've been musing about for decades. And it's because that technology is actually the combination of three different skills: speech recognition, natural language processing, and voice generation.
Each of these skills already presents huge challenges. In order to master just the natural language processing part? You pretty much have to re-create human-level intelligence. Deep learning, the technology driving the current AI boom, can train machines to become masters at all sorts of tasks. But it can only learn one at a time. And because most AI models train their skill set on thousands or millions of existing examples, they end up replicating patterns within historical data, including the many bad decisions people have made, like marginalizing people of color and women.
Still, systems like the board-game champion AlphaZero and the increasingly convincing fake-text generator GPT-3 have stoked the flames of debate regarding when humans will create an artificial general intelligence: machines that can multitask, think, and reason for themselves. In this episode, we explore how machines learn to communicate, and what it means for the humans on the other end of the conversation.
This episode was produced by Jennifer Strong, Emma Cillekens, Anthony Green, Karen Hao, and Charlotte Jee. We're edited by Michael Reilly and Niall Firth.
[TR ID]
Jim: I don't know if it was AI. If they had taken the recording of something he had done... and were able to manipulate it... but I'm telling you, it was my son.
Strong: The day started like any other for a man... we're going to call Jim. He lives outside Boston.
And by the way... he has a family member who works for MIT.
We're not going to use his last name because they have concerns about their safety.
Jim: It was a Tuesday or Wednesday morning, nine o'clock. I'm deep in thought, working on something...
Strong: That is... until he received this call.
Jim: The phone rings and I pick it up and it's my son. And he is clearly agitated. This, this kid's a really chill guy but when he does get upset, he has a number of vocal mannerisms. And this was like, Oh my God, he's in trouble.
And he basically told me, look, I'm in jail, I'm in Mexico. They took my phone. I only have 30 seconds. Um, they said I was drinking, but I wasn't and people are hurt. And look, I have to get off the phone, call this lawyer and it gives me a phone number and has to hang up.
Strong: His son is in Mexico, and there's just no doubt in his mind it's him.
Jim: And I gotta tell you, Jennifer, it, it was him. It was his voice. It was everything. Tone. Just these little mannerisms, the, the pauses, the gulping for air, everything that you could imagine.
Strong: His heart is in his throat...
Jim: My hair standing on edge.
Strong: So, he calls that phone number. A man picks up, and he offers more details on what's going on.
Jim: Your son is being charged with hitting this car. There was a pregnant woman driving whose arm was broken. Her daughter was in the back seat... is in critical condition and they are, um, they booked him with driving under the influence. We don't think that he has done that. This is we've, we've come across this a number of times before, but the most important thing is to get him out of jail, get him safe, as fast as possible.
Strong: Then the conversation turns to money... he's told bail has been set and he needs to put down ten percent.
Jim: So as soon as he started talking about money, you know, the, the flag kind of went up and I said, excuse me, is there any chance that this is a scam of some sort? And he got really kind of, um, irritated. He's like, Hey, you called me. Look, I find this really offensive that you're accusing me of something. And then my heart goes back in my throat. I'm like, this is the one guy who's between my son and even worse jail. So I backtracked.
[Music]
My wife walks in 10 minutes later and says, well, you know, I was texting with him late last night. Like this is around the time probably that he would have been arrested and jailed. So, of course we text him, he's just getting up. He's completely fine.
Strong: He's still not sure how someone captured the essence of his son's voice. But he has some theories...
Jim: They had to have gotten a recording of something when he was upset. That's the only thing that I can say, cause they couldn't have mocked up some of these things that he does. They couldn't guess at that. I don't think, and so they, I think they had certainly some raw material to work with and then what they did with it from there. I don't know.
Strong: And it's not just Jim who's unsure. We have no idea whether AI had anything to do with this.
But, the point is we now live in a world where we also can't be sure that it didn't.
It's incredibly easy to fake someone's voice with even a few minutes of recordings, and teenagers like Jim's son? They share countless recordings through social media posts and messages.
Jim: I was quite impressed with how good it was. Um, like I said, I'm not easily fooled and man, they had it nailed. So, um, just caution.
Strong: I'm Jennifer Strong, and this episode we look at what it takes to make a voice.
[SHOW ID]
Zeyu Jin: You guys have been making weird stuff online.
Strong: Zeyu Jin is a research scientist at Adobe. This is him speaking at a company conference about five years ago, showing how software can rearrange the words in this recording.
Key: I jumped on the bed and I kissed my dogs and my wife... in that order.
Zeyu: So how about we mess with who he actually kissed. // Introducing Project VoCo. Project VoCo allows you to edit speech in text. So let's bring it up. So I just load this audio piece in VoCo. So as you can see we have the audio waveform and we have the text under it. //
So what do we do? Copy paste. Oh! Yeah, it's done. Let's listen to it.
Key: And I kissed my wife and my dogs.
Zeyu: Wait, there's more. We can actually type something that's not here.
Key: And I kissed Jordan and my dogs.
Strong: Adobe never released this prototype, but the underlying technology keeps getting better.
For example, here's a computer-generated fake of podcaster Joe Rogan from 2019... It was produced by Square's AI lab called Dessa to raise awareness about the technology.
Rogan: 10-7 Friends, I've got something new to tell all of you. I've decided to sponsor a hockey team made up entirely of chimps.
Strong: While it sounds like fun and games, experts warn these artificial voices could make some types of scams a whole lot more common. Things like what we heard about earlier.
Mona Sedky: Communication focused crime has historically been lower on the totem pole.
Strong: That's federal prosecutor Mona Sedky speaking last year at the Federal Trade Commission about voice cloning technologies.
Mona Sedky: But now with the advent of things like deep fake video, now deep fake audio, you, you can basically have anonymizing tools and be anywhere on the internet you want to be, anywhere in the world, and communicate anonymously with people. So as a result there has been an enormous uptick in communication focused crime.
Balasubramaniyan: But imagine if you as a CFO or chief controller gets a phone call that comes from your CEO's phone number.
Strong: And this is Pindrop Security CEO Vijay Balasubramaniyan at a security conference last year.
Balasubramaniyan: It's completely spoofed so it actually uses your address book, and it shows up as your CEO's name... and then on the other end you hear your CEO's voice with a tremendous amount of urgency. And we are starting to see crazy attacks like that. There was an example that a lot of press media covered, which is a $220,000 wire that happened because a CEO of a UK firm thought he was talking to his parent company so he then sent that money out. But we've seen as high as $17 million go out the door.
Strong: And the very idea of fake voices... can be just as damaging as a fake voice itself. Like when former president Donald Trump tried to blame the technology for some offensive things he said that were caught on tape.
But like any other tech, it's not inherently good or bad. It's just a tool... and I used it in the trailer for season one to show what the technology can do.
Strong: If seeing is believing...
How do we navigate a world where we can't trust our eyes... or ears?
And so you know... what you're listening to... It's not just me speaking. I had some help from an artificial version of my voice filling in words here and there.
Meet synthetic Jennifer.
Synthetic Jennifer: Hi there, folks!
Strong: I can even click to adjust my mood.
Synthetic Jennifer: Hi there.
Strong: Yeah, let's not make it angry...
Strong: In the not-so-distant future, this tech will be used in any number of ways... for simple tweaks to pre-recorded presentations... even to bring back the voices of animated characters from a series.
In other words, artificial voices are here to stay. But they haven't always been so easy to make, and I called up an expert whose voice might sound familiar...
Bennett: How does this sound? Um, maybe I could be a little more friendly. How are you?
Hi, I'm Susan C. Bennett, the original voice of Siri.
Well, the day that Siri appeared, which was October 4, 2011, a fellow voice actor emailed me and said, Hey, we're playing around with this new iPhone app, isn't this you? And I said, what? I went on the Apple site and listened... and yep. That was my voice. [chuckles]
Strong: You heard that right. The original female voice that millions associate with Apple devices? Had no idea. And she wasn't alone. The human voices behind other early voice assistants were also taken by surprise.
Bennett: Yeah, it's been an interesting thing. It was an adjustment at first as you can imagine, because I wasn't expecting it. It was a little creepy at first, I'll have to say, I never really did a lot of talking to myself as Siri, but gradually I got accepting of it and actually it ended up turning into something really positive, so...
Strong: To be clear, Apple did not steal Susan Bennett's voice. For decades, she's done voice work for companies like McDonald's and Delta Airlines, and years before Siri came out she did a strange series of recordings that fueled its development.
Bennett: In 2005, we couldn't have imagined something like Siri or Alexa. And so all of us, I've talked to other people who've had the same experience, who have been a virtual voice. You know we just thought we were doing just generic phone voice messaging. And so when suddenly Siri appeared in 2011, it's like, I'm who, what, what is this? So, it was a genuine surprise, but I like to think of it as we were just on the cutting edge of this new technology. So, you know, I choose to think of it as a very positive thing, even though, we, none of us, were ever paid for the millions and millions of phones that our voices are heard on. So that's, that's a downside.
Strong: Something else that's awkward... she says Apple never acknowledged her as the American voice of Siri. That's despite becoming an accidental celebrity... reaching millions.
Bennett: The only actual acknowledgement that I've ever had is via Siri. If you ask Siri "Who is Susan Bennett?" she'll say, I'm the original voice of Siri. Thanks so much, Siri. Appreciate it.
Strong: But it's not the first time she's given her voice to a machine.
Bennett: In the late '70s when they were introducing ATMs, I like to say it was my first experience as a machine, and you know, there were no personal computers or anything at that time and people didn't trust machines. They wouldn't use the ATMs because they didn't trust the machines to give them the right money. They, you know, if they put money in the machine they were afraid they'd never see it again. And so a very enterprising advertising agency in Atlanta at the time called McDonald and Little decided to humanize the machine. So they wrote a jingle and I became the voice of Tilly the all-time teller and then they ultimately put a little face on the machine.
Strong: The human voice helps companies build trust with consumers...
Bennett: There are so many different emotions and meanings that we get across through the sound of our voices rather than just in print. That's why I think emojis came up because you can't get the nuances in there without the voice. And so I think that's why voice has become such an important part of technology.
Strong: And in her own experience, interactions with this synthetic version of her voice have led people to trust and confide in her, to call her a friend, even though they've never met her.
Bennett: Well, I think the oddest thing about being the voice of Siri, to me, is when I first revealed myself, it was astounding to me how many people considered Siri their friend or some sort of entity that they could really relate to. I think they actually in many cases think of her as human.
Strong: It's estimated the global market for voice technologies will reach nearly 185 billion dollars this year... and AI-generated voices? They're a game changer.
Bennett: You know, after years and years of working on these voices, it's really, really hard to get the actual rhythm of the human voice. And I'm sure they'll probably do it at some point, but you will notice even to this day, you know, you'll listen to Siri or Alexa or one of the others and they'll be talking along and it sounds good until it doesn't. Like, Oh, I'm going to the store. You know, there's some weirdness in the rhythmic sense of it.
Strong: But even once human-like voices become commonplace... she's not entirely sure that will be a good thing.
Bennett: But you know, the advantage for them is they don't really have to get along with Siri. They can just tell Siri what to do if they don't like what she says, they can just turn it off. So it is not like real human relations. It's like maybe what people would like human relations to be. Everybody does what I want. (laughter) Then everybody's happy. Right?
Strong: Of course, voice assistants like Siri and Alexa aren't just voices. Their capabilities come from the AI behind the scenes too.
It's been explored in science fiction films like this one, called Her, about a man who falls in love with his voice assistant.
Theodore: How do you work?
Samantha (AI): Well... Basically I have intuition. I mean... the DNA of who I am is based on the millions of personalities of all the programmers who wrote me, but what makes me me is my ability to grow through my experiences. So basically in every moment I'm evolving, just like you.
Strong: But today's voice assistants are a far cry from the hyper-intelligent thinking machines we've been musing about for decades.
And it's because that technology... is actually many technologies. It's the combination of three different skills... speech recognition, natural language processing and voice generation.
Speech recognition is what allows Siri to recognize the sounds you make and transcribe them into words. Natural language processing turns those words into meaning... and figures out what to say in response. And voice generation is the final piece... the human element... that gives Siri the ability to speak.
Each of these skills is already a huge challenge... In order to master just the natural language processing part? You pretty much have to re-create human-level intelligence.
And we're nowhere near that. But we've seen remarkable progress with the rise of deep learning helping Siri and Alexa be a little more useful.
Metz: What people may not know about Siri is that original technology was something different.
Strong: Cade Metz is a tech reporter for the New York Times. His new book is called Genius Makers: The Mavericks Who Brought AI to Google, Facebook, and the World.
Metz: The way that Siri was originally built... You had to have a team of engineers, in a room, at their computers and piece by piece, they had to define with computer code how it would recognize your voice.
Strong: Back then... engineers would spend days writing detailed rules meant to show machines how to recognize words and what they mean.
And this was done at the most basic level, often working with just snippets of voice at a time.
Just imagine all the different ways people can say the word hello or all the ways we piece together sentences explaining why time flies or how some verbs can also be nouns.
Metz: You can never piece together everything you need, no matter how many engineers you have, no matter how rich your company is. Defining every little thing that might happen when someone speaks into their iPhone... You just don't have enough person-power to build everything you need to build. It's just too complicated.
Strong: Neural networks made that process a whole lot easier. They simply learn by recognizing patterns in data fed into the system.
Metz: You take that human speech. You give it to the neural network. And the neural network learns the patterns that define human speech. That way it can recreate it without engineers having to define every little piece of it. The neural network literally learns the task on its own. And that's the key change... is that a neural network can learn to recognize what a cat looks like, as opposed to people having to define for the machine what a cat looks like.
Strong: But even before neural networks, tech companies like Microsoft aimed to build systems that could understand the everyday way people write and talk.
And in 1996, Microsoft hired a linguist, Chris Brockett... to begin work on what they called natural-language AI.
Read the original here:
Podcast: AI finds its voice - MIT Technology Review