This month's guest blog is written by Dr. Rupal Patel, founder of VocalID. Watch her TED talk, which has been viewed nearly one million times.
What’s more personal, more distinctive, more YOU, than your voice? There’s an evolutionary reason that we all have unique voices -- our voice instantly conveys our age, our gender, our ethnicity, even our intelligence and emotional state. It’s how people know us; it’s how people remember us.
Yet there are tens of millions worldwide who are unable to speak. They include children born with cerebral palsy, a third of those diagnosed with autism, and adults with head or neck cancer or degenerative conditions such as Parkinson’s disease or ALS. These individuals need to rely on computerized devices to communicate. The problem is that the voices on those devices are generic and lack personality. I can’t tell you how many special education classrooms I’ve walked into and seen several children using the exact same voice and, even more troubling, adult voices. We wouldn’t dream of fitting a little girl with the prosthetic limb of a grown man, so why use the same prosthetic voice?
It is heartening that awareness of this issue has entered mainstream popular culture. ABC has a new sitcom about a boy who cannot speak … and the boy chooses not to use a computerized voice at all. Instead, he has a home health aide speak for him. He chose a smart, funny, thirty-something guy from the lineup of “voice talent” misfits. Although Hollywood can cast a human translator, there are more practical solutions.
In today’s digital age, we can think beyond hardware that replaces function alone and think about software that emulates form and function. We can now create custom voices that are personalized, natural, and empowering. The challenge is to craft a voice that captures the essence of the person using it. Your body’s central organ is your heart. Your personality’s central organ is your voice. Like the lines on your palm, your voice deepens and takes on the melody that reflects its everyday habits and usage. So how can we replace this amorphous function that is so much more than the physical organs that give rise to it? Well, this is where science and technology give us new hope.
Until now, creating a synthesized voice entailed hiring a voice actor to record thousands of sentences in various speaking styles. These recordings are then annotated and spliced into little snippets of speech that can be recombined to read aloud any novel text string. This process is time- and labor-intensive – it took an undisclosed sum of millions of dollars to create Siri’s voice. And yet, Siri is just one white, 40-something-ish, middle-class, American-sounding female voice.
We don’t believe in uniform voices. We believe that everyone has a unique voice that deserves to be heard. To deliver on this, we have crowdsourced the voice collection process. Instead of a voice actor spending weeks recording, anyone can record on our online platform. All they need is a computer, a headset microphone, and a quiet room. Today, over 17,000 speakers ranging in age from 6 to 91, from 110 countries, have contributed more than 7 million sentences to our Human Voicebank initiative. That means we can create a variety of diverse voices. But we can do more.
We can now reverse-engineer a voice by blending vocal samples of those who are unable to speak with several hours of recordings from a matched speech donor. That’s because, through years of research, we’ve discovered that even a single vowel contains enough vocal DNA to seed the voice personalization process. This discovery, along with our growing Voicebank and voice blending algorithms, allows us to create unique digital voices for a fraction of the price and with all the warmth and nuances of the natural human voice.
So what are our early adopters saying? They tell us that they are talking more – as much as 300% more – at school, with friends, and with strangers. And it’s not just the recipients who benefit. One mom told me that her son used to poke fun at the generic voice that his sister used, but now he wouldn’t dare, as he sees that same device with her new BeSpoke voice as an extension of her. A father of a 9-year-old who received our BeSpoke voice said, “Quite frankly, it’s as if I’ve heard my daughter for the first time.”
For others, like John, we have been able to reunite them with their voice. As a management consultant, John’s voice was his identity, his livelihood. About eight years ago, he was diagnosed with ALS. It happened so quickly. He didn’t have the chance to bank his voice. His wife Linda tells a heart-wrenching story about how she lost the last remnant of his voice when she traded in his cell phone, which had his voicemail greeting on it, for an iPad.
We created a voice for John using the brief sounds he can still make, combined with recordings of a matched voice donor. In fact, we made three suitable voices. I still recall the day we presented those voices to him. We had a TV crew with cameras rolling and lots of excitement in the air. We played him samples of the first two voices, and he and Linda smiled politely, saying, yes, it’s better than the generic voice we have been using. Then we played the last one, and his whole body began to shake. He was crying tears of joy, and Linda was whispering “That’s you!” It was an incredible moment for us all.
There is still much work to be done. Creating custom voices requires high-quality recordings, and crowdsourcing doesn’t always yield them. To that end, we are partnering with schools, studios, and community organizations to host voice drives. Join us as we give VOICE a whole new meaning. Because individuality matters, and it always will.