Start-up helps two big sisters create a custom, synthesized voice for their sister

By Lisa Harvey, August 2, 2018

Maeve is an eleven-year-old who lives just outside of Boston with her parents and two big sisters. She has cerebral palsy and relies on a computer-generated voice synthesizer to communicate. The problem is that only a handful of voices are available for her assistive communication device, and none of them are quite right for a pre-teen known for her sense of humor.

The family learned that a local start-up, VocaliD, was developing technology to create custom voices for assistive speech technology. After Maeve’s sister Erin emailed the start-up, Maeve became VocaliD’s first customer.

Sisters Erin, Maeve and Meghan. Image Credit: VocaliD.

VocaliD was founded by Dr. Rupal Patel, a former speech therapist. The start-up crowdsources human speech to create custom digital voices. Dr. Patel told a local TV station, “Even when someone has a very severe speech disorder, there are certain aspects of their voice that are preserved. I thought these individuals have unique voices, why can’t we make their devices sound more differentiated?”

Augmentative and alternative communication

Augmentative and alternative communication (AAC) has given many people the ability to speak, but the voices often sound computer-generated or like those of 40-year-olds. The options don’t typically fit young children. And adults who lose their voice due to medical conditions often feel they lose part of their identity along with it. The most famous digital voice is likely the robotic-sounding speech synthesizer used by the late Stephen Hawking.

The Guardian, in an article titled “How a new technology is changing the lives of people who cannot speak,” explained, “Hawking’s case is one of the most striking examples of the way a person’s voice shapes their identity. Though the robotic quality of his digital voice (and the American accent) felt inappropriate at first, it came to be his trademark. Hawking reshaped himself around his new voice, and years later, when he was offered the opportunity to use a new voice that was smoother, more human-sounding, and English, he refused. This felt like ‘him’ now.

“The ‘Stephen Hawking voice’ doesn’t belong only to Hawking. In the years since it was created, the same voice has also been used by little girls, old men, and people of every racial and ethnic background. This is one of the stranger features of the world of people who rely on AAC: millions of them share a limited number of voices. While there is more variety now than before, only a few dozen options are widely available, and most of them are adult and male.”

Creating a truly unique, synthesized voice

While Maeve cannot form words, the sounds she can make provide information about how her speech would sound. Geoffrey Meltzner, who leads research and technology at VocaliD, recorded the sounds that Maeve can make in order to create a custom voice. Her sisters Erin and Meghan then recorded hours of speech, and data from a voice donor was added. The data from the four sources were used to train a statistical synthesizer. With less than five hours of processing, VocaliD developed a custom voice they could install on Maeve’s AAC device. It was not a copy of her sisters’ voices, but rather a voice of her own based on the sound characteristics of her vocalizations.

VocaliD speech synthesizer. Image Credit: VocaliD.

“VocaliD uses state-of-the-art speech signal processing algorithms to extract the vocal identity cues from recipients like Maeve, which are blended with recordings of a matched voicebank contributor,” says Meltzner. “While Maeve’s sisters shared their voice in this instance, siblings aren’t necessarily the best matches. The blended spoken dataset is then used to train a deep learning-based statistical speech synthesizer to create Maeve’s uniquely personalized digital voice, which can then be installed on her speech-generating device.

“MATLAB, and especially the Signal Processing Toolbox, played an integral role in the prototyping, refining, and testing of the speech processing algorithms which are used in the production of our voices today.”
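
To give a rough feel for what a "vocal identity cue" might look like in practice, here is a minimal sketch in Python. It is not VocaliD's algorithm, which the article does not describe in detail. It assumes, purely for illustration, that one such cue is pitch (fundamental frequency), estimates it from a short sound with a basic autocorrelation search, and computes how much a donor's pitch would need to be scaled to match. The function names, the 80 to 400 Hz search range, the synthetic 200 Hz tone standing in for a recorded vocalization, and the 160 Hz donor pitch are all hypothetical.

```python
import math


def estimate_pitch_hz(samples, sample_rate, fmin=80.0, fmax=400.0):
    """Estimate pitch by finding the lag with the strongest autocorrelation.

    Illustrative only: real speech needs framing, windowing, and
    voicing detection, none of which is attempted here.
    """
    min_lag = int(sample_rate / fmax)  # shortest period considered
    max_lag = int(sample_rate / fmin)  # longest period considered
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Correlate the signal with a copy of itself delayed by `lag` samples.
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def pitch_ratio(recipient_hz, donor_hz):
    """Factor by which donor pitch would be scaled to match the recipient."""
    return recipient_hz / donor_hz


# Hypothetical stand-in for a recorded vocalization: 0.1 s of a 200 Hz tone.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200.0 * n / sample_rate) for n in range(800)]

recipient_pitch = estimate_pitch_hz(tone, sample_rate)
ratio = pitch_ratio(recipient_pitch, donor_hz=160.0)  # assumed donor pitch
print(f"estimated pitch: {recipient_pitch:.1f} Hz, scale donor by {ratio:.2f}")
```

A production system would work with far richer features than a single pitch value, and the blending and deep learning steps Meltzner describes are well beyond this sketch. The point is only that measurable properties of a person's vocalizations can be extracted and then used to shape someone else's recorded speech.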

Donating or banking your voice

People who know their loss of voice is imminent can bank their voice. This option helps people facing oral surgery or progressive diseases such as ALS maintain the ability to communicate in their natural voice.

Voice donors are also needed, especially for younger children and teens. NBC News shared how a group of seventh graders in California added their voices to the growing collection.

You can change someone’s life by simply reading out loud and recording your voice. A voice donor must record approximately 3,500 sentences. To become a voice donor, visit VocaliD’s voicebank.