Scientists have used brain monitoring, artificial intelligence, and a speech synthesizer to turn brain activity into intelligible speech – an advance that could eventually give a voice to people who cannot speak.
It's a shame Stephen Hawking isn't alive to see this, as he might have gotten a real kick out of it. The new speech system, developed by researchers at the Neural Acoustic Processing Lab at Columbia University in New York City, is something the late physicist might have benefited from.
Hawking had amyotrophic lateral sclerosis (ALS), a motor neuron disease that took away his ability to speak, but he continued to communicate using a computer and a speech synthesizer. Using a cheek-operated switch, Hawking selected words on a computer, which were then read aloud by a voice synthesizer. It was a bit tedious, but it allowed Hawking to produce around a dozen words per minute.
But imagine if Hawking didn't have to manually select and trigger the words. Indeed, some individuals, whether they have ALS, locked-in syndrome, or are recovering from a stroke, may lack the motor skills needed to control a computer, even by a simple twitch of the cheek. Ideally, an artificial voice system would capture the individual's thoughts directly to produce speech, eliminating the need to control a computer.
New research published today in Scientific Reports brings us an important step closer to that goal, but instead of capturing an individual's inner thoughts to reconstruct speech, it uses the brain patterns produced while listening to speech.
To develop such a speech neuroprosthesis, neuroscientist Nima Mesgarani and his colleagues combined recent advances in deep learning with speech synthesis technologies. Their resulting brain-computer interface, though still rudimentary, captured brain patterns directly from the auditory cortex, which were then decoded by an AI-powered vocoder, or speech synthesizer, to produce intelligible speech. The speech sounded very robotic, but nearly three in four listeners were able to understand the content. It's an exciting advance – one that could ultimately help people who have lost the ability to speak.
To be clear, Mesgarani's neuroprosthetic device doesn't translate an individual's covert speech – that is, the thoughts in our heads, also called imagined speech – directly into words. Unfortunately, the science isn't quite there yet. Instead, the system captured an individual's distinct cognitive responses as they listened to recordings of people speaking. A deep neural network could then decode, or translate, those patterns, allowing the system to reconstruct speech.
"This study continues a recent trend in applying deep learning techniques to decode neural signals," Andrew Jackson, a professor of neural interfaces at Newcastle University who wasn't involved in the new study, told Gizmodo. "In this case, the neural signals are recorded from the brain surface of people during epilepsy surgery. The participants listen to different words and sentences read by actors. Neural networks are trained to learn the relationship between the brain signals and the sounds, and as a result they can reconstruct intelligible reproductions of the words and sentences based only on the brain signals."
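The core idea Jackson describes – learning a mapping from recorded brain signals to the sounds the patient heard – can be sketched in a few lines. The sketch below is purely illustrative and hypothetical: it uses simulated data and a least-squares linear regression where the actual study used a deep neural network feeding a vocoder, but the principle of fitting a signal-to-spectrogram mapping is the same.

```python
import numpy as np

# Hypothetical sketch: fit a linear mapping from simulated "ECoG" features
# to audio spectrogram frames. The real system uses a deep neural network
# and a vocoder; this only illustrates the supervised-mapping idea.

rng = np.random.default_rng(0)
n_frames, n_electrodes, n_freq_bins = 500, 64, 32

# Simulated ground truth: spectrogram frames are a fixed linear function
# of the neural features, plus noise (a stand-in for real recordings).
true_map = rng.normal(size=(n_electrodes, n_freq_bins))
ecog = rng.normal(size=(n_frames, n_electrodes))            # neural features
spectrogram = ecog @ true_map + 0.1 * rng.normal(size=(n_frames, n_freq_bins))

# Fit the decoder by least squares: W minimizes ||ecog @ W - spectrogram||
W, *_ = np.linalg.lstsq(ecog, spectrogram, rcond=None)

# Reconstruct spectrogram frames from the neural activity alone
reconstruction = ecog @ W
error = np.mean((reconstruction - spectrogram) ** 2)
print(f"mean squared reconstruction error: {error:.4f}")
```

In the actual study the reconstructed spectrogram frames would then drive a vocoder to produce audible speech; here the fit quality simply shows that the mapping was learned.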
Epilepsy patients were chosen for the study because they often have to undergo brain surgery. Mesgarani, with the help of Ashesh Dinesh Mehta, a neurosurgeon at Northwell Health Physician Partners Neuroscience Institute and a co-author of the new study, recruited five volunteers for the experiment. The team used invasive electrocorticography (ECoG) to measure neural activity as the patients listened to continuous speech. The patients listened, for example, to speakers reciting digits from zero to nine. Their brain patterns were then fed into the AI-powered vocoder, which produced synthesized speech.
The results sounded very robotic, but were fairly understandable. In tests, listeners could correctly identify the spoken digits about 75 percent of the time. They could even tell whether the speaker was male or female. Not bad – a result that even came as a "surprise" to Mesgarani, as he told Gizmodo in an email.
Recordings from the speech synthesizer can be found here (the researchers tested several techniques, but the best results came from combining deep neural networks with a vocoder).
The use of a voice synthesizer in this context, as opposed to a system that matches and recites pre-recorded words, was important to Mesgarani. As he explained to Gizmodo, there's more to speech than just putting the right words together.
"Since the goal of this work is to restore speech communication to those who have lost the ability to talk, we aimed to learn the direct mapping from the brain signal to the speech sound itself," he told Gizmodo. "It is possible to decode phonemes [distinct units of sound] or words, but speech carries much more information than just the content – such as the speaker [with their distinct voice and style], intonation, emotional tone, and so on. Therefore, our goal in this particular paper was to recover the sound itself."
Looking ahead, Mesgarani would like to synthesize more complicated words and sentences, and to collect brain signals from people who simply think or imagine the act of speaking.
Jackson was impressed by the new study, but he said it's still unclear whether this approach will apply directly to brain-computer interfaces.
"In the paper, the decoded signals reflect actual words heard by the brain. To be useful, a communication device would have to decode words that are imagined by the user," Jackson told Gizmodo. "Although there is often some overlap between brain areas involved in hearing, speaking, and imagining speech, we don't yet know exactly how similar the associated brain signals will be."
William Tatum, a neurologist at the Mayo Clinic who was also not involved in the new study, said the research is important because it's the first to use artificial intelligence to reconstruct speech from the brain waves involved in processing known acoustic stimuli. The significance is notable, "because it advances the application of deep learning in the next generation of better-designed speech-producing systems," he told Gizmodo. That said, he felt the sample size of participants was too small, and that the use of data extracted directly from the human brain during surgery is not ideal.
Another limitation of the study is that the neural networks, in order to do more than just reproduce the words zero through nine, would need to be trained on a large number of brain signals from each participant. The system is patient-specific, as we all produce different brain patterns when listening to speech.
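This patient-specificity can be illustrated with a small, hypothetical simulation. Below, two simulated "patients" each have their own brain-to-sound mapping; a least-squares decoder (again standing in for the study's deep network) fit on patient A reconstructs A's spectrograms nearly perfectly but fails badly on patient B, because B's brain encodes the same sounds differently.

```python
import numpy as np

# Hypothetical illustration of why speech decoders are patient-specific.
# Each simulated patient has a different brain->sound mapping, so a decoder
# trained on one patient transfers poorly to another.

n_frames, n_electrodes, n_bins = 400, 32, 16

def simulate_patient(seed):
    """Generate one patient's neural features and matching spectrograms."""
    r = np.random.default_rng(seed)
    mapping = r.normal(size=(n_electrodes, n_bins))  # this patient's encoding
    ecog = r.normal(size=(n_frames, n_electrodes))
    spectrogram = ecog @ mapping
    return ecog, spectrogram

ecog_a, spec_a = simulate_patient(10)
ecog_b, spec_b = simulate_patient(20)

# Fit a least-squares decoder on patient A's data only
W, *_ = np.linalg.lstsq(ecog_a, spec_a, rcond=None)

err_same = np.mean((ecog_a @ W - spec_a) ** 2)   # A's decoder on A: near zero
err_cross = np.mean((ecog_b @ W - spec_b) ** 2)  # A's decoder on B: large
print(f"same-patient error: {err_same:.6f}, cross-patient error: {err_cross:.3f}")
```

The gap between the two errors mirrors why the real system had to be trained separately for each of the five volunteers, and why Jackson's question below about cross-person generalization matters.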
"It will be interesting in future to see how well decoders trained for one person generalize to other individuals," said Jackson. "It's a bit like early speech recognition systems that needed to be individually trained by the user, as opposed to today's technology, such as Siri and Alexa, which can make sense of anyone's voice, again using neural networks. Only time will tell whether these technologies could one day do the same for brain signals."
No doubt there's still a lot of work to do. But the new paper is an encouraging step toward achieving implantable speech neuroprosthetics. [Scientific Reports]