Neurally driven speech prostheses, also known as speech brain-machine interfaces (BMIs), employ machine learning algorithms to map activity from the brain to the control inputs of vocal synthesizers. We will develop a Simulink- and Matlab-based framework to facilitate rapid closed-loop prototyping and iteration of speech prosthesis designs. The framework will receive input from our existing physical neural interface devices and will incorporate existing and future machine learning models for mapping neural activity to configurable vocal synthesizers. We have an ongoing project that is developing a songbird animal model to advance neural prosthesis design, in which we regularly record hundreds of neurons simultaneously from motor areas of the brain that control vocal articulation. By the end of this short-term project, we will incorporate Simulink models to enable online, real-time vocal synthesis from brain activity in this research setting. This Simulink-based system, developed and validated in an animal model, will also enable future human subjects work.
Our prior work establishes songbirds as a model system for vocal prosthesis development (Arneodo et al., Current Biology, 2021; Brown et al., PLoS Computational Biology, 2021) by demonstrating, offline, that the acoustic waveform of their vocalizations can be synthesized from neural activity alone. This prior work employs TensorFlow-based machine learning models to map neural activity to the inputs of vocal synthesis algorithms. At present we employ LSTM- and Transformer-based models for mapping, and we employ both physics-based biomechanical models and machine-learning-based models for vocal synthesis. In this project, we will integrate the TensorFlow-based machine learning models and multiple vocal synthesis models with Simulink to enable real-time, low-latency vocal synthesis. The open-source neural interface system we employ (OpenEphys Onix) facilitates low-latency (<1 ms) measurement from over 300 neural channels as well as peripheral sensors (e.g., microphones, electromyography). We will establish a UDP-based instrumentation network to provide these neural and peripheral inputs to Simulink. This capability will enable rapid prototyping of multiple modeling strategies in online closed-loop neural control studies. Initially, we will employ the systems built in this project in the songbird model to test the translation of offline results to online control. We anticipate future translation to clinical studies, as we and our colleagues have developed multiple avenues for clinical translation. In previous studies, we used Simulink and Matlab to develop a high-performance neural-activity-controlled computer cursor system (motor BMI) in an animal model (Gilja et al., Nature Neuroscience, 2012) and translated that same system for clinical studies (Gilja et al., Nature Medicine, 2015).
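To make the UDP instrumentation network concrete, the sketch below shows one way a host process could stream per-channel neural samples to a Simulink UDP Receive block as little-endian float32 datagrams. This is a minimal illustration, not the project's actual wire format: the channel count, host, port, and packet layout are all placeholder assumptions that would depend on the Onix configuration and the Simulink model.

```python
import socket
import struct

# Hypothetical parameters -- the real system records 100s of channels, and the
# actual address and packet layout depend on the Simulink UDP Receive block setup.
N_CHANNELS = 4                         # illustrative channel count
SIMULINK_ADDR = ("127.0.0.1", 25000)   # placeholder host/port

def pack_frame(samples):
    """Serialize one time step of per-channel activity as little-endian
    float32 values, a common byte layout for Simulink's UDP Receive block."""
    return struct.pack("<%df" % len(samples), *samples)

def send_frame(sock, samples, addr=SIMULINK_ADDR):
    """Send one frame of neural data as a single UDP datagram."""
    sock.sendto(pack_frame(samples), addr)

if __name__ == "__main__":
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # One illustrative frame of smoothed firing rates for N_CHANNELS channels.
    send_frame(sock, [0.0, 1.5, -2.25, 3.0])
    sock.close()
```

Sending one datagram per time step keeps per-frame latency low, at the cost of UDP's lack of delivery guarantees, which is an acceptable trade-off for real-time closed-loop control where a stale frame is better dropped than replayed.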
The proposed project will develop Matlab- and Simulink-based tools to accelerate the development of neural-activity-driven speech prostheses. By restoring functional communication ability, these BMI systems have great potential to improve quality of life for individuals who have lost communication and motor ability due to injury or degenerative disease. The tools developed by this project will allow researchers working collaboratively at multiple levels of this problem to design, develop, and test machine-learning-based neural prostheses efficiently, thereby enabling the rapid, reproducible translation of emerging science and engineering in this field to real-time application and evaluation.
- Input Neural Response: This subsystem receives the recorded neural response and converts it into time-series data.
- Deep Learning Subsystem: This subsystem applies a trained deep neural network to predict MFCC (mel-frequency cepstral coefficient) frames from the input neural response.
- Buffer: This subsystem accumulates frames from the neural network's output into a matrix; the next subsystem (Griffin-Lim) applies the decoding algorithm to this matrix.
- Griffin-Lim Subsystem: This subsystem decodes the buffered matrix and generates an audio signal. A clipping function constrains the output to the range [-1, 1].