Do You Hear What I Hear?

In our age of brain-tickling nose swabs, there is renewed interest in fast, accurate, and non-invasive tests for diagnosing illnesses. We’ve all probably experienced a Q-tip going just a little too far at some point in the last couple of years. Well, researchers from the University of South Florida, Cornell, and 10 other institutions are working with the National Institutes of Health to develop AI that could make medical diagnoses based on a person’s speech.

As part of the NIH’s Bridge to AI program, these universities are working to combine their voice databases and build AI models at scale that can detect illnesses from an audio sample alone. Their hope is that the models could be trained to detect not only vocal issues, but also neurological problems, respiratory disorders, and even some pediatric disorders.

This technology creates an incredible opportunity to bring highly specialized diagnostics to underserved areas. People might no longer have to travel hundreds of miles to the nearest large hospital or specialist. Local primary care physicians could use the technology to get an initial diagnosis without requiring more expensive labor and resources.

The team’s long-term goal is to make the technology available as an app. AI models like this can be thought of as working in two phases: training and inference. Training is often expensive, requiring significant computing resources and developer time. This is where the model is given data with known answers to learn from, and then tested to see how it performs against other data sets with known answers. Once you have a trained model, it can be saved and reused for inference. Inference is when a user gives the system an input and asks it to “infer” an answer based on its previous training. Compared with training, inference is cheap, often requiring only milliseconds even on edge hardware such as a phone. This is how your phone can turn your spoken voice into text. It uses models that were trained on many thousands of hours of recorded speech from a myriad of human voices and languages. Then, when you speak, it can figure out what word you said in a few milliseconds.
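To make the two phases concrete, here is a minimal sketch in Python. It is not the researchers' method: it uses a toy nearest-centroid classifier, and the two-number "voice features" and the labels "typical" and "atypical" are made up purely for illustration. The point is only the workflow: train on labeled data, test on held-out labeled data, save the model, then load it and infer.

```python
import json
import random


def train(samples):
    """Training phase: average the feature vectors seen for each known label."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    # The "model" is just one average point (centroid) per label.
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def infer(model, features):
    """Inference phase: return the label whose centroid is closest to the input."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))


def make_samples(rng, n):
    """Hypothetical stand-in for labeled voice features (entirely synthetic)."""
    samples = []
    for _ in range(n):
        label = rng.choice(["typical", "atypical"])
        center = (1.0, 1.0) if label == "typical" else (4.0, 4.0)
        samples.append(([rng.gauss(c, 0.5) for c in center], label))
    return samples


rng = random.Random(0)             # fixed seed so runs are repeatable
train_set = make_samples(rng, 200)  # data with known answers to learn from
test_set = make_samples(rng, 50)    # separate data with known answers to test on

model = train(train_set)
accuracy = sum(infer(model, f) == y for f, y in test_set) / len(test_set)

saved = json.dumps(model)           # the trained model can be saved...
loaded = json.loads(saved)          # ...and loaded later, e.g. inside an app
prediction = infer(loaded, [3.8, 4.1])
print(f"held-out accuracy: {accuracy:.2f}, prediction: {prediction}")
```

Even in this toy version, all the work of looping over the data happens in train, while infer only compares one input against a couple of stored points. A real diagnostic model would be far larger, but the same split is why the expensive part can happen once in a data center and the cheap part can run on a phone.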

The real question with this technology, as with many AI innovations, is not whether we CAN, but whether we SHOULD. The issue is privacy. Do you trust Apple or Amazon to have access to your medical data in exchange for your iPhone or Alexa diagnosing your illnesses? That answer probably depends on a lot of factors, such as how accurate the diagnosis is, how relevant it is to you and your family, and how accessible the next best alternative is. The researchers are also encountering HIPAA concerns around ownership of the voice data that people donate, especially with each university having its own set of compliance rules. Privacy is always a tricky question.

Looking to the future, I think the most interesting question here is what else our bodies are telling us that we just don’t know how to hear yet. The human body is probably the most complex machine on the planet. Despite incredible medical advances over thousands of years, there is still so much we don’t understand about it. Today, maybe we’ve learned we can detect more things using the voice. It was only about 100 years ago that we realized we could test blood for diseases. The next frontier is the brain. We understand a great deal about it, but that is still a small fraction of everything there is to know. I’d like to imagine that 100 years from now, we’ll be able to diagnose a patient with an entirely non-invasive brain scan. That would certainly be better than a Q-tip.


