When it comes to sorting through the massive volumes of data available today, it’s possible to have too much of a good thing.
That’s where Dr. Mateen Shaikh comes in.
An expert in computational statistics and machine learning, the assistant professor is developing new models that help sift through data, keeping the information researchers need and removing what is unhelpful.
His goal? To predict the future, of course.
“If I wanted to know how much peanut butter I want my store to buy next month, I can look at everything from past sales of peanut butter, to bread, or what the market is like generally, and I can even include other variables that have a weird relationship — the weather, and what shows were on Netflix at the time,” he explained.
In cutting-edge science, there is often no established context for which variables are related and which aren’t, so his research aims to eliminate variables with spurious relationships and combine the rest in order to predict the most likely future.
He sees the most promising application for his work in the health sciences.
“We don’t really know what genes are doing what in the human body. And we want to know what the effects of certain drugs are. When we’re trying to find out the effect certain drugs have on certain diseases, we need to know whether they’re coincidental, or not related, or if the changes are related to other changes in the genes.”
Supported by a Natural Sciences and Engineering Research Council (NSERC) Discovery Grant, Shaikh is also developing a computationally attractive evolutionary algorithm. Such an algorithm could be used, for example, to find correlations among adverse drug reactions across a diverse population over many decades.
He intends to create open-source software for others to use, build on, and edit as they like. With it, researchers will be able to find the optimal level of precision when analyzing past, current, and future data from government databases, independent studies, and industry.