From Neuroscience to Data Science

Project: Root cause analysis of difference in support hours
ROI: Model suggests savings of 500-1,000 support hours per week on average (up to $5M annually)

I have recently made the transition from academic neuroscience to the Data-Science-as-a-Service team in EMC's IT organization. The change from academia to the business world is far from trivial. In the computational neuroscience lab I came from, most of the work involved developing probabilistic models of the activity of neural populations, so simulation and implementation were not a top priority. As a data scientist with a mostly theoretical background, coping with implementation, let alone implementation in a Big Data environment, is challenging.

Luckily for me, the leap between the scientific domains underlying the two disciplines is not as large as it may seem at first. When you think about predictive analytics, what is more natural than to think of our brain as a complicated learning machine whose main goal is data compression and interpretation?


Just like a business unit, our brain is constantly required to adapt in a rapidly changing, data-driven environment. In my research I studied the brain's visual system and built a model of the computations performed between two consecutive levels in the visual hierarchy, yet all of the sensory systems can be said to work the same way: a practically infinite, multidimensional input must yield a finite set of responses.

A nice example to illustrate how the brain may handle high-dimensional data is Oja's rule, a learning rule that describes how model neurons in artificial neural networks (ANNs) change connection strength, or "learn", over time. Oja's rule is a modification of the simplest learning rule, known as Hebb's rule: "neurons that fire together wire together". This means that neurons that respond similarly to the same stimulation will tend to become strongly connected. In the figure below, we see neurons in the visual cortex and the connections that link the first level of visual processing (V1) to the next, higher level (V2).


Figure 1 — A neural network that performs a linear combination of its input. V1 and V2 are consecutive levels in the cortical visual pathway. This drawing is taken from my neuroscience thesis but also appears in various machine learning contexts.
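To make the learning rule concrete, here is a minimal Python sketch (a toy example of my own with random data, not anything from the project) of the Hebbian update for a single linear neuron. Note how the connection strengths grow without bound; that instability is exactly what Oja's normalization, described below, fixes.

```python
import numpy as np

rng = np.random.default_rng(0)
eta = 0.01                    # positive learning rate
w = rng.normal(size=3)        # connection strengths from three "V1" inputs to one "V2" neuron

for _ in range(1000):
    x = rng.normal(size=3)    # firing of the input neurons
    y = w @ x                 # linear response of the output neuron
    w += eta * y * x          # Hebb: "neurons that fire together wire together"

print(np.linalg.norm(w))      # the norm keeps growing -- Hebb's rule alone is unstable
```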

These rules can be said to underlie unsupervised learning in the brain, the type of learning encountered most frequently in real-life business problems. One such unsupervised algorithm is Principal Component Analysis (PCA), a dimensionality reduction algorithm commonly used as a pre-processing step in many machine learning applications. When we look at the mathematical structure used to describe the neural connections in our brains, we can derive the PCA algorithm from Oja's rule (see the mathematical derivation at the bottom of the post).
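As a quick sanity check of that claim (the full derivation is at the bottom of the post), the sketch below runs Oja's rule on synthetic, zero-mean data and compares the learned weights with the leading eigenvector of the input correlation matrix. The data and the learning rate here are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic zero-mean input with one dominant direction of variance
X = rng.normal(size=(5000, 4)) @ np.diag([3.0, 1.0, 0.5, 0.2])
X -= X.mean(axis=0)

eta = 0.005
w = rng.normal(size=4)
w /= np.linalg.norm(w)

for x in X:
    y = w @ x
    w += eta * y * (x - y * w)     # Oja's rule: Hebb plus implicit weight normalization

C = X.T @ X / len(X)               # input correlation (covariance) matrix
eigvals, eigvecs = np.linalg.eigh(C)
pc1 = eigvecs[:, -1]               # eigenvector with the largest eigenvalue

print(abs(w @ pc1))                # close to 1: the weights align with the first PC
```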

Surprisingly, this process, occurring automatically in our brains billions of times each second, is an elementary step in many data analysis applications. As a concrete example, a recent team project included the analysis of a large dataset collected from in-field storage machines. We wanted to identify which features in the collected data were most useful for predicting the number of service hours and for root cause analysis.

As the data contain numerous features, it was impossible to know in advance which features are most strongly linked to specific problems and should therefore be incorporated in the analysis. In situations like these, applying PCA as a preprocessing step can yield important insights. The algorithm searches the data for the directions of greatest variance and extracts the combinations of features along which the separation into meaningful clusters is most prominent.

Once we had the sorted PCs (principal components), we could use the top two to visualize the machines as points in a two-dimensional feature space. This delivered the segregation we were after, as can be seen in the image below.


Figure 2 - Visualization of the first two PCs of the VMAX data set. A: the separation of the machines into clusters is apparent by eye. B: after further analysis, colors were added according to each machine's group.
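For anyone who wants to reproduce this kind of plot, a rough scikit-learn equivalent of the step is sketched below. The synthetic "machines" and group labels are stand-ins for the actual VMAX telemetry, which I cannot share.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Stand-in data: three groups of machines in a 20-dimensional feature space
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(loc=c, size=(100, 20)) for c in (0.0, 2.0, 4.0)])
groups = np.repeat([0, 1, 2], 100)

X_scaled = StandardScaler().fit_transform(X)   # PCA is sensitive to feature scale
pca = PCA(n_components=2)
scores = pca.fit_transform(X_scaled)           # project each machine onto the top two PCs
print(pca.explained_variance_ratio_)           # fraction of variance captured by each PC

plt.scatter(scores[:, 0], scores[:, 1], c=groups, cmap="tab10", s=10)
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.show()
```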

One of the tasks of data science is to mimic the behavior of a human expert; for example, it would be worthwhile to reproduce the way an analyst glances at a Tableau visualization of a person's network usage patterns and immediately identifies when the account has been hacked. In this post I have tried to show how the same algorithms that evolve naturally from a simple learning rule applied by our brain can be quickly and semi-automatically replicated on a big data set. This kind of generalization holds great value in a business environment where the rules are rarely spelled out in advance.

A special thanks goes to Dr. Oren Shriki for the tips and notes.

A Look At The Math

For those interested in understanding how Oja's rule leads to PCA, here is a neat derivation (WARNING: may contain equations, not for the faint of math):

Hebb’s learning rule states that each time two neurons fire in sync their connection strength grows:

$$\Delta w_i = \eta \, x_i^{(n)} \, y^{(n)}$$

Here $\Delta w_i$ is the change in connection strength, $\eta$ is the positive learning rate, and $x_i^{(n)}$ and $y^{(n)}$ represent the firing responses of the input and output neurons on the $n$'th iteration, respectively. Normalizing the weights after each update and expanding to a Taylor series (keeping only terms up to first order in $\eta$) leads to Oja's rule:

$$\Delta w_i = \eta \, y^{(n)} \left( x_i^{(n)} - y^{(n)} w_i \right)$$

By adding a few more constraints and solving for the average case we get the following:
$$\langle \Delta \mathbf{w} \rangle = \eta \left( C\,\mathbf{w} - \left( \mathbf{w}^{\top} C\,\mathbf{w} \right) \mathbf{w} \right), \qquad C = \langle \mathbf{x}\,\mathbf{x}^{\top} \rangle.$$

At a stable fixed point the average update vanishes, so $C\,\mathbf{w} = \left( \mathbf{w}^{\top} C\,\mathbf{w} \right) \mathbf{w}$, i.e. $\mathbf{w}$ is an eigenvector of $C$ with eigenvalue $\lambda = \mathbf{w}^{\top} C\,\mathbf{w}$.

This means that the weight vector $\mathbf{w}$ converges to the eigenvector with the largest eigenvalue, i.e. the first principal component, of the input correlation matrix $C = \langle \mathbf{x}\,\mathbf{x}^{\top} \rangle$.
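And a last numerical check of that statement, again on made-up data: the first component returned by scikit-learn's PCA matches, up to sign, the eigenvector of the input correlation matrix with the largest eigenvalue.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
X = rng.normal(size=(2000, 5)) @ rng.normal(size=(5, 5))   # correlated features
X -= X.mean(axis=0)                                        # zero mean, so C below is the covariance

C = X.T @ X / len(X)                                       # input correlation matrix
eigvals, eigvecs = np.linalg.eigh(C)
top = eigvecs[:, np.argmax(eigvals)]                       # eigenvector with the largest eigenvalue

pc1 = PCA(n_components=1).fit(X).components_[0]            # first principal component
print(np.allclose(abs(top @ pc1), 1.0, atol=1e-6))         # True: same direction up to sign
```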

About the Author: Shiri Gaber