Distributed Analytics Meets Distributed Data with a World Wide Herd

Originally posted on CIO.com by Patricia Florissi, Ph.D.

What is a World Wide Herd (WWH)?

What does it mean to have “distributed analytics meet distributed data”? In short, it means bringing the analytics to where the data already lives, rather than moving the data to the analytics. The World Wide Herd (WWH) concept does this by creating a global network of distributed Apache™ Hadoop® instances that act as a single virtual computing cluster. In a recent CIO.com blog, Patricia Florissi, Ph.D., vice president and global CTO for sales and a distinguished engineer for Dell, details how this approach enables analysis of geographically dispersed data without requiring the data to be moved to a single location first.

To illustrate the power of distributed, yet collaborative, in-place analytics at worldwide scale, it helps to begin with an example. I will start with one from the healthcare industry and then dive into a discussion of the World Wide Herd, a global virtual computing cluster.

Hospitals around the world are moving to value-based healthcare and seeking dramatic reductions in costs. One way to achieve these goals is to make more effective and efficient use of expensive medical diagnostic equipment, such as CT scanners and MRI machines. When a hospital maximizes its utilization of these devices, it increases its ROI and can reduce its costs by avoiding the need to buy additional devices. In principle, this contributes to more affordable care.

With a focus on value-based healthcare, Siemens Healthineers, the healthcare business of Siemens AG, is developing a global benchmarking analytics program that will allow its customers to see and compare their device utilization metrics against those of hospitals around the world. The goal is to help hospitals identify opportunities to gain greater value from their investments.

This global benchmarking analytics program will be offered via the Siemens Healthineers Digital Ecosystem, a digital platform for healthcare providers, as well as for providers of solutions and services, aimed at covering the entire spectrum of healthcare. The platform, announced in February 2017, will foster the growth of a digital ecosystem linking healthcare providers and solution providers with one another, as well as bringing together their data, applications and services.

Global benchmarking analytics in the Siemens Healthineers Digital Ecosystem will be powered by the innovative Dell World Wide Herd technologies, enabling the Internet of Medical Things (IoMT) for several healthcare modalities. Dell’s collaboration with Siemens delivers the ability to analyze data at the edge, where only the analytics logic itself and aggregated intermediate results traverse geographic boundaries to facilitate data analysis across multi-cloud environments—without violating privacy and other governance, risk and compliance constraints.

How it works

The WWH concept, which was pioneered by Dell, creates a global network of Apache™ Hadoop® instances that function as a single virtual computing cluster. The WWH orchestrates the execution of distributed and parallel computations on a global scale, across clouds, pushing analytics to where the data resides. This approach enables analysis of geographically dispersed data, without requiring the data to be moved to a single location before analysis. Only the privacy-preserving results of the analysis are shared.

Let’s take a closer look at how the WWH enables distributed, yet collaborative, analytics at a global scale.

First, WWH distributes computation across a virtual computing cluster and pushes analytics to its virtual computing nodes. In the case of Siemens, each virtual computing node is implemented by a cloud instance that collects and stores data from Siemens’ medical devices in local hospitals and medical centers.

Second, computation takes place in real time, where the data resides.

Third, only the privacy-preserving results are sent back to the initiating location, where they are aggregated and a global analysis is performed on them.
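The three steps above follow a scatter/gather pattern, which can be sketched in a few lines. This is a minimal, single-process illustration, not Dell's WWH implementation: `VirtualNode`, `local_summary`, and `global_mean` are hypothetical names, threads stand in for remote cloud instances, and the analytic is a simple global mean assembled from per-node counts and sums.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass
class VirtualNode:
    # Hypothetical stand-in for one virtual computing node; its raw records never leave it.
    name: str
    records: list[float]  # illustrative values only, e.g. daily scanner hours

    def run(self, analytic: Callable[[list[float]], dict]) -> dict:
        # Step 2: the pushed analytic executes next to the data.
        return analytic(self.records)


def local_summary(records: list[float]) -> dict:
    # Returns only aggregates (count and sum), never individual records.
    return {"count": len(records), "total": sum(records)}


def global_mean(nodes: list[VirtualNode]) -> float:
    # Step 1: push the same analytic to every node in parallel.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda n: n.run(local_summary), nodes))
    # Step 3: combine the partial results at the initiating location.
    count = sum(p["count"] for p in partials)
    total = sum(p["total"] for p in partials)
    return total / count if count else 0.0
```

With `VirtualNode("site-a", [8.0, 10.0])` and `VirtualNode("site-b", [6.0])`, `global_mean` returns `8.0`, yet only two small summaries cross node boundaries. Whether aggregates like these are sufficiently privacy-preserving depends on the governance rules in force (for example, minimum cohort sizes), which this sketch does not model.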

In the case of Siemens, each virtual computing node calculates a local histogram and sends it back to the initiating node, which merges all the histograms to provide global benchmarking. A hospital administrator looking at the global histogram can immediately see how their hospital performs compared with the other participating hospitals around the world.
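The histogram step can be illustrated with Python's `collections.Counter`. This is a sketch under stated assumptions: the bin width of 10, the function names, and the idea that each node bins per-device utilization percentages are all hypothetical, since the source does not describe Siemens Healthineers' actual metrics or binning. Because every node uses the same bin edges, merging reduces to a bin-wise sum, and only bin counts, not raw values, leave each node.

```python
from collections import Counter

BIN_WIDTH = 10  # hypothetical: 10-point bins of percent utilization


def local_histogram(utilization: list[float], bin_width: int = BIN_WIDTH) -> Counter:
    # Runs at each node: map every value to the lower edge of its bin, then count.
    return Counter(int(v // bin_width) * bin_width for v in utilization)


def merge_histograms(histograms: list[Counter]) -> Counter:
    # Runs at the initiating node: bin-wise sum of the local histograms.
    merged = Counter()
    for h in histograms:
        merged.update(h)  # Counter.update adds counts rather than replacing them
    return merged


def share_at_or_below(merged: Counter, value: float, bin_width: int = BIN_WIDTH) -> float:
    # Fraction of all observations in the bin containing `value` or in lower bins.
    edge = int(value // bin_width) * bin_width
    total = sum(merged.values())
    below = sum(c for b, c in merged.items() if b <= edge)
    return below / total if total else 0.0
```

For example, merging `local_histogram([12.0, 47.5, 81.0])` with `local_histogram([15.0, 44.0, 99.0])` yields six observations, four of which fall in the 40 bin or lower, so a hospital at 45% utilization would see `share_at_or_below(merged, 45.0)` of about 0.67.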

A WWH can take multiple configurations. The virtual computing nodes can be clouds in a multi-cloud environment, or Internet of Things (IoT) gateways in a multi-gateway environment, in which case the analytics are pushed directly to the gateways themselves.
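One way to picture these configurations is a common node interface that both cloud instances and IoT gateways satisfy, so the orchestrator pushes the same analytic regardless of node type. The `ComputeNode` protocol, `CloudNode`, `GatewayNode`, and `push_to_all` are hypothetical names chosen for illustration; they are not part of any WWH or Hadoop API.

```python
from typing import Callable, Protocol

# An analytic takes the node's local readings and returns one summary value.
Analytic = Callable[[list[float]], float]


class ComputeNode(Protocol):
    # Anything that can execute a pushed analytic over its own local data.
    def execute(self, analytic: Analytic) -> float: ...


class CloudNode:
    # Hypothetical cloud instance holding data collected from devices in a region.
    def __init__(self, stored: list[float]) -> None:
        self._stored = stored

    def execute(self, analytic: Analytic) -> float:
        return analytic(self._stored)


class GatewayNode:
    # Hypothetical IoT gateway that analyzes readings buffered from attached sensors.
    def __init__(self) -> None:
        self._buffer: list[float] = []

    def ingest(self, reading: float) -> None:
        self._buffer.append(reading)

    def execute(self, analytic: Analytic) -> float:
        return analytic(self._buffer)


def push_to_all(nodes: list[ComputeNode], analytic: Analytic) -> list[float]:
    # The orchestrator ships only the analytic and collects one result per node.
    return [node.execute(analytic) for node in nodes]
```

A consequence of this structure is that supporting a new kind of node only requires an `execute` method; the orchestration logic stays unchanged.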

By pairing distributed processing and analytics with distributed data, the WWH helps organizations overcome several pressing IT challenges:

• An explosion in the number of connected devices and the volume of IoT data, which defies the scalability of centralized approaches that store and analyze data in a single location
• Bandwidth and cost constraints that make it impractical to move data to central repositories
• Security concerns for data in transit
• Regulatory compliance issues that limit the movement of data beyond certain geographic boundaries

The bigger picture

When you study these and other challenges, you see that we are in the middle of a perfect storm that is disrupting the status quo. Increasingly, we need to take the processing power and analytics to the data, rather than vice versa. This is very much the future for many industries as we look to a world that is projected to have 200 billion connected devices by 2031. Data will increasingly be inherently distributed and federated, with limited data movement.

While the example I have used here focuses on a specific use case in the healthcare industry, the WWH concept can be applied across a wide spectrum of industries. In a December blog post, I explored the potential to use a WWH to advance disease discovery and treatment by enabling global-scale collaborative genomic analysis research. And, of course, WWH approaches can and will be used to help companies gain value from data spread across the IoMT and IoT in general.

At the end of the day, rich insights can be obtained when the data analyzed transcends geographical, political, and organizational boundaries yet can still be analyzed as one cohesive virtual dataset. That’s the World Wide Herd in action.

About the Author: Jean Marie Martini