By Brian T. Horowitz, Editor and Contributing Writer
To manage 50 billion market events — nearly a petabyte of stored data — the Financial Industry Regulatory Authority (FINRA) turned to the Cloudera Enterprise big data platform along with an Amazon Web Services infrastructure.
FINRA is an independent regulator overseen by the Securities and Exchange Commission. Apache Hadoop, an open-source framework written in Java that enables storage and processing of large data sets, powers Cloudera Enterprise.
The Cloudera platform will help FINRA adhere to the requirements for dynamic monitoring of financial markets and allow the organization to scale for future growth.
By deploying an elastic public cloud platform from AWS, FINRA aimed to avoid overprovisioning, or having excess capacity, while being better able to handle peak workloads.
“They have to process all the trades that happen on a daily basis,” Amy O’Connor, big data evangelist for Cloudera, told Power More. “In their own data center, they were not able to keep up with some of the legacy technologies they had. We helped them to move to the cloud over the past year.”
A big data platform in the cloud allowed FINRA to speed up query responses.
“With our implementation of big data and cloud technologies including Cloudera Enterprise, we’re reducing query response times from hours to seconds, and ensuring our platforms will scale to handle market demands,” Steve Randich, executive vice president and CIO of FINRA, said in a statement.
Here are seven ways cloud computing provides flexibility for companies such as FINRA when managing big data.
1. A way to develop and test data
A cloud environment gives teams a place to develop and test with big data, according to O’Connor.
“Groups want to experiment with slices of their data, instead of really having to provision in their own data centers,” O’Connor said. Experimentation in the cloud occurs before moving the data to a production environment.
Another benefit of managing big data in the cloud is what O’Connor calls “data gravity.” The data can be analyzed in the cloud, where the mobile or cloud apps are running.
“If you can analyze the data in the same place it gets created, then that’s a good reason to move the analysis closer to the data,” O’Connor said.
2. Tracking the lineage of data
The cloud also provides a place to map the path data takes from its source to its destination.
“When companies can map that, they can understand that path from creation to consumption,” O’Connor said. “Then they’ll have a better understanding as to do they need to move data and where they need to move it to.”
Tracking the lineage of data can also help data scientists spot errors in the data and help protect the security and privacy of the information. At NASA, data scientists use data tracing to find and enhance the value of data, FCW reported.
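To make the idea of tracing data from consumption back to creation concrete, here is a minimal sketch in Python. The dataset names and the `LINEAGE` map are hypothetical and chosen only for illustration; they do not describe any system used by FINRA, Cloudera or NASA.

```python
from collections import deque

# Hypothetical lineage map: each dataset lists the datasets it was derived from.
LINEAGE = {
    "daily_report": ["cleaned_trades"],
    "cleaned_trades": ["raw_trades", "reference_data"],
    "raw_trades": [],
    "reference_data": [],
}

def upstream_sources(dataset, lineage):
    """Return every dataset that feeds into `dataset`, nearest first."""
    seen = []
    queue = deque(lineage.get(dataset, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already recorded through another path
        seen.append(current)
        queue.extend(lineage.get(current, []))
    return seen

print(upstream_sources("daily_report", LINEAGE))
# ['cleaned_trades', 'raw_trades', 'reference_data']
```

With a map like this, an error found in a report can be followed back through each upstream dataset until the source of the problem is located.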
3. Maintaining data sovereignty
Managing data in the cloud can also enable companies to comply with the data governance laws of various countries, whether it’s the Health Insurance Portability and Accountability Act (HIPAA), the Freedom Act or EU Retention laws, O’Connor said.
Countries have their own laws on personally identifiable information, including laws that require keeping the data within the country’s borders, O’Connor noted.
“A lot of [companies] look to cloud providers to solve that problem for them so they can keep the data within the bounds of the country, subject to the laws of that country,” O’Connor said.
In Germany, for example, call detail records of telecommunications companies must stay within the bounds of the nation, O’Connor noted.
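A residency rule of this kind can be expressed as a simple placement check. The sketch below is illustrative only: the rule table, the region names and the `placement_allowed` function are assumptions made for this example, not the API of any cloud provider or a statement of what any law requires.

```python
# Hypothetical residency rules: data category -> country where it must stay.
RESIDENCY_RULES = {"call_detail_records": "DE"}

# Hypothetical mapping of cloud regions to the countries that host them.
REGION_COUNTRY = {"region-frankfurt": "DE", "region-virginia": "US"}

def placement_allowed(category, region):
    """Return True if storing `category` in `region` satisfies the rules above."""
    required_country = RESIDENCY_RULES.get(category)
    if required_country is None:
        return True  # no rule recorded for this category in the sketch
    return REGION_COUNTRY.get(region) == required_country

print(placement_allowed("call_detail_records", "region-frankfurt"))  # True
print(placement_allowed("call_detail_records", "region-virginia"))   # False
```

Running a check like this before data is written makes the residency policy explicit rather than leaving it to convention.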
4. Securing data and ensuring transparency
When it comes to managing big data in the cloud, the security and transparency of data are key considerations, according to O’Connor.
“Cloud providers have security experts as part of the whole data governance piece,” O’Connor said. “Sit down with your security experts and see how much control you have over securing your data in your cloud and what kind of transparency you can see into how it’s being managed.”
5. Managing Internet of Things data
The cloud can be a useful way to share data coming from an electrical grid, GPS units, supply chain operators or remote device sensors, noted Tony Baer, principal analyst at Ovum.
“As long as you don’t have any considerations regarding privacy, where you’re dealing with personal identification information, anything very proprietary or sensitive, then running this in the cloud is probably a no-brainer,” Baer said.
Among Internet of Things application developers, 55 percent connect devices primarily through the cloud, according to a report by research firm Evans Data Corp.
6. Offering cloud subscriptions
For a big data platform such as Cloudera, companies need to have processes in place to take down a cloud environment, according to O’Connor.
“Leaving cloud environments running for a long time when they don’t need to be running just increases your costs and is similar to paying for a music subscription service you’re not using,” she said.
You’ll have to consider the privacy of the data and whether you want a subscription or to buy local storage — a case of rent versus buy, Baer noted.
That’s “the classic economics of cloud,” he said.
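The rent-versus-buy trade-off comes down to break-even arithmetic. The following sketch assumes a simple model with one up-front purchase cost, a fixed monthly upkeep for owned storage and a fixed monthly cloud fee. The function name and all figures are hypothetical and are not taken from Baer or any vendor.

```python
def break_even_months(purchase_cost, monthly_upkeep, monthly_rent):
    """Months after which buying becomes cheaper than renting, or None if never."""
    monthly_saving = monthly_rent - monthly_upkeep
    if monthly_saving <= 0:
        return None  # renting never costs more per month, so buying never pays off
    return purchase_cost / monthly_saving

# Hypothetical figures for illustration only.
print(break_even_months(purchase_cost=120_000, monthly_upkeep=1_000, monthly_rent=6_000))
# 24.0
```

Under these assumed numbers, renting is cheaper for any need shorter than two years, which is also why leaving an unused cloud environment running shifts the calculation.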
7. A platform for genomics research
Genomics research is another key application for big data in the cloud, which provides the flexibility to handle petabytes of data.
As part of a 13-year partnership, Dell provides cloud computing technology to the Translational Genomics Research Institute (TGEN), a nonprofit organization that sequences human genomes as it looks for ways to treat and diagnose diseases, including neuroblastoma, a type of pediatric cancer.
“When you’re doing genomics-type research, quite often you’re running lots of batches over petabytes of data, so in the cloud, while you have the data sitting there all the time, what you’re able to do is expand the compute on demand,” O’Connor explained, referring to the benefits of elastic computing. “So that’s a perfect application of big data in the cloud.”