Dibyendu is a Principal Software Engineer with the RSA team, the Security Division of EMC. He has worked with many different technologies during his nearly 12 years in the security industry, including J2EE, Service Oriented Architecture and Distributed Computing. Dibyendu is passionate about technology, and over the last few years he has taught himself the emerging technologies surrounding Big Data, such as Hadoop and Lucene.
Shortly after joining EMC, Dibyendu read the announcement for the 2012 EMC Proven Professional Knowledge Sharing competition. At the time he was researching Big Data indexing using Hadoop and Lucene, and he saw the competition as a great opportunity to share his knowledge. Writing a quality paper and having it selected among the best articles submitted in this year’s competition proved quite a challenge. As Dibyendu says, “it has been a great journey since reading the first announcement.” To meet the competition guidelines, Dibyendu needed to earn his first EMC Proven Professional certification; he has since earned not only the Cloud Infrastructure and Services Associate certification but also the Data Science Associate.
Dibyendu sees Big Data technologies gaining a lot of traction in the industry, and he considers the EMC Proven Professional (EMCPP) Knowledge Sharing program a great opportunity to gain knowledge and stay ahead of the technological curve. He is currently pursuing a Master’s in Software Systems and has set his career goal on becoming a Data Scientist.
Dibyendu Bhattacharya’s Knowledge Sharing Article
There are various approaches to solving Big Data problems. Two of the most prominent are Hadoop and Solr, popular open source projects widely used in large-scale distributed systems.
Apache Hadoop, a software framework that supports data-intensive distributed applications, enables applications to work with thousands of nodes and petabytes of data. Technically, Hadoop consists of two key services: reliable data storage using the Hadoop Distributed File System (HDFS) and high-performance parallel data processing using a technique called MapReduce.
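To make the MapReduce idea a little more concrete, here is a minimal single-process sketch of its phases (map, shuffle/group by key, reduce) using word count, the usual introductory example. It is plain Python and does not use Hadoop’s actual API; the function names are hypothetical, and a real Hadoop job would run the map and reduce steps in parallel across many nodes, reading its input from HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # In Hadoop, each map call could run on a different node, close to its HDFS block;
    # here everything runs sequentially in one process.
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    docs = ["Hadoop stores data in HDFS", "Hadoop processes data with MapReduce"]
    print(word_count(docs))
```

The key design point is that map and reduce are pure functions over key/value pairs, which is what lets the framework split the input, run many mappers at once, and route all values for one key to the same reducer.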
On the other hand, Solr is a popular open source enterprise search platform built on the Lucene Java search library. Solr runs as a standalone server and uses Lucene for full-text indexing and search. It has REST-like HTTP/XML and JSON APIs that make it easy to use from virtually any programming language.
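As an illustration of that HTTP interface, the sketch below builds a query against Solr’s standard `/select` request handler and reads the matching documents from the JSON response. It assumes a Solr 4-style core URL on the default port 8983; the core name `mycore` is hypothetical and would need to match your own setup, and `search` only works against a running Solr server.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical local Solr core; adjust host, port and core name to your installation.
SOLR_SELECT = "http://localhost:8983/solr/mycore/select"


def build_query_url(query: str, rows: int = 10) -> str:
    """Build a GET URL for the /select handler, asking for a JSON response."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return SOLR_SELECT + "?" + urlencode(params)


def search(query: str, rows: int = 10) -> list:
    """Send the query over HTTP and return the matching documents."""
    with urlopen(build_query_url(query, rows)) as resp:
        body = json.load(resp)
    # Solr's JSON response lists the hits under response.docs
    return body["response"]["docs"]


if __name__ == "__main__":
    print(build_query_url("title:hadoop", rows=5))
```

Because the interface is plain HTTP with URL parameters, the same request could be issued from a browser, `curl`, or any language with an HTTP client, which is what makes Solr easy to integrate.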
In this Knowledge Sharing article, awarded Best of Big Data in the 2012 Knowledge Sharing Competition, author Dibyendu Bhattacharya provides:
• knowledge about how the Hadoop framework works
• the concept of MapReduce
• how to perform large-scale distributed indexing using Lucene
• how to query and search the indexed artifacts using Solr
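The general idea behind large-scale distributed indexing can be sketched in a few lines: partition documents across shards, build an inverted index (term to document IDs) per shard, and fan each query out to every shard before merging the hits. This is a toy, in-memory illustration only; plain Python dictionaries stand in for Lucene indexes, the shard count and function names are hypothetical, and it is not necessarily the approach described in the article.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical shard count; in practice each shard would be a separate Lucene index.
NUM_SHARDS = 2


def shard_for(doc_id: int) -> int:
    """Route each document to a shard, much like a partitioner choosing a reducer."""
    return doc_id % NUM_SHARDS


def build_shards(docs: Dict[int, str]) -> List[Dict[str, Set[int]]]:
    """Build one inverted index (term -> set of doc IDs) per shard."""
    shards: List[Dict[str, Set[int]]] = [defaultdict(set) for _ in range(NUM_SHARDS)]
    for doc_id, text in docs.items():
        index = shards[shard_for(doc_id)]
        for term in text.lower().split():
            index[term].add(doc_id)
    return shards


def search(shards: List[Dict[str, Set[int]]], term: str) -> List[int]:
    """Fan the query out to every shard and merge the hits into one sorted list."""
    hits: Set[int] = set()
    for index in shards:
        hits |= index.get(term.lower(), set())
    return sorted(hits)


if __name__ == "__main__":
    corpus = {1: "Hadoop and Lucene", 2: "Solr uses Lucene", 3: "Hadoop MapReduce"}
    shards = build_shards(corpus)
    print(search(shards, "lucene"))
```

Sharding keeps each index small enough to build in parallel (for example, one shard per reducer), at the cost of querying every shard and merging results at search time.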
If you are an EMC Proven Professional, you can access the article here.
To access this content, you will be required to log in to the Proven Professionals ONLY area of the Community.
If you have a PowerLink account, you can access the article here.