This discussion takes place Oct 11th - 21st. Be sure to log in to enable posting replies.
Welcome to the Dell EMC Converged Platforms and Solutions Division Ask the Experts conversation.
This conversation thread will address one of IT’s most dreaded and riskiest jobs: upgrading and patching infrastructure with new firmware and hypervisor releases.
How much do you hate doing it? What’s your best practice to save time? What can you do to ensure compatibility to avoid outages?
Chat with ESG Lab experts, who recently benchmarked upgrading converged systems versus traditional, siloed data center infrastructure.
Learn how Dell EMC converged systems and software improve the infrastructure upgrade and patch experience on our VCE Vision Intelligent Operations Software page.
This debate kicks off with a short benchmark test video, backed by downloadable lab documentation of the benchmark tests.
Meet Your Experts:
Vice President, ESG Lab
Brian Garrett is the Vice President of ESG Lab, providing independent, hands-on validations and analyses of emerging technologies in the server, storage, data management, and information security industries. Throughout his career, Brian has created and managed world-class embedded development, performance measurement, and test organizations at companies including EMC, SEPATON, and Formation. Prior to joining ESG, Brian was the co-founder and CTO of I/O Integrity. Brian holds a degree in pure mathematics from Rutgers University, and he has been awarded nine patents.
Senior Lab Analyst
Mike currently works at ESG as a Senior Lab Analyst, where he conducts independent, hands-on technical and economic validations and analyses of emerging enterprise IT hardware and software products. Prior to joining ESG, Mike worked for more than five years as a senior performance engineer in the unified storage division of EMC Corporation, focusing on characterizing and analyzing new product performance.
Senior Director, Dell EMC CPSD
Scott is Senior Director of Platform Engineering in the Converged Platforms and Solutions division of Dell EMC. He is responsible for the design and development of the Vscale, Vblock, VxBlock and VxRack family of Converged Systems, as well as CPSD’s complementary offerings, which include Data Protection and management infrastructure. Scott and CPSD are working to deliver next-generation converged infrastructure to transform the economics, agility and profitability of enterprises and service providers as they transition to cloud-enabled business models.
Product Manager - Dell EMC CPSD
Sameer is a principal product manager at Dell EMC Converged Platforms and Solutions Division (formerly VCE), where he leads the product direction and strategic planning for Converged Infrastructure Management. He has over 15 years of expertise in the data center, networking and collaboration industries. He is a certified Agile Product Owner with deep experience leading cross-functional teams.
Consultant-Systems and Software Management, Dell EMC CPSD
David is responsible for system management software and network architecture product marketing for the Dell EMC Converged Platform and Solutions Division (formerly VCE). With more than 25 years' experience at top-tier and start-up IT infrastructure and software providers, David specializes in strategies and solutions that help IT operations teams transform into more efficient and business-impact focused organizations.
Director Platform Engineering - Dell EMC CPSD
Ted Balman is the Director of Platform Engineering, Core Technologies in the Converged Platforms and Solutions division of Dell EMC. He is responsible for many cross-functional domains for the Vscale, Vblock, VxBlock and VxRack family of Converged Systems, as well as CPSD’s complementary offerings, which include Data Protection and management infrastructure. These cross-functional domains include Product Security Engineering, the Release Certification Matrix (RCM) and Physical Infrastructure Engineering.
Share this event on Twitter or LinkedIn: "Ask the Expert: Reducing the Pain and Risk of IT Infrastructure Upgrades and Patches through Converged Systems" http://bit.ly/2cQSpy7
Welcome to this Dell EMC Community Network Ask the Expert conversation. Our discussion is now open for interaction.
When formulating your question please consider the following:
Sign in or register to engage – You must log in or, if you’re new, register with DECN to post a question, like, bookmark or follow this discussion. https://developer-content.emc.com/login/login.asp.
Be respectful – SMEs are volunteering to assist you and will provide you with information that they have access to.
Provide details and background – Our SMEs will try their best to provide a full answer for your satisfaction, and will appreciate as much information as you can provide about your inquiry.
Keep focus and within topic – We expect questions about the topic presented in the title and description of this event, and we would appreciate it if questions stay within those boundaries.
Be appreciative – Many tools are available for this: likes, user ratings, follows, bookmarks and the good old “thank you”.
We’re delighted to know you’ve decided to join our debate and hope you receive the answers you’re eager to find. Thanks!
I'd like ESG Lab to explain the results of their system upgrade and patch process testing in the context of IT transformation goals that many organizations share. How can the release certification matrix process help drive IT agility and greater overall IT process efficiency?
People newly exposed to Dell EMC (formerly VCE) converged platforms are intrigued by the release certification matrix (RCM). Perhaps Scott Redfern, Senior Director of Engineering, will shed some light on the commonly asked questions: What is the RCM? What kind of validation testing does Dell EMC do? How many tests and how many hours of testing?
Great question, @DavidWHayward
Organizations are always looking for ways to improve IT operational efficiency, whether it be related to deployment, management, protection, security, etc. One key area organizations focus on is ongoing maintenance – in other words, keeping your infrastructure up to date. This is important not just to ensure all resources are running optimally, but also to ensure everything is secure. All it takes is one out-of-date piece of hardware or an unpatched security vulnerability to bring an infrastructure to its knees. Traditionally, this process takes a very long time, especially in environments with disparate resources spread across multiple sites. Management complexity, limited understanding due to lack of documentation, personnel buy-in, and cost are just some of the factors that lead to extensive delays in updating components to the latest and greatest software. Imagine if you could audit your infrastructure in seconds to understand what needs updating, quickly download only the upgrades you want or need, and then apply those updates knowing they've already been vetted for your exact infrastructure. The time savings and increase in IT agility are obvious.
The goal of our testing was to understand the process that goes into maintaining a fully functioning IT infrastructure and to quantify the time it takes to satisfy all of the requirements across all phases of the upgrade process. Updating components in an infrastructure is not just about stumbling upon a new rev of software and updating one piece of hardware on a whim. It’s about understanding what needs updating, what impact the update will have on the rest of the infrastructure, whether downtime will be experienced, how long the update will take, and whether the update was successful. To that end, VCE’s release certification matrix takes all the guesswork out of the update process, saving hours of time when updating an infrastructure (in this case, a VCE Vblock). VCE does all the research, testing, and validation of new updates for you. They even go a step further and guide you on how to complete the updates, including update order and expected timing. It’s also worth mentioning that with VCE, upgrades can often be done without any downtime. The full report goes into much more detail about the steps in each phase of the update process and the expected time to complete each phase based on infrastructure type and component count. In short, the RCM plays an integral role not only in improving IT operational efficiency and agility, but also in enabling higher IT administrator productivity by dramatically reducing the time required to maintain and upgrade a complete IT infrastructure.
Customers often ask us how, if they purchase a converged system from Dell EMC, the company would help them deal with components' technical and security alerts and remediation patches differently than if they had purchased multi-vendor components separately and integrated them. Ted Balman, based on your experience with these issues in the field, could you explain with a couple of real-life examples?
That’s an excellent question, Dave! I’ll provide some color and ask Ted Balman to provide some added perspective as well. Ted is Director of Core Technologies within Platform Engineering, and two of the areas he is responsible for are Release Certification Matrices (RCM) and Security Threat Monitoring and Remediation (PSIRT) for our Blocks, Racks and Vscale products.
RCMs are an essential part of Dell EMC’s Blocks, Racks and Vscale value proposition. RCMs are released 10 times per year: 2 major releases and 8 minor releases. You might think that RCM is just a new name for an HCL or support matrix, but it’s far more. Let me explain… a support matrix is a list of components that are either known or believed to be compatible with one another. Navigating a support matrix can be complex, and even with a front-end tool it requires knowing every component of your configuration to enter into the tool. The resulting output is usually a collection of potential configurations, each with a series of release notes that must be read to see whether additional drivers or BIOS updates are required. The customer then pulls all of these together manually to end up with a valid update pack.
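To make the contrast concrete, here is a minimal Python sketch of what navigating a support matrix amounts to, assuming (hypothetically) that the matrix can be reduced to a set of pairwise-compatible component versions. The component names, versions, and the `valid_configurations` function are invented for illustration and do not reflect any real Dell EMC support matrix or tool.

```python
from itertools import combinations, product
from typing import Iterator


def valid_configurations(
    candidates: dict[str, list[str]],
    compatible: set[frozenset],
) -> Iterator[dict[str, str]]:
    """Yield every version combination whose pairs all appear in the support matrix.

    candidates maps each component to the versions being considered;
    compatible holds frozenset({(component_a, version_a), (component_b, version_b)})
    entries for each pair the (hypothetical) support matrix says works together.
    """
    components = sorted(candidates)
    # Brute force: try every combination of one version per component.
    for versions in product(*(candidates[c] for c in components)):
        picks = list(zip(components, versions))
        # Keep the combination only if every pair of picks is listed as compatible.
        if all(frozenset(pair) in compatible for pair in combinations(picks, 2)):
            yield dict(picks)


# Invented example data: three components with two candidate versions each.
EXAMPLE_CANDIDATES = {
    "hypervisor": ["6.0", "6.5"],
    "bios": ["2.0", "3.0"],
    "driver": ["1.1", "1.2"],
}
EXAMPLE_COMPATIBLE = {
    frozenset({("hypervisor", "6.0"), ("bios", "2.0")}),
    frozenset({("hypervisor", "6.0"), ("bios", "3.0")}),
    frozenset({("hypervisor", "6.5"), ("bios", "3.0")}),
    frozenset({("hypervisor", "6.0"), ("driver", "1.1")}),
    frozenset({("hypervisor", "6.5"), ("driver", "1.2")}),
    frozenset({("bios", "2.0"), ("driver", "1.1")}),
    frozenset({("bios", "3.0"), ("driver", "1.1")}),
    frozenset({("bios", "3.0"), ("driver", "1.2")}),
}

if __name__ == "__main__":
    for config in valid_configurations(EXAMPLE_CANDIDATES, EXAMPLE_COMPATIBLE):
        print(config)
```

With these invented pairs, three candidate configurations survive, and the brute-force search grows multiplicatively with every component and version added. The customer would still have to pick one and read the release notes for each component, which is exactly the work an RCM is meant to replace with a single certified answer.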
A Release Certification Matrix, on the other hand, has done all of that for you. An RCM release is not a matrix; it is a bundle of every piece of software, driver and firmware/BIOS that is required for your system. The version of each is carefully considered and matched for optimal interaction as a converged infrastructure product. The end result is a software bundle that can be downloaded and pre-staged on your Vblock, VxBlock, or VxRack by our Vision software, and it encompasses the necessary changes for all supported components. For example, RCM 126.96.36.199 identifies approximately 55 different components that can be included in a VxBlock 740 configuration, along with the certified software release required on each one to be compliant with that RCM. Certified means that we have tested a VxBlock 740 with all of those components installed, running the specified software, driver and firmware versions, and have run hundreds to thousands of regression tests against it for 1-2 months to ensure that it meets our quality, stability, and performance expectations for your workloads.
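Because an RCM names exactly one certified release per component, checking compliance reduces to comparing each installed version with that single certified version. The sketch below assumes a hypothetical inventory and RCM represented as plain Python dictionaries; the `audit` function, component names, and versions are invented for illustration and are not the Vision software's API or real RCM contents.

```python
# Hypothetical sketch: compare an inventory against the versions an RCM certifies.


def audit(certified: dict[str, str], inventory: dict[str, str]):
    """Split inventoried components into compliant, outdated, and uncertified lists.

    RCM entries absent from the inventory are ignored, since an RCM covers every
    component that *can* be part of a configuration, not only the installed ones.
    """
    compliant: list[str] = []
    outdated: list[tuple[str, str, str]] = []  # (component, installed, certified)
    uncertified: list[str] = []  # installed, but not covered by this RCM
    for name, installed in sorted(inventory.items()):
        target = certified.get(name)
        if target is None:
            uncertified.append(name)
        elif installed == target:  # an RCM specifies an exact release, so compare exactly
            compliant.append(name)
        else:
            outdated.append((name, installed, target))
    return compliant, outdated, uncertified


# Invented example data; not taken from any real RCM.
EXAMPLE_RCM = {"hypervisor": "6.0u2", "compute-bios": "3.1", "fabric-os": "7.3"}
EXAMPLE_INVENTORY = {
    "hypervisor": "6.0u2",
    "compute-bios": "2.9",
    "fabric-os": "7.3",
    "lab-tool": "1.0",
}

if __name__ == "__main__":
    compliant, outdated, uncertified = audit(EXAMPLE_RCM, EXAMPLE_INVENTORY)
    print("compliant:", compliant)
    print("needs update:", outdated)
    print("not covered by RCM:", uncertified)
```

The "needs update" list is the kind of answer an infrastructure audit against an RCM produces: which components to update, and to which certified release.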
The last major RCM release encompassed over 3,500 individual tests spanning a period of about 6 weeks (2 agile sprints) in our labs before it was deemed ready for release. Phew, that's a lot, so it's a good thing our tests are nearly 100% automated!
Thanks, Scott, for providing such a complete answer! I'll just add a couple of real-world examples of what drives us to make changes to the RCM.
1. Security vulnerabilities are the number one driver of RCM changes. Depending on the severity of the issue, we determine when the changes have to be made. If the flaw is severe enough, we'll push it into an addendum ASAP to protect our customer base. In the last couple of years, we have managed out multiple high-level security vulnerabilities. A few of note are Ghost, Heartbleed and glibc. So far in 2016, the PSIRT process has managed out 73 vulnerabilities found in our products.
2. Another driver is a major change to a specific component that customers are clamoring for. A good example is the latest version of VMware's hypervisor.
Hopefully this additional note gives a good understanding of the key drivers. There are others as well, but these two are the biggest contributors.
Great points, Ted! Sometimes people wonder why it takes us longer to add that new component or release that everyone is waiting for. It comes down to the value of the RCM as a bundle that's certified for all components in the Vblock, VxBlock or VxRack. If we are certifying a new piece of software, and it is only listed on a server supplier's HCL for 5 out of the 6 models we offer in our product, then we will wait until we can work with the supplier to get that 6th model onto their HCL, and only then do our RCM testing. If that model isn't one you care about as an end user, this may frustrate you, but it assures all of our customers that they don't have to do a stare-and-compare against one of those matrix eye charts to see whether their component made it into our testing.
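The gating rule described above can be expressed as a simple check. This is a minimal sketch that assumes, hypothetically, each supplier HCL is available as a mapping from server model to the set of releases listed for it; the function name, model names, and release label are invented for illustration.

```python
def ready_for_rcm_testing(
    release: str,
    offered_models: list[str],
    supplier_hcl: dict[str, set[str]],
) -> tuple[bool, list[str]]:
    """Return (ready, missing_models) for a candidate release.

    The release is ready for RCM testing only when every model we offer lists
    it on the supplier's HCL; otherwise the models still missing are returned.
    """
    missing = sorted(
        model for model in offered_models
        if release not in supplier_hcl.get(model, set())
    )
    return (not missing, missing)


if __name__ == "__main__":
    # Hypothetical example: six offered models, new release listed for only five.
    models = [f"model-{i}" for i in range(1, 7)]
    hcl = {model: {"hypervisor-new"} for model in models[:5]}
    print(ready_for_rcm_testing("hypervisor-new", models, hcl))
```

Treating a single missing model as blocking is the design choice the post describes: the bundle is certified for every model offered, so no customer has to check whether their particular model was covered.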
Fantastic information, Mike... I loved reading your white paper because it hit on many salient points about why we spend so much time on our RCMs. We are always looking for ways to make the certification process more valuable to customers and to make the software update implementation process less complex for IT administrators, through implementation services, instruction documents, Vision integration, and future automation. So we hope we'll get some questions and feedback in that area during this "Ask the Expert" discussion to help us zero in on how we can take the next quantum step in making RCMs even better!
We've conducted RCM feedback sessions at EMC World, our Customer Advisory Board, the VUG at VMworld and with individual customers & partners this year. Don't tell anyone, but we are collecting requirements now for a next-generation RCM approach... hint, hint.