Bringing Self-Service Server Decommissioning to Dell IT

Take a behind-the-scenes look at how Dell IT used automation to improve its server decommissioning process.

Server decommissioning is an essential part of the equipment lifecycle for any organization, and Dell Technologies is no exception. Like many companies, we wanted to find a faster, smarter, more efficient way to retire our servers.

That was the goal as our IT teams took on the challenge of managing the lifecycles of more than 120,000 servers in use across Dell Technologies, to ensure they are efficiently shut down when they need to be.

When the Dell Digital Server Decommissioning Product Team started this effort, Dell Digital, Dell’s IT organization, had a mix of decommissioning processes. We saw an opportunity to standardize processes and apply automation to increase efficiency and decrease decommissioning time.

Decommissioning a server means disconnecting it from the network and scrubbing all the data it contains. This involves a host of internal teams, including storage, network, security and IT engineering, and an array of manual steps.

If server decommissioning cleanup isn’t fully completed, it can lead to unused servers cluttering data centers and unnecessarily tying up resources such as power and IP addresses. There can also be instances where servers might get turned back on and become “orphan” servers, disassociated from their applications, but still occupying space in our environment.
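To make the "orphan" idea concrete, here is a minimal sketch of how such servers might be flagged from an inventory export. The record layout and field names (`powered_on`, `application`) are assumptions for illustration, not the schema of any Dell system.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Server:
    # Hypothetical inventory record; field names are illustrative only.
    hostname: str
    powered_on: bool
    application: Optional[str]  # None when no application is associated


def find_orphans(servers: List[Server]) -> List[str]:
    """Return hostnames of servers that are running but tied to no application."""
    return [s.hostname for s in servers if s.powered_on and s.application is None]


inventory = [
    Server("srv-01", powered_on=True, application="billing"),
    Server("srv-02", powered_on=True, application=None),   # running, no owner app
    Server("srv-03", powered_on=False, application=None),  # off, so not an orphan
]
orphans = find_orphans(inventory)
```

A report like this only helps if the inventory data is trustworthy, which is one reason the analysis phase described later matters so much.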

After working with application owners, developers and multiple IT teams to analyze our server environment, the Server Decommissioning team created an automated, self-service process to improve this crucial task of properly decommissioning our servers. The team designed and implemented the Self-Service Decommissioning tool (SSD) that completes the server decommissioning process end-to-end in one-tenth the time of the previous manual processes.

In the first six months of the tool’s deployment, we’ve expedited our server decommissioning and saved millions of dollars in resource costs.

Here are some insights on our server decommissioning effort that might help your organization avoid server clutter in your IT environment.

Collaborating with Stakeholders to Hone the Process

The team began its decommissioning analysis by spending several quarters researching records and processes and collaborating with stakeholders to get a picture of our server environment and the existing process.

We found that the manual server decommissioning process, which spanned various teams, took an average of 64 days.

Under the old process, server owners had to submit a request for approval to decommission their devices within limited windows of time. Our Change Management Team would review requests weekly to ensure there were no moratoriums or production conflicts and then authorize owners to begin the server shutdown activities. Various factors could lead to requesters waiting weeks or even months before they could start the process.

Decommissioning was a very manual process that involved various teams tackling many tasks, such as reclaiming storage volume, removing backup data associated with the server, deleting associated virtual machines from our VM console, and more. Dell Digital Domain Name Service also recycled the servers’ IP addresses, which are valuable and always in limited supply.

Building a Self-Service Path

Once we had mapped out the existing process, the Server Decommissioning Team began working with IT, Change Management and the Dell Digital Zero Touch Engineering Teams to create an automated decommissioning process. Our ultimate goal was to make it easier and more efficient for teams to retire hardware from our environment.

We adopted a hybrid approach, in which decommissioning requests go through a newly created workflow that automatically checks for conflicts and authorizations. If it finds no conflicts, the workflow issues decommissioning approval. Our new automated decommission tool (ADT), developed by the Zero Touch Engineering Team, then handles the actual decommissioning process, including self-service server shutdown, server status validation and the deletion of VMs.
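The conflict and authorization check can be pictured with a short sketch. This is not the actual workflow; the request fields, the moratorium calendar and the three rules below are hypothetical stand-ins for the kinds of checks that Change Management previously performed by hand.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass
class Moratorium:
    # Hypothetical change-freeze window; both dates are inclusive.
    start: date
    end: date


@dataclass
class DecomRequest:
    # Hypothetical request fields, for illustration only.
    hostname: str
    requester: str
    owner: str
    shutdown_date: date
    serves_production: bool


def auto_approve(request: DecomRequest,
                 moratoriums: List[Moratorium]) -> Tuple[bool, List[str]]:
    """Return (approved, reasons); reasons is empty when the request is approved."""
    reasons = []
    # Authorization: only the recorded owner may retire the server.
    if request.requester != request.owner:
        reasons.append("requester is not the server owner")
    # Production conflict: the server must no longer serve production.
    if request.serves_production:
        reasons.append("server still serves production")
    # Moratorium conflict: the shutdown date must fall outside every freeze window.
    for m in moratoriums:
        if m.start <= request.shutdown_date <= m.end:
            reasons.append(f"shutdown date falls in moratorium {m.start} to {m.end}")
    return (not reasons, reasons)


freeze = [Moratorium(date(2024, 12, 20), date(2025, 1, 2))]
ok_request = DecomRequest("srv-01", "alice", "alice", date(2024, 11, 5), False)
blocked_request = DecomRequest("srv-02", "bob", "alice", date(2024, 12, 24), False)
approved, _ = auto_approve(ok_request, freeze)
rejected, why = auto_approve(blocked_request, freeze)
```

Returning the list of reasons alongside the decision lets a requester see exactly what to fix instead of waiting for a manual review.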

Under the new process, once a request goes through the auto-approval workflow, service shutdown is automatically triggered. Since there are no longer restricted windows of time for decommissioning, a server can now be shut down in as little as 24 hours.

The SSD tracks each request’s progress, keeping requesters informed on the status of their applications. The simplified, seamless and faster process has vastly improved SLAs for our team members and makes server owners accountable for the fate of their devices.
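Progress tracking of this kind can be sketched as an ordered checklist per request. The step names below are drawn from the tasks mentioned in this article, but their order, the `RequestTracker` class and the status wording are assumptions for illustration rather than a description of how the SSD is built.

```python
from enum import Enum
from typing import List, Optional


class Step(Enum):
    # Assumed ordering of the tasks named in the article.
    APPROVAL = "approval"
    SHUTDOWN = "server shutdown"
    VALIDATION = "server status validation"
    VM_DELETION = "VM deletion"
    STORAGE_RECLAIM = "storage reclaim"
    BACKUP_REMOVAL = "backup removal"
    IP_RELEASE = "IP address release"


class RequestTracker:
    """Tracks one decommissioning request through the steps, in order."""

    def __init__(self, hostname: str) -> None:
        self.hostname = hostname
        self.completed: List[Step] = []

    def next_step(self) -> Optional[Step]:
        steps = list(Step)
        done = len(self.completed)
        return steps[done] if done < len(steps) else None

    def complete(self, step: Step) -> None:
        # Steps must be completed in order; anything else is rejected.
        expected = self.next_step()
        if step is not expected:
            raise ValueError(f"expected {expected}, got {step}")
        self.completed.append(step)

    def status(self) -> str:
        done, total = len(self.completed), len(Step)
        upcoming = self.next_step()
        if upcoming is None:
            return f"{self.hostname}: decommissioned ({done}/{total} steps)"
        return f"{self.hostname}: {done}/{total} steps done, next: {upcoming.value}"


tracker = RequestTracker("srv-01")
tracker.complete(Step.APPROVAL)
message = tracker.status()
```

Enforcing the order in `complete` means a request cannot be reported as finished while an earlier cleanup task is still outstanding, which is the gap that leaves unused servers behind.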

By applying automation, we’ve decommissioned more servers, faster, and resolved the orphan server problem. The effort has reclaimed 53K TB of disk and 364 TB of memory, which has led to millions of dollars in savings.

If your organization is looking to improve its server decommissioning process, my strongest advice is to not be afraid of dealing with the complexity of your existing environment. Do the analysis to determine your ecosystem’s current status. Work with stakeholders and your IT teams to resolve this nagging problem that otherwise will continue to grow.

Creating a self-service, automated server decommissioning process can free up team members to focus on what matters, gain cost savings through resource reclamation and, perhaps most importantly, help decrease cybersecurity vulnerabilities for your company.

Keep up with our Dell Digital strategies and more at Dell Technologies: Our Digital Transformation.

Evelyn Teo

About the Author: Evelyn Teo

Evelyn Teo is the Senior Manager of Performance Engineering and Global Server Decommission product under TES Enablement Journey. She has been with Dell Malaysia for more than 16 years, working in multiple roles, including Application Support, IT Governance Council lead, Account Manager for the Infrastructure and Manufacturing segment and Strategist for Infrastructure Engineering. In her current role, her mission is to transform products into self-service, end-to-end solutions. She is also working to drive adoption and expansion of Infrastructure as Code (IaC) for faster and easier server provisioning for customers. This is critical for infrastructure migration when hardware reaches end of life.