I was at a DevOps Meetup recently where the topic of firefighting versus project work was discussed. Not surprisingly, everyone at my table was struggling with this. At worst, firefighting was the job. At best, firefighting regularly interrupted your “real” job. I wish I could say this conversation was unique to this DevOps Meetup, but unfortunately it isn’t. Customer after customer shares the same story: they are under constant pressure to reduce costs and improve performance, yet they are swamped with urgent, often unplanned work that leaves little to no time for the strategic, important projects that will help achieve flat line growth objectives.
So How Do You Contain The Fire?
Below is a short list of strategies I have used in my career to help contain the fire. They are in no particular order. Hopefully, you can adopt one or more as a New Year’s resolution to improve your work situation.
Triage Work
I once had a boss who went into a defect system and deleted all the SEV4 bugs. At first, I thought he was crazy; then I realized he was brilliant. The team was NEVER going to get to the SEV4 bugs. Many were over 200 days old. We wasted time, energy, and focus acknowledging them.
Triaging is an important skill and strategy, both for managing large lists of work and for setting stakeholder expectations. I recommend creating three categories: work that will get done, work that might get done (best case), and work that won’t get done.
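To make the three categories concrete, here is a minimal sketch of a triage rule in Python. Everything in it is an assumption for illustration: the WorkItem fields, the 1-to-4 severity scale (4 being least severe), and the cutoffs are hypothetical and not taken from any real defect system. Your own rules should come from your stakeholders.

```python
from dataclasses import dataclass


@dataclass
class WorkItem:
    title: str
    severity: int  # assumed scale: 1 = most severe, 4 = least severe
    age_days: int  # days since the item was opened


def triage(item: WorkItem, stale_after_days: int = 200) -> str:
    """Place an item into one of three buckets (illustrative rules only)."""
    # Lowest-severity items are never going to be reached, so say so up front.
    if item.severity >= 4:
        return "won't do"
    # Mid-severity items are best case, unless they have already gone stale.
    if item.severity == 3:
        return "won't do" if item.age_days > stale_after_days else "might do"
    # Everything more severe is committed work.
    return "will do"


backlog = [
    WorkItem("Checkout page outage", severity=1, age_days=0),
    WorkItem("Slow nightly report", severity=3, age_days=30),
    WorkItem("Typo in admin tooltip", severity=4, age_days=240),
]
buckets = {item.title: triage(item) for item in backlog}
```

The value is less in the code than in publishing the rule: once the cutoffs are written down, stakeholders know which bucket a request will land in before they file it.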
Grouping
Grouping, or bundling, is a strategy of grouping similar items together and fixing them all at the same time. Basically, you gain efficiency by minimizing task switching and enabling team members to focus on a single problem area. This practice requires a published and searchable list of work that is shared.
Limiting Work-In-Progress (WIP)
This is one of the most effective and underutilized strategies for containing fires and suppressing new ones. In its simplest form, it means don’t start something new until you have completed what you are working on. There are very few requests or issues that can’t wait until you either hit a logical stopping point or complete the task. Minimizing task switching is a proven method of improving both productivity and quality. When I start feeling stressed about not making enough progress or overwhelmed by the amount of work to do, I employ this method. It is amazing how a little focus can transform your work output.
This strategy scales very easily. As more people are added, you can adjust your WIP limits. Never have WIP limits bigger than the size of the team. I recommend 70% of team size (round up). The challenge with this approach is discipline. You have to respect and follow your WIP limits even when requests are piling up.
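As a quick illustration of the 70% rule, here is a small Python sketch. The function names wip_limit and can_start_new_work are made up for this example; the only inputs from the text above are the 70% figure, the round-up, and the cap at team size.

```python
def wip_limit(team_size: int) -> int:
    """Return 70% of team size, rounded up, never above team size."""
    if team_size < 1:
        raise ValueError("team_size must be at least 1")
    # Integer form of ceil(0.7 * team_size); avoids floating-point rounding.
    limit = (7 * team_size + 9) // 10
    return min(limit, team_size)


def can_start_new_work(in_progress: int, team_size: int) -> bool:
    """True only while the team is below its WIP limit."""
    return in_progress < wip_limit(team_size)
```

For a team of ten this gives a limit of seven, and for a team of five a limit of four. With seven items already in progress on a ten-person team, can_start_new_work(7, 10) returns False: finish something before pulling the next request.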
Fix Plus
Fix plus is a firefighting prevention strategy designed to reduce technical debt that is causing fires to reoccur or start. This practice involves analyzing and refactoring upstream and downstream systems and code when resolving an issue. Basically, it means whenever you fix something, analyze the related code and configurations just before and just after the area you are fixing and improve those as part of your fix. This is a proven practice for refactoring fragile legacy systems and for building automated test suites without creating a large project to do it. Besides, it is very rare for a company to fund a refactoring project because it is not perceived to add value.
Staff Augmentation
Bring in partners to temporarily offload firefighting activities so that you can focus on important, strategic projects like building a continuous delivery pipeline and tool chain or developing a self-service portal. Focus on efforts that will immediately improve the quality, resiliency, and performance of your IT organization, enterprise systems, and application portfolio. The goal is to offset the cost of augmenting the staff with the outcomes and expected return on investment from the strategic project. If you are starting a new project, add a line item for this extra support to free up SMEs to work on the strategic project. For example, when I was leading a large application rewrite for a financial services firm, the client recognized in advance that the demands on their actuarial team to meet business objectives and to support the development team exceeded the capacity of the actuarial team. To prevent multiple failures and delays, the client added temporary headcount to offset the hours our Hedging SME spent supporting the development project. This isn’t a no-cost option, but it can be a lifesaver for teams that are really struggling with firefighting activities and heavy workloads.
Visibility
If it isn’t in a report, on a dashboard, or managed by a shared tracking system, it didn’t happen. One of the biggest issues with firefighting is that the work is largely invisible to the organization. What the organization sees is high costs coupled with suboptimal performance instead of all the work you are doing to “keep the lights on”. By improving visibility, IT shops can illustrate how and where costs occur, use that data to justify future investments and, just as importantly, use it to help triage requests in alignment with corporate goals. Without shared visibility, it is impossible to make good decisions.
No Heroes
If you have read The Phoenix Project, this refers to the resident rock star, Brent. Brent spent nearly every day solving everyone else’s problems or making all the technical decisions because he was that good. As a result, he never got any of his own work completed on time and was a major bottleneck for the whole department. Stop recognizing and rewarding Brent for hoarding knowledge. Instead, reward Brent for sharing knowledge and elevating others.
Define Boundaries
Defining boundaries isn’t about building a wall to protect yourself; rather, it is about defining the rules around how you operate and interact with others. This is best implemented at the team or department level and should include details around how new work items are added to the queue, how the queue is prioritized, and what truly defines a ‘fire’. This does require leadership support, particularly during the early stages as the organization adjusts to the new rules. Remember, this isn’t about stopping work from coming in; rather, it is about making sure that you are working on the most important and most valuable items first. I commonly hear that unplanned work is the biggest enemy. If you combine this strategy with Visibility, Triage, and WIP Limits, nearly all unplanned work other than outages is eliminated.
Tomato Timer
This is an individual or team strategy for blocking off quiet time to work. Use a physical timer, set to 30, 60, or 90 minutes, and put it somewhere visible. Then close the door, put up a “Do Not Disturb” sign, put on headphones, etc. This is quiet, focused work time. During this window you do not check email, texts, IM, phone, etc., and you do not engage anyone who comes looking for you until the timer has finished. You work on items to completion and get stuff done. When the timer finishes, open the door, take down the sign, etc. You are now in an open window and can interact freely. Set up multiple blocks like this daily. I had a Scrum team teach me this one. They were in a shared open space and were finding it difficult to concentrate and solve complex problems. The team implemented this strategy very effectively so that everyone could have quiet time.
I am sure there are many other strategies for time-management and reducing firefighting. I hope some of these are valuable to you. One thing I know for certain is that you can’t even think about transforming your business, using DevOps, until you solve the firefighting problems you are facing in your current IT environment.
Good luck and put down that hose!