Most people in tech have had one: a launch that’s gone awry. It’s 2 a.m. and you’re still trying to figure out why file syncing has stopped working behind the load balancer, why SSL isn’t being enforced, or why you’re getting a stream of stack traces that make almost no sense. Any number of things have gone wrong, and you’re scrambling behind the scenes to make it right. You might push a hotfix, you might roll back. You might flip a table and disappear into the night.
Production deployments are stressful for the team. The larger the changes rolling out and the greater the risk that the site or product goes down completely (albeit temporarily), the more stressful they become. There is nothing quite as panic-inducing as a broken production push (or a failed build notification) while project managers and clients spam you across every communication channel, asking for updates and resolutions as you try to debug.
So what can you do? Like most stressful things in life, the more you prepare, the better. Take the time to actually make a deployment plan, schedule it, communicate it, and automate it where possible. If you aren’t directly involved in the deployment itself, support the team: limit interruptions (especially if things do go wrong) and take on any small tasks that help them keep their focus.
This seems obvious, doesn’t it? In reality, we far too often deploy without getting everyone informed and on board. I think it’s important for you, or whoever is planning the deployment, to ask the following questions:
1. Are we deploying because of a timeline or because it’s actually done?
This is a tough one to manage. If you are running behind schedule, hopefully you’ve seen it coming for a while and have either course-corrected or reset expectations along the way. Ultimately, it’s the client’s choice to delay a launch, but it’s your responsibility to inform them of all the risks based on what you know about the project. Again, the earlier the better: if the client knows months out that the timeline is not realistic, they are more likely to scale back to the features they actually need.
If you’re trying to squeeze three sprints’ worth of work into the last few nights, you are definitely going to introduce bugs, regressions, and poorly tested features. A well-built system that lacks a few features is better than a pile of poor-quality, last-minute features added just to call it code-complete.
That said, suppose you choose to do a partial production push by cherry-picking or merging only specific feature branches. Cherry-picking individual commits from a test environment through to production is rarely ideal, so unless you are disciplined about managing your code with branches and tags, partial pushes can be risky. Make sure you know exactly what you are pushing, especially when taking a piecemeal approach.
2. Does everyone on the team know when the deployment is happening (and can they be available)?
This is where we get into the classic Office Space scenario.
I cannot stress it enough: make sure the team knows when the deployment is planned, not just hours or days ahead, but weeks ahead. People do have lives outside the office (and hopefully you are encouraging that), so springing a deployment on them, even a minor one, causes extra stress. Plans suddenly need to shift, and you’ve already set the team on edge. Bring it up at every planning meeting.
Even better, if you can set up consistent dates for deployment, everyone will always know when a deployment is happening. Internally, we’ve set up a calendar for all production deployments. Though there are projects that don’t follow this pattern, this largely mitigates that initial risk. A consistent deployment date ensures everyone is on board, and we can additionally avoid having multiple major deployments on a single day.
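A shared deployment calendar also makes the "no multiple major deployments on one day" rule easy to check mechanically. The Python sketch below is a minimal illustration under stated assumptions, not our actual tooling: the `SCHEDULE` entries, project names, and the major/minor flag are all hypothetical. It simply groups major deployments by date and flags any day that has more than one.

```python
from collections import defaultdict
from datetime import date

# Hypothetical calendar entries: (project, deployment date, is_major).
# These are illustrative placeholders, not real projects.
SCHEDULE = [
    ("Client A site", date(2024, 3, 5), True),
    ("Client B app", date(2024, 3, 5), True),
    ("Client C docs", date(2024, 3, 5), False),
    ("Client D site", date(2024, 3, 12), True),
]


def major_conflicts(schedule):
    """Return {date: [projects]} for every day with more than one major deployment."""
    majors_by_day = defaultdict(list)
    for project, day, is_major in schedule:
        if is_major:
            majors_by_day[day].append(project)
    # Keep only the days where major deployments collide.
    return {day: projects for day, projects in majors_by_day.items() if len(projects) > 1}


if __name__ == "__main__":
    for day, projects in sorted(major_conflicts(SCHEDULE).items()):
        print(f"{day.isoformat()}: {', '.join(projects)}")
```

With the sample data, this prints `2024-03-05: Client A site, Client B app`. The minor docs deployment on the same day is ignored, and 2024-03-12 has only one major deployment, so it isn't flagged.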
3. Do the team and client know what to expect from the deployment process?
What does a deployment actually entail? Do you have a set of release notes for your client? Is there any downtime expected? Do you need a content freeze before pushing code live? These are all important considerations and something everyone involved should be able to answer — client included.
Having a checklist or a simple outlined process both keeps everyone on the same page about what is happening and serves as a reminder for those doing the deployment. Take the time, plan ahead.
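A checklist doesn't need special tooling, but encoding it as data makes it easy to answer "are we actually ready?" before anyone pushes a button. The sketch below is one possible shape, assuming a plain Python list. The items are hypothetical examples drawn from the questions above (release notes, downtime, content freeze) plus a rollback plan, and you would replace them with your own project's steps.

```python
from dataclasses import dataclass


@dataclass
class ChecklistItem:
    description: str
    done: bool = False


# Hypothetical pre-deployment items; adapt these to your own project.
PRE_DEPLOY_CHECKLIST = [
    ChecklistItem("Release notes shared with the client"),
    ChecklistItem("Expected downtime communicated to the team and client"),
    ChecklistItem("Content freeze announced (if required)"),
    ChecklistItem("Rollback plan agreed"),
]


def outstanding(checklist):
    """Return the descriptions of items that have not been completed yet."""
    return [item.description for item in checklist if not item.done]


def ready_to_deploy(checklist):
    """True only when every item on the checklist has been ticked off."""
    return not outstanding(checklist)
```

For example, after marking the first three items done, `outstanding(PRE_DEPLOY_CHECKLIST)` would return `["Rollback plan agreed"]`, and `ready_to_deploy` stays `False` until that last item is ticked as well.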
4. Have we automated everything we can?
These days, most teams use some form of continuous integration to manage deployments across multiple environments. The benefits are obvious: less user error, fewer regressions thanks to automated tests, a solid build verified before anything goes live, and so on.
Internally, we use Jenkins and Git webhooks on branches to push code across several environments: local development, development, user acceptance testing (UAT), and production. This reduces user error, lets us closely manage what each environment contains, and lets us monitor what is being pushed to production.
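At its core, a branch-based setup like this routes each push to one environment. The Python sketch below shows only that routing decision. It is not our Jenkins configuration and uses no Jenkins API. The branch names (`develop`, `uat`, `main`) are hypothetical, and it assumes the webhook payload carries the pushed ref as `refs/heads/<branch>`, which is how GitHub's push event reports it. Local development isn't deployed by webhook, so it doesn't appear in the mapping.

```python
from typing import Optional

# Hypothetical branch-to-environment mapping; adjust the branch names to your workflow.
BRANCH_TO_ENVIRONMENT = {
    "develop": "development",
    "uat": "uat",
    "main": "production",
}

BRANCH_PREFIX = "refs/heads/"


def target_environment(payload: dict) -> Optional[str]:
    """Return the environment a push should deploy to, or None to skip deploying."""
    ref = payload.get("ref", "")
    if not ref.startswith(BRANCH_PREFIX):
        # Tags and other refs don't trigger an environment deployment in this sketch.
        return None
    branch = ref[len(BRANCH_PREFIX):]
    # Feature branches and any unmapped branch fall through to None.
    return BRANCH_TO_ENVIRONMENT.get(branch)
```

Keeping the mapping explicit means a push to a feature branch can never reach production by accident. Anything that isn't listed is simply not deployed.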
However, with content management systems for large organizations, a lot of configuration is stored at the database level (looking at you, Drupal 7), so some manual button-pressing, configuration changes, or manual testing is still required in many cases. For the initial production push it’s easy to clone the UAT environment, and we can be fairly confident the environment will behave the same. On subsequent pushes, that isn’t possible without overwriting real content changes. In these scenarios, it’s important to document both the changes required and the changes that will break everything. Wherever you can, though, automate as much as possible!
5. How can I best support the team?
Sometimes, no matter how we prep, things still do go wrong. So lastly, and simply: support your team.
As a project manager, you need to be the face to the client when things go wrong. It’s never pleasant to be the scapegoat while the sky is falling, but even so, when the team is heads-down debugging, limit interruptions as much as possible. If you are reporting a new bug to them every five minutes, it becomes hard for the developers to know what to focus on or whether priorities are shifting. Work with the client or product owner to prioritize the issues, then bring that list to the developers.
If you’re a team member who isn’t directly involved, know that moral support can go a long way. Where possible, take on smaller tasks originally assigned to the people deploying, brew another pot of coffee, or listen to the ranting. Whatever you can do without causing distractions or interruptions is ideal when a deployment has gone wrong. Heck, if you can jump on some of the problems without taking the team’s time away to get you set up, go for it!
All in all, sometimes even when you’ve done all the prep you can, you’re just in for a long night.