At Propel, our continuous integration (CI) and continuous delivery (CD) processes enable us to deliver features rapidly, multiple times during a single workday. With each release to production, we’ve found it useful to document
- customer-facing changes contained within the release,
- non-customer-facing changes within the release, and
- our strategy for rolling back the release.
We call this documentation “deployment notes” or “release notes”, and we publish them to an internal Slack channel with every release. For example, you can see a screenshot of deployment notes for one of our services below:
Why do we do this?
For changes that are not gated behind a feature flag, the deployment notes record when the changes actually shipped to production, and for customers, that is what matters. Even with continuous delivery, there is a short delay between merging to main and releasing to production. So as soon as the deployment notes go out, we know we can reach out to customers and let them know that a new feature or bug fix is available.
Additionally, we often refer to deployment notes when we put together our customer-facing changelog entries for the month. The deployment notes are the source of truth, so we don’t have to ask engineers, “Did feature X ship already?” We just go to our internal Slack channel.
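If deployment notes were also stored in a structured form, assembling the monthly changelog could be a simple filter over them. The sketch below assumes a hypothetical `DeploymentNote` record with the three sections listed at the start of this post; the field names and `monthly_changelog` are illustrations, not Propel’s tooling.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class DeploymentNote:
    """A published set of deployment notes (hypothetical structure)."""
    service: str
    released_at: datetime
    customer_facing: list[str] = field(default_factory=list)
    non_customer_facing: list[str] = field(default_factory=list)
    rollback: str = "N/A"


def monthly_changelog(notes: list[DeploymentNote], year: int, month: int) -> list[str]:
    """Collect customer-facing changes released in the given month, oldest first."""
    in_month = [
        n for n in notes
        if n.released_at.year == year and n.released_at.month == month
    ]
    entries = []
    for note in sorted(in_month, key=lambda n: n.released_at):
        # Non-customer-facing changes are skipped: they don't belong in a public changelog.
        for change in note.customer_facing:
            entries.append(f"{change} ({note.service}, {note.released_at:%Y-%m-%d})")
    return entries
```

Because each entry carries the service and release date, the changelog author can see exactly when a change reached customers without asking anyone.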
Occasionally, larger changes will require multiple deployments of one or more services. These changes may not be immediately customer-facing; however, we still want to document and communicate where we are in the process, so that there’s never any doubt about the state of production.
For example, Propel’s GraphQL API is actually a GraphQL supergraph combining multiple subgraphs using Apollo Router. To extend our API with a new subgraph X, we would:
1. Deploy a new service for subgraph X.
2. Update and deploy Apollo Router to include subgraph X.
The deployment in step 1 contains no customer-facing changes; however, we still write the deployment notes. The deployment notes serve as a record that step 1 was completed, allowing us to continue to step 2.
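One way to picture deployment notes as shared state during a multi-step rollout is to treat them as a checklist: a step counts as done only once its notes are published. The step names, `next_step`, and matching steps by exact name are hypothetical illustrations.

```python
from typing import Optional

# Hypothetical rollout plan; each step is only marked done once its
# deployment notes have been published.
ADD_SUBGRAPH_X = [
    "Deploy a new service for subgraph X",
    "Update and deploy Apollo Router to include subgraph X",
]


def next_step(plan: list[str], published: set[str]) -> Optional[str]:
    """Return the first step with no published deployment notes, or None if all are done."""
    for step in plan:
        if step not in published:
            # Steps are sequential: stop at the first gap, even if later steps are marked done.
            return step
    return None
```

Any engineer can answer “where are we?” from the published notes alone, which is the point of externalizing this state.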
Externalizing this knowledge means no single engineer has to keep the state of production in their head, and everyone can see what was deployed. That is freeing when an engineer goes on vacation, and it makes collaborating with teammates easier.
Bugs are a fact of life, and the only way to eliminate them is to stop writing software. But we’re not in the business of not writing software. We’re in the business of shipping features that delight customers! That’s why we invest so heavily in our testing and continuous integration processes: to reduce the likelihood of shipping unintended behavior changes to production.
Despite this, there will be behavior changes, bugs, and even serious incidents that automated testing doesn’t catch. When this happens, it’s the responsibility of an on-call engineer to remediate the issue as quickly as possible in order to restore the customer experience.
Most of the time, pagers go off and incidents are discovered shortly after a deployment, and the quickest path to remediation is a rollback. By “rollback”, we mean reverting to the previous known-good version of the software. This remediation strategy is so common that we call it out in our deployment notes. The instructions we usually give are: “Re-deploy the last successful release.”
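“Re-deploy the last successful release” is easy to act on if the deploy history records which releases succeeded. The sketch below assumes a hypothetical `Release` record with a `succeeded` flag and a history ordered oldest to newest; how that history is stored is not described in this post.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Release:
    version: str
    succeeded: bool  # hypothetical flag: did this deploy finish and pass its checks?


def rollback_target(history: list[Release], current: str) -> Optional[str]:
    """Return the last successful release deployed before `current`.

    `history` is ordered oldest to newest. Returns None if there is nothing to roll back to.
    """
    versions = [r.version for r in history]
    position = versions.index(current)  # raises ValueError for an unknown version
    for release in reversed(history[:position]):
        if release.succeeded:
            return release.version
    return None
```

Skipping failed releases matters: rolling back to a release that never deployed cleanly would not restore a known-good state.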
Of course, some rollbacks are more complicated than a re-deploy, such as reverting a schema change, but as a rule, we try to design all of our changes to be easy to roll back.
Sometimes a bug or behavior change is discovered which does not correspond to a recent deployment. In these cases, accurate deployment notes have been essential for tracking down the source of the change and reconstructing the incident timeline. Having this clarity allows us to fix issues and resolve incidents with confidence. This is a crucial part of our incident response process that we will describe in a future blog post.
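When a problem doesn’t line up with a recent deploy, a time-ordered record of deployments narrows the search to whatever shipped between the last time things were known to work and the first time the problem was observed. This is a minimal sketch assuming deployments are available as sorted `(deployed_at, description)` pairs; `suspect_deploys` is a hypothetical helper built on Python’s `bisect` module.

```python
import bisect
from datetime import datetime


def suspect_deploys(
    deploys: list[tuple[datetime, str]],
    last_good: datetime,
    first_bad: datetime,
) -> list[tuple[datetime, str]]:
    """Return deploys made after `last_good` and at or before `first_bad`.

    `deploys` holds (deployed_at, description) pairs sorted by time.
    """
    times = [deployed_at for deployed_at, _ in deploys]
    start = bisect.bisect_right(times, last_good)  # skip deploys at or before last_good
    end = bisect.bisect_right(times, first_bad)    # include deploys up to first_bad
    return deploys[start:end]
```

The deployment notes for each suspect then describe what changed, which is what turns a list of timestamps into an incident timeline.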
How do we do this?
Deployment notes are a core feature of our CI/CD pipelines. Once we merge a change to main, our pipeline deploys the change to our development environment, where it runs our cluster tests. If the cluster tests succeed, the pipeline posts a message to Slack requesting deployment notes and approval for deploying to production.
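The control flow of that pipeline can be sketched independently of any CI provider. In a real setup each stage would be a CI job; here the stages are plain callbacks so the gating logic runs on its own. All function and parameter names are hypothetical, and publishing the notes after the production deploy is an assumption based on the Slack channel described earlier.

```python
from typing import Callable, Optional


def run_release(
    deploy: Callable[[str], None],
    run_cluster_tests: Callable[[], bool],
    request_notes_and_approval: Callable[[], Optional[str]],
    publish_notes: Callable[[str], None],
) -> str:
    """Gate a production deploy on cluster tests plus human-written, approved notes."""
    # 1. Every merge to main is deployed to the development environment first.
    deploy("development")

    # 2. Cluster tests must pass before anyone is asked about production.
    if not run_cluster_tests():
        return "stopped: cluster tests failed"

    # 3. A human writes the notes and approves; None (or empty) means no approval.
    notes = request_notes_and_approval()
    if not notes:
        return "stopped: waiting for deployment notes and approval"

    # 4. Ship to production, then share the notes with the team.
    deploy("production")
    publish_notes(notes)
    return "released"
```

Keeping the human step between the cluster tests and the production deploy means every production release has notes by construction.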
Our pipelines currently run on GitHub Actions and use GitHub environments, but they previously ran on CircleCI, so you could build a similar solution on other CI providers.
Curation vs. Automation
There are many tools that can automate the creation of changelogs and release notes. For example, we follow the Conventional Commits specification, which means we prefix our commit messages with “fix” or “feat” to indicate a bug fix or new feature, and we include “BREAKING CHANGE” if we need to ship a breaking change.
To be clear, we consider Conventional Commits a best practice! We even automatically bump version numbers according to the commit history. But we’ve found the automated changelogs derived from Conventional Commits to be too low-level for use in deployment notes. And when it’s handled automatically, we’re not necessarily “thinking” about the changes going out.
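To make the version-bumping idea concrete, here is a sketch of the mapping the Conventional Commits specification describes: a breaking change bumps the major version, `feat` the minor, `fix` the patch. This is not the tool Propel uses (the post doesn’t name it), and it ignores details such as pre-1.0 version handling. It also shows why such output is low-level: each commit reduces to one line and one bump, with no higher-level summary.

```python
import re
from typing import Optional

# Commit header shape from the Conventional Commits spec: type(optional scope)!: description
HEADER = re.compile(r"^(?P<type>[A-Za-z]+)(?:\([^)]*\))?(?P<bang>!)?: ")
BREAKING_FOOTERS = ("BREAKING CHANGE:", "BREAKING-CHANGE:")
RANK = {"patch": 1, "minor": 2, "major": 3}


def bump_for(message: str) -> Optional[str]:
    """Map one commit message to the SemVer bump it implies, if any."""
    lines = message.splitlines() or [""]
    match = HEADER.match(lines[0])
    if match is None:
        return None
    if match["bang"] or any(line.startswith(BREAKING_FOOTERS) for line in lines[1:]):
        return "major"
    # Types are case-insensitive per the spec; other types imply no bump.
    return {"feat": "minor", "fix": "patch"}.get(match["type"].lower())


def next_version(current: str, messages: list[str]) -> str:
    """Apply the largest bump implied by the commits since `current`."""
    bumps = [b for b in map(bump_for, messages) if b is not None]
    if not bumps:
        return current
    level = max(bumps, key=RANK.__getitem__)
    major, minor, patch = (int(part) for part in current.split("."))
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"
```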
That’s why, on every release, we have the engineer in charge of the release pause and write the deployment notes. This lets the engineer describe the changes at a higher level and review them one more time before they are released.
We always follow the same Markdown template, which you are free to reuse. If one of the sections is not applicable for a particular deployment, we write “N/A”.
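Assuming a template with one heading for each of the three items listed at the start of this post (the heading wording and `render_notes` are hypothetical), filling in “N/A” for empty sections could look like this:

```python
# Hypothetical section keys and headings, one per item the notes must cover.
SECTIONS = [
    ("customer_facing", "Customer-facing changes"),
    ("non_customer_facing", "Non-customer-facing changes"),
    ("rollback", "Rollback strategy"),
]


def render_notes(service: str, version: str, **sections: list[str]) -> str:
    """Render deployment notes as Markdown, writing N/A for empty sections."""
    lines = [f"# Deployment notes: {service} {version}"]
    for key, title in SECTIONS:
        items = sections.get(key) or []
        lines.append("")
        lines.append(f"## {title}")
        if items:
            lines.extend(f"- {item}" for item in items)
        else:
            lines.append("N/A")  # explicit N/A instead of silently omitting the section
    return "\n".join(lines)
```

Writing “N/A” rather than dropping a section makes it clear the author considered it, rather than forgot it.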
Deployment notes are an effective tool for communicating customer-facing and non-customer-facing changes. They’re also useful for incident response, where the on-call engineer needs to know how to roll back changes. Although tools exist to automate the creation of changelogs, we’ve found it best to explicitly author our deployment notes and require human approval before releasing to production.