Writing Engineering Guidelines
What they are, why you should have them, and how to go about making your own.
Engineering Guidelines are a collection of your organization’s Best Practices; a distillation of the institutional knowledge around “how things should be done here”. They are a cross between a Mission Statement, Company Values, and an Employee Handbook for your engineering department.
Does your company have engineering guidelines? It should.
Consider the original promise of the 12-factor app: “If you write your code in this way, you get all these nice things.” Engineering guidelines are the same, but for your organization. They should explain both the overarching guiding principles and the nitty-gritty details of how to actually do things.
You need to lay out standards for:
- The code (formatting, style, tests)
- The development process (CI, code review, deployments)
- The service (flags, configs, metrics, logging)
(This blog post will be focusing primarily on the technical aspects. There should also be thought put into writing down the *cultural* guidelines for your engineering department, but those will be hand-waved over for now. Similarly on the design side, user-visible aspects will need to have consistent UI and UX. Those are beyond the scope of this article.)
You need a rationale so people understand the context in which these decisions have been made. This allows exceptions when these base assumptions do not hold, or updating the guidelines when the larger context changes. It’s about making decisions once for consistency. It’s about avoiding known issues or edge cases. It’s about choosing a specific technique with known tradeoffs for dealing with problems.
This is the role guidelines play. The goal is consistency. You may not agree with a particular guideline, but having *something* relieves cognitive burden for both writers and readers.
Consistency is important once there are multiple developers working on a code base. It keeps everybody on the same page. As new developers join, all the projects they look at follow similar rules. SREs who are paged in the middle of the night have an idea what they’re up against.
More things work as expected with fewer surprises.
At smaller organizations, guidelines manage the freedom that comes from having less corporate inertia around how to do things. This freedom can easily lead to a much larger variety of practices, which can be difficult to rein in later.
Your guidelines don’t exist in isolation. They frequently need institutional support to be effective. For example, if the recommended way to set up automated CI builds requires multiple commits to different repositories, with different teams needing to +1 the changes *and* multiple coordinated rollouts, then few projects are going to bother to set up CI properly. You need to make it easy to do the right thing.
Code standards are what most people think of when they think of guidelines: a document that specifies four-space indents, Allman-style braces, and that every function have a 10-line Javadoc listing inputs and outputs.
But beyond that, this section needs to describe not only the “allowed languages”, but how they should be written at your company. Note that this might be at odds with how the chosen languages are written elsewhere. For example, Google famously does not use C++ exceptions. If you plan on deviating substantially from the existing cultural norms for your chosen languages, it’s worth reading the Goals section of Google’s guidelines and, similarly, watching Titus Winters’ CppCon talk on “The Philosophy of Google’s C++ code”.
Documenting and/or linking to existing guides also helps people working with this language for the first time in your organization.
This quote from Rob Pike’s “Go Proverbs” sums it up: “Gofmt’s style is no one’s favorite, yet gofmt is everyone’s favorite.” The exact style is less important than the fact that there is consistency. And if a style decision is hard to make, it means both options are fine, so just flip a coin.
The Development Process.
Modern software development has lots of steps between writing code in your editor and having it successfully merged into HEAD. This section should have a policy in place for each of them.
Some suggested topics:
- commit message format
- how and where code review takes place
- setting up CI
- compiler flags (optimization levels, debug info, warnings)
- expectations for tests, including coverage and policies for unit/integration/end-to-end tests
- which static analysis tooling to run
- when and how code generation tooling is run and kept updated
- branch usage and naming
- squash/merge and multi-commits vs. rebase/force-push during code review iterations
- issue tracker conventions: size estimates (S/M/L/XL), bug priorities, and a documented workflow
- deployments: staging, canaries, incremental deployments, rollbacks
- documents outlining incident handling and post-mortems
A number of these items mostly cover vocabulary and definitions so that everybody is speaking the same language.
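A commit message format is one of the easier items on this list to enforce mechanically. The following is a minimal sketch of such a check in Go. The rules it encodes (an "area: summary" subject, a 72-byte limit, no trailing period, a blank second line) are hypothetical house rules chosen for illustration, and `checkCommitMessage` is an invented name; your own guidelines would substitute their own rules.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
)

// subjectPattern encodes a hypothetical house rule: "area: short summary".
var subjectPattern = regexp.MustCompile(`^[a-z0-9/_-]+: \S.*$`)

// maxSubjectLen is an assumed limit, measured in bytes.
const maxSubjectLen = 72

// checkCommitMessage returns an error describing the first rule the
// message breaks, or nil if the message follows all of them.
func checkCommitMessage(msg string) error {
	lines := strings.Split(strings.TrimRight(msg, "\n"), "\n")
	subject := lines[0]
	switch {
	case !subjectPattern.MatchString(subject):
		return errors.New(`subject must look like "area: summary"`)
	case len(subject) > maxSubjectLen:
		return fmt.Errorf("subject is %d bytes, limit is %d", len(subject), maxSubjectLen)
	case strings.HasSuffix(subject, "."):
		return errors.New("subject must not end with a period")
	case len(lines) > 1 && lines[1] != "":
		return errors.New("second line must be blank")
	}
	return nil
}

func main() {
	examples := []string{
		"auth: rotate signing keys\n\nOld keys expire next month.\n",
		"Fixed stuff",
	}
	for _, msg := range examples {
		if err := checkCommitMessage(msg); err != nil {
			fmt.Printf("rejected %q: %v\n", msg, err)
			continue
		}
		fmt.Printf("accepted %q\n", msg)
	}
}
```

A check like this could run as a commit hook or in CI, which is one way to make the right thing the easy thing.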
The service section will probably be the largest part of your guidelines. It covers a wide range of topics about how the code you write interacts with the rest of the environment. If the previous section covered *how* you code, this section covers *what* you code.
How does your service integrate itself into the rest of the existing infrastructure?
From the moment your service starts up, it’s going to need information to configure itself. Standard configuration file locations and formats, as well as guides for naming command line flags, are useful. Does your service need to reload its configuration in-place while still running, or do a graceful restart without dropping network connections? Should you aim for graceful shutdowns or crash-only with recovery on startup?
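As an illustration of what a configuration standard might pin down, here is a minimal sketch in Go using only the standard library. It assumes a JSON config file and a rule that flag names mirror the config keys; the `Config` fields, the defaults, the flag names, and `loadConfig` itself are all hypothetical, not something prescribed by this article.

```go
package main

import (
	"encoding/json"
	"flag"
	"fmt"
	"os"
)

// Config holds settings that every service reads the same way.
// The fields, JSON keys, and defaults here are hypothetical.
type Config struct {
	ListenAddr string `json:"listen_addr"`
	LogLevel   string `json:"log_level"`
}

// loadConfig layers three sources: built-in defaults, then the JSON config
// file contents (if any), then command line flags that were explicitly set.
func loadConfig(fileData []byte, args []string) (Config, error) {
	cfg := Config{ListenAddr: ":8080", LogLevel: "info"}

	if len(fileData) > 0 {
		if err := json.Unmarshal(fileData, &cfg); err != nil {
			return cfg, fmt.Errorf("parsing config file: %w", err)
		}
	}

	// Flag names mirror the JSON keys, with dashes instead of underscores.
	// The values loaded so far act as the flag defaults.
	fs := flag.NewFlagSet("service", flag.ContinueOnError)
	fs.StringVar(&cfg.ListenAddr, "listen-addr", cfg.ListenAddr, "address to listen on")
	fs.StringVar(&cfg.LogLevel, "log-level", cfg.LogLevel, "log verbosity")
	if err := fs.Parse(args); err != nil {
		return cfg, err
	}
	return cfg, nil
}

func main() {
	// A real service would read the file from its standard location instead.
	fileData := []byte(`{"listen_addr": ":9000"}`)
	cfg, err := loadConfig(fileData, os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, "config error:", err)
		os.Exit(2)
	}
	fmt.Printf("listening on %s, log level %s\n", cfg.ListenAddr, cfg.LogLevel)
}
```

The point is less the specific precedence order than the fact that it is written down once, so every service answers “where does this setting come from?” the same way.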
Once your service is up and running, how does it communicate with other services? Does it need to contact a directory service? What about service authentication? Does it speak JSON over HTTP(S) or something else? What should the retry strategy be? How does it handle API versioning? And all of these need to be answered for the service as a client and a server.
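To make the retry question concrete, here is a minimal sketch of one common answer: capped exponential backoff with random jitter. It is one option among several, not a recommendation from this article, and the `RetryPolicy` type, its fields, and the numbers in `main` are all hypothetical.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// RetryPolicy is a hypothetical shared policy that guidelines could
// standardize, so every client retries the same way.
type RetryPolicy struct {
	MaxAttempts int           // total calls, including the first one
	BaseDelay   time.Duration // upper bound of the wait before the first retry
	MaxDelay    time.Duration // cap on the wait between any two calls
}

// backoff returns the upper bound of the wait before the given retry
// (0-based). It doubles on every retry and is capped at MaxDelay.
func (p RetryPolicy) backoff(retry int) time.Duration {
	d := p.BaseDelay
	for i := 0; i < retry && d < p.MaxDelay; i++ {
		d *= 2
	}
	if d > p.MaxDelay {
		d = p.MaxDelay
	}
	return d
}

// Do calls op until it succeeds, the attempts run out, or ctx is cancelled.
func (p RetryPolicy) Do(ctx context.Context, op func() error) error {
	attempts := p.MaxAttempts
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for attempt := 0; attempt < attempts; attempt++ {
		if err = op(); err == nil {
			return nil
		}
		if attempt == attempts-1 {
			break // no point sleeping after the final attempt
		}
		// Jitter: sleep a random duration between zero and the backoff bound.
		sleep := time.Duration(rand.Int63n(int64(p.backoff(attempt)) + 1))
		select {
		case <-time.After(sleep):
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	policy := RetryPolicy{MaxAttempts: 4, BaseDelay: 10 * time.Millisecond, MaxDelay: 100 * time.Millisecond}
	calls := 0
	err := policy.Do(context.Background(), func() error {
		calls++
		if calls < 3 {
			return errors.New("temporarily unavailable")
		}
		return nil
	})
	fmt.Println("calls:", calls, "err:", err)
}
```

Whatever strategy you pick, putting it in a shared library means the guideline can simply say “use this” rather than describing an algorithm for each team to reimplement.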
Finally, how do you report to the rest of the world what’s happening? At the bare minimum, what are the policies around logging and metrics? For logs, what should be logged and where does it go? Are the logs structured or free-form? If using structured logs, what are the naming guidelines for common fields? Do you have different logging verbosity levels? If so, what are they and what should be logged at each level? Metrics again need a common destination and consistent guidelines around what to track and what to name them. How are distributed traces collected?
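For structured logs, the naming guidelines can live in code rather than only in a document. The sketch below uses Go’s standard `log/slog` package (Go 1.21 or later) as one possible choice; the field names, the `newLogger` and `parseLevel` helpers, and the "checkout" service name are hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"log/slog"
	"os"
)

// Hypothetical shared field names, defined once so that every service
// spells them the same way.
const (
	FieldService   = "service"
	FieldRequestID = "request_id"
)

// parseLevel turns a flag or config value such as "debug" or "warn"
// into a slog.Level.
func parseLevel(name string) (slog.Level, error) {
	var level slog.Level
	err := level.UnmarshalText([]byte(name))
	return level, err
}

// newLogger returns a JSON logger that drops records below the given
// level and tags every remaining record with the service name.
func newLogger(w io.Writer, service string, level slog.Level) *slog.Logger {
	h := slog.NewJSONHandler(w, &slog.HandlerOptions{Level: level})
	return slog.New(h).With(FieldService, service)
}

func main() {
	level, err := parseLevel("info")
	if err != nil {
		fmt.Fprintln(os.Stderr, "bad log level:", err)
		os.Exit(2)
	}
	logger := newLogger(os.Stdout, "checkout", level)

	// Below the configured level, so this record is dropped.
	logger.Debug("cache miss", "key", "user:42")
	// Emitted as one JSON object per line, using the shared field names.
	logger.Info("request handled", FieldRequestID, "abc123", "status", 200)
}
```

The guidelines still have to say what belongs at each level; the code only makes sure the levels and field names are spelled consistently.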
This section can also list recommended libraries for different tasks and banned libraries, each with their own justification sections.
Writing Your Own Guidelines
As the first internal advocate for Go at a large technology company, I took on the task of coming up with guidelines for our Go code. While we already had engineering guidelines, they were focused on the Perl side of things. Part of the guidelines I wrote was to make rules for my playground. Part of it was an onboarding guide to Go. Part was how to integrate Go into the existing Perl infrastructure.
This was all basically greenfield, and I was on my own at the start. Part of it was clarifying what development was like on the Go side of things (also for advocacy purposes: “Come play with our nice tools!”), while at the same time making clear my expectations for code people committed. This mostly covered differences from the development processes that the company had deemed acceptable for the multi-million line Perl codebase.
I spent a lot of time looking for best practices from similar organizations, with rationales if possible so that I could see if their goals matched mine. For example, people often point to Google engineering practices and try to imitate them and apply them to their own organization. But lots of practices that make sense at Google do not make sense in other organizations. You cannot just absorb another company’s guidelines wholesale without also taking on their assumptions about their culture and associated infrastructure. (I might explore this more in another blog post I’m tentatively calling “The parable of glog”.)
I also spent time thinking about how I wanted my GOPATH to look and feel. I could look at what wasn’t working for me on the Perl side of things, and write down a reasonable policy that made sense within the culture I wanted to foster.
The document I ended up writing was mostly a FAQ for developers new to Go, with specific attention paid to devs moving from Perl, as well as lots of notes on how to integrate into the existing infrastructure. In an effort to keep some parity with the existing Perl guidelines, I did end up implementing a number of packages that matched base libraries on the Perl side to make the transition easier. For example, these packages could load the existing standard config files and offered APIs familiar to people who knew the Perl code base.
My guidelines were also a living document, not written in stone. Every time something happened that could have been caught sooner, it was added as something to watch out for.
Before I left, I added a final section to the guidelines document: a list of code janitor tasks I handled that would need to be picked up. “If everybody is responsible, then nobody is responsible.”
How do you eat an elephant? One bite at a time.
Having guidelines is all well and good for greenfield development: as new code arrives, you ensure it conforms to the current known Best Practices. But what about applying these to an existing legacy codebase? The short answer is “It’s hard”. In the short term, document the known exceptions.
Decide if it’s worth the code churn in an existing system. Some things, like updating the formatting, can be done just once at the cost of making git archeology a bit trickier. Fixing issues found by static analysis tooling is better suited to being worked on slowly.
Updating interactions with the environment is harder to justify. These are no longer self-contained fixes, but have ripple effects on either consumers or clients. Patching “working code” for no externally visible gain can be a hard sell, and may not always be the right choice. If the system is old, you may be able to default on the technical debt if a replacement is in the works.