Software Engineering At Google
Software Engineering at Google Chapter #23 - Continuous Integration (3 of 3)
As a release candidate is promoted through environments (dev, stage, prod), more and more tests are run at each step
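A minimal sketch of the promotion idea, not Google's actual tooling. The environment names come from the line above; the `promote` function, the `suites` mapping, and the choice to re-run earlier suites at every later step are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional

Test = Callable[[], bool]  # hypothetical: a test returns True when it passes


def promote(suites: Dict[str, List[Test]], order: List[str]) -> Optional[str]:
    """Return the last environment the release candidate reached, or None."""
    accumulated: List[Test] = []
    reached: Optional[str] = None
    for env in order:
        # Assumption: each environment adds its own tests on top of the earlier ones.
        accumulated.extend(suites[env])
        if not all(test() for test in accumulated):
            return reached  # promotion stops at the first failing environment
        reached = env
    return reached
```

With three passing suites and `order = ["dev", "stage", "prod"]` the candidate reaches `"prod"`; a failing stage test leaves it at `"dev"`.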
It's important to run tests against the release candidate (RC) because...
Sanity check to make sure nothing strange happened when the code was cut and compiled into the RC
So the engineer can easily audit the process without digging into the (usually separate) continuous build (CB) logs
You can cherry-pick fixes into the RC, but then it differs from what the CB tested and must be tested again
Allows for emergency pushes that can bypass CB testing
An organization can run some of the same pre-production tests against the production code after it has been deployed. Google calls these "probers"
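A rough sketch of how one check might be reused as a prober. Everything here is hypothetical: the `Fetch` type, `homepage_loads`, `run_prober`, and the URLs used with them. The only point is that the same check function can be aimed at a pre-production environment before release and at production afterward.

```python
from typing import Callable, List

Fetch = Callable[[str], int]  # hypothetical: returns an HTTP status code for a URL
Check = Callable[[Fetch, str], bool]


def homepage_loads(fetch: Fetch, base_url: str) -> bool:
    """A check that does not care which environment it is pointed at."""
    return fetch(base_url + "/") == 200


def run_prober(fetch: Fetch, base_url: str, checks: List[Check]) -> List[str]:
    """Run every check against base_url and return the names of the failures."""
    return [check.__name__ for check in checks if not check(fetch, base_url)]
```

Injecting `fetch` keeps the sketch free of network calls; a real prober would pass in an actual HTTP client and run on a schedule.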
All of the testing and re-testing is essentially like the security idea of "defense in depth"
Both continuous integration and production alerting share the same overall purpose and thus lessons from one can be applied to the other
For example: only create actionable alerts, and accept that 100% uptime (in prod and in builds) is very expensive
Additional challenges related to CI include...
Presubmit optimizations that balance which tests to run at pre-submit and which run at post-submit
Culprit finding. What caused the build to fail?
Failure isolation in large systems
Resource constraints because tests require resources to run
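Culprit finding can be sketched as a bisection over the changes submitted since the last green build. This is a common general technique and not necessarily how Google's tooling works. It assumes a linear history, a deterministic build, exactly one breaking change, and a build that was green before the first change and is red after the last. `find_culprit` and `is_green_after` are hypothetical names.

```python
from typing import Callable, Sequence


def find_culprit(changes: Sequence[str], is_green_after: Callable[[int], bool]) -> str:
    """Return the change that broke the build.

    is_green_after(i) builds and tests with changes[0..i] applied.
    """
    lo, hi = 0, len(changes) - 1  # the culprit's index lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if is_green_after(mid):
            lo = mid + 1  # still green here, so the culprit came later
        else:
            hi = mid  # already red here, so the culprit is mid or earlier
    return changes[lo]
```

Bisection needs roughly log2(n) builds instead of n, which matters given the resource constraints listed above.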
It's hard to always have a green build for end-to-end tests because some components are out of your control
When a build breaks consider if it's a release-blocking bug or a non-blocking bug
To overcome test instability and flakiness, consider running the same test multiple times. If it runs 5 times and fails only once, the code is probably fine and the test is just a bit flaky
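A small sketch of the rerun idea. The `classify` helper and its three labels are assumptions for illustration, and the default of 5 runs simply mirrors the example above.

```python
from typing import Callable


def classify(test: Callable[[], bool], runs: int = 5) -> str:
    """Run a test several times and label the outcome."""
    failures = sum(1 for _ in range(runs) if not test())
    if failures == 0:
        return "pass"
    if failures == runs:
        return "fail"  # consistent failure points at a real problem
    return "flaky"  # mixed results suggest instability, not a definite bug
```

A "flaky" result is a hint, not proof: an intermittent failure can still be a real race condition in the code under test.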
Hermetic tests are run against a self-contained environment and do not hit real production backends or interfaces
Hermetic tests allow for greater confidence: if one fails repeatedly, you know the problem is in the new code and not in a flaky backend
Hermetic tests use fakes, mocks, and stubs to act as substitutes for real backends (see Chapter 13)
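A minimal sketch of a fake standing in for a real backend. `UserStore`, `FakeUserStore`, and `greeting` are hypothetical names, not taken from the book.

```python
from typing import Dict, Optional, Protocol


class UserStore(Protocol):
    """The interface the code under test depends on."""

    def get_email(self, user_id: str) -> Optional[str]: ...


class FakeUserStore:
    """In-memory fake: no network, no shared state, same interface."""

    def __init__(self, users: Dict[str, str]) -> None:
        self._users = dict(users)

    def get_email(self, user_id: str) -> Optional[str]:
        return self._users.get(user_id)


def greeting(store: UserStore, user_id: str) -> str:
    """Code under test: it only sees the UserStore interface."""
    email = store.get_email(user_id)
    return f"Hello, {email}" if email else "Hello, guest"
```

Because the fake lives inside the test process, a failure of `greeting` cannot be blamed on an unavailable backend.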
Google suggests starting with a fully hermetic setup
Record / replay (see Chapter 14) is a powerful tool but one must balance between false positives and false negatives
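A toy sketch of record / replay, assuming requests and responses are plain strings and are matched exactly. The `RecordReplay` class is hypothetical and much simpler than any real tool.

```python
from typing import Callable, Dict, Optional


class RecordReplay:
    """Record responses from a live backend, then replay them without it."""

    def __init__(
        self,
        backend: Optional[Callable[[str], str]] = None,
        recording: Optional[Dict[str, str]] = None,
    ) -> None:
        self._backend = backend  # set in record mode, None in replay mode
        self.recording: Dict[str, str] = dict(recording or {})

    def call(self, request: str) -> str:
        if self._backend is not None:
            # Record mode: hit the real backend and remember the answer.
            response = self._backend(request)
            self.recording[request] = response
            return response
        # Replay mode: exact-match lookup; raises KeyError for unrecorded requests.
        return self.recording[request]
```

The exact-match lookup shows the trade-off in miniature: strict matching fails on any harmless change to the request, while looser matching risks returning a stale response that hides a real change.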
Most teams at Google use a combination of hermetic (simulated) and real live backends for their testing
Google has "build cops": people whose job it is to fix a broken build so that the other engineers are not blocked
Some build processes can examine their log output and automatically submit bug reports
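A sketch of turning log output into bug reports. The `FAILED: <target> <detail>` log format, the `bugs_from_log` function, and the report fields are all invented for illustration; a real system would call a bug tracker's API rather than return dictionaries.

```python
import re
from typing import Dict, List

# Hypothetical log format: "FAILED: <target> <free-text detail>"
FAILURE = re.compile(r"^FAILED: (?P<target>\S+)\s*(?P<detail>.*)$")


def bugs_from_log(log: str) -> List[Dict[str, str]]:
    """Build one bug report per failing target, skipping repeats."""
    bugs: List[Dict[str, str]] = []
    seen = set()
    for line in log.splitlines():
        match = FAILURE.match(line.strip())
        if match and match["target"] not in seen:
            seen.add(match["target"])
            bugs.append(
                {
                    "title": f"Build failure in {match['target']}",
                    "detail": match["detail"],
                }
            )
    return bugs
```

Deduplicating by target keeps a repeatedly failing test from filing the same bug over and over, in line with the "only create actionable alerts" lesson.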
CI is very cost effective. It may seem expensive but fixing bugs after they hit production is much more expensive.
© 2021 Josh Turgasen
All product names, logos, and trademarks are property of their respective owners