Software Engineering at Google Chapter #23 - Continuous Integration (3 of 3)

  • As a release candidate is promoted through environments (dev, stage, prod), more and more tests are run at each step
  • It's important to run tests against the release candidate (RC) because...
    • It is a sanity check that nothing strange happened when the code was cut and compiled into the RC
    • The engineer can easily audit the process without digging into the (usually separate) continuous build (CB) logs
    • Cherry-picked fixes can be applied to the RC, but the RC then differs from the CB and must be tested again
    • It allows for emergency pushes that bypass CB testing
  • An organization can run some of the same pre-production tests against the production code after it has been deployed. Google calls these "probers"
  • All of the testing and re-testing is essentially like the security idea of "defense in depth"
  • Both continuous integration and production alerting share the same overall purpose and thus lessons from one can be applied to the other
  • Shared lessons include: create only actionable alerts, and accept that 100% uptime (in prod) or a 100% green rate (in builds) is very expensive
  • Additional challenges related to CI include...
    • Presubmit optimization: balancing which tests run at presubmit and which run at post-submit
    • Culprit finding. What caused the build to fail?
    • Failure isolation in large systems
    • Resource constraints because tests require resources to run
  • It's hard to always have a green build for end-to-end tests because some components are out of your control
  • When a build breaks, consider whether it's a release-blocking bug or a non-blocking bug
  • To overcome test instability and flakiness, consider running the same test multiple times. If it runs 5 times and fails only once, things are probably OK and the test is just a bit flaky
  • Hermetic tests are run against a self-contained environment and do not hit real production backends or interfaces
  • Hermetic tests allow for greater confidence because if one fails multiple times, you know the problem is in the new code and not in a flaky backend
  • Hermetic tests use fakes, mocks, and stubs to act as substitutes for real backends (see Chapter 13)
  • Google suggests starting with a fully hermetic setup
  • Record / replay (see Chapter 14) is a powerful tool but one must balance between false positives and false negatives
  • Most teams at Google use a combination of hermetic (simulated) and real live backends for their testing
  • Google has "build cops" - people whose job it is to fix a broken build so that the other engineers are not blocked
  • Some build processes can examine their log output and automatically submit bug reports
  • CI is very cost effective. It may seem expensive but fixing bugs after they hit production is much more expensive.
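
Sketch for the "culprit finding" bullet: one common way to find the change that broke a build is to bisect an ordered range of changes. This is only an illustration, not Google's tooling. The names find_culprit and is_green_at are hypothetical, and the sketch assumes a single culprit: the build was green before the first change, is red at the last one, and stays red once broken.

```python
from typing import Callable, Sequence


def find_culprit(changes: Sequence[str], is_green_at: Callable[[int], bool]) -> str:
    """Return the first change at which the build turns red.

    is_green_at(i) reports whether the build passes with changes[0..i]
    applied. Assumes the build is red at changes[-1] and never turns
    green again after the culprit (hypothetical, simplified model).
    """
    if not changes:
        raise ValueError("need at least one change to bisect")
    lo, hi = 0, len(changes) - 1  # invariant: culprit index is in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if is_green_at(mid):
            lo = mid + 1  # still green through mid, so the culprit is later
        else:
            hi = mid  # already red at mid, so the culprit is mid or earlier
    return changes[lo]


if __name__ == "__main__":
    changes = ["c1", "c2", "c3", "c4", "c5"]
    # Pretend the build is green for the first three changes only.
    print(find_culprit(changes, lambda i: i < 3))  # c4
```

Bisection needs about log2(n) builds instead of n, which matters when each build-and-test cycle is expensive. It is unreliable if the test being bisected is itself flaky.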
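
Sketch for the "run the same test multiple times" bullet: the rerun heuristic can be written as a small classifier. The function name classify_by_rerun, the labels, and the default of 5 runs are assumptions made for illustration, not a real CI API.

```python
from typing import Callable


def classify_by_rerun(test: Callable[[], bool], runs: int = 5) -> str:
    """Run `test` several times and label the overall outcome.

    `test` returns True on pass and False on failure. The labels are
    illustrative: "pass", "fail" (every run failed), or "flaky" (mixed).
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    failures = sum(1 for _ in range(runs) if not test())
    if failures == 0:
        return "pass"
    if failures == runs:
        # Consistent failure: with hermetic backends this points at the new code.
        return "fail"
    # Mixed results: more likely flakiness than a real regression.
    return "flaky"


if __name__ == "__main__":
    # Deterministic stand-in for a test that fails once in five runs.
    outcomes = iter([True, True, False, True, True])
    print(classify_by_rerun(lambda: next(outcomes)))  # flaky
```

A "flaky" result does not prove the code is correct; it only suggests the failure is intermittent and worth tracking separately from release-blocking breakages.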
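
Sketch for the hermetic testing bullets: a hermetic test swaps the real backend for an in-memory fake, so a failure cannot be blamed on the network or on a live service. Everything here (FakeUserBackend, get_name, greeting) is a hypothetical example, not code from the book.

```python
class FakeUserBackend:
    """In-memory stand-in for a real user service (hypothetical)."""

    def __init__(self, users: dict[int, str]) -> None:
        self._users = dict(users)

    def get_name(self, user_id: int) -> str:
        # No network call: the lookup is served from local test data.
        return self._users[user_id]


def greeting(backend, user_id: int) -> str:
    """Code under test: depends only on the backend's get_name method."""
    return f"Hello, {backend.get_name(user_id)}!"


def test_greeting_is_hermetic() -> None:
    backend = FakeUserBackend({42: "Ada"})
    assert greeting(backend, 42) == "Hello, Ada!"


if __name__ == "__main__":
    test_greeting_is_hermetic()
    print("ok")
```

Because the code under test receives its backend as an argument, the same function can be run against the fake at presubmit and against a real backend in later, less hermetic stages.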


