Software Engineering at Google Chapter #23 - Continuous Integration (3 of 3)

  • As a release candidate is promoted through environments (dev, stage, prod), more and more tests are run at each step
  • It's important to run tests against the release candidate (RC) because...
    • It is a sanity check that nothing strange happened when the code was cut and compiled into the RC
    • The engineer can easily audit the process without digging into the (usually separate) continuous build (CB) logs
    • You can apply cherry-pick fixes to the RC, but the RC then differs from the code the CB tested and must be tested again
    • Allows for emergency pushes that can bypass CB testing
  • An organization can run some of the same pre-production tests against the production code after it has been deployed. Google calls these "probers"
  • All of the testing and re-testing is essentially like the security idea of "defense in depth"
  • Continuous integration and production alerting share the same overall purpose, so lessons from one can be applied to the other
  • Examples: only create actionable alerts; 100% uptime in prod, like a 100% green build, is very expensive; and so on
  • Additional challenges related to CI include...
    • Presubmit optimization: balancing which tests run at presubmit and which run at post-submit
    • Culprit finding. What caused the build to fail?
    • Failure isolation in large systems
    • Resource constraints because tests require resources to run
  • It's hard to always have a green build for end-to-end tests because some components are out of your control
  • When a build breaks, consider whether it's a release-blocking bug or a non-blocking bug
  • To overcome test instability and flakiness, consider running the same test multiple times. If it runs 5 times and fails only once, then things are probably ok and just a bit flaky
  • Hermetic tests are run against a self-contained environment and do not hit real production backends or interfaces
  • Hermetic tests allow for greater confidence because if one fails multiple times, you know it's a problem with the new code and not with a flaky backend
  • Hermetic tests use fakes, mocks, and stubs to act as substitutes for real backends (see Chapter 13)
  • Google suggests starting with a fully hermetic setup
  • Record / replay (see Chapter 14) is a powerful tool but one must balance between false positives and false negatives
  • Most teams at Google use a combination of hermetic (simulated) and real live backends for their testing
  • Google has "build cops" - people whose job it is to fix a broken build so that the other engineers are not blocked
  • Some build processes can examine their log output and automatically submit bug reports
  • CI is very cost-effective. It may seem expensive, but fixing bugs after they hit production is much more expensive.
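
The "run the same test multiple times" idea from the notes above can be sketched in a few lines of Python. This is only an illustration, not Google's tooling: the function name classify_by_rerun, the labels "pass" / "flaky" / "fail", and the helper make_flaky_test are all made up for this example, and the default of 5 runs simply mirrors the number used in the notes.

```python
from typing import Callable, Set


def classify_by_rerun(test: Callable[[], None], runs: int = 5) -> str:
    """Run `test` several times and label the overall outcome.

    The labels are illustrative, not a standard:
      "pass"  - every run succeeded
      "fail"  - every run failed (likely a real bug)
      "flaky" - a mix of passes and failures
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    failures = 0
    for _ in range(runs):
        try:
            test()
        except AssertionError:
            failures += 1
    if failures == 0:
        return "pass"
    if failures == runs:
        return "fail"
    return "flaky"


def make_flaky_test(fail_on_calls: Set[int]) -> Callable[[], None]:
    """Hypothetical helper: build a test that fails only on the given call numbers."""
    calls = {"n": 0}

    def test() -> None:
        calls["n"] += 1
        assert calls["n"] not in fail_on_calls, "simulated backend hiccup"

    return test


if __name__ == "__main__":
    # Fails once out of five runs, as in the example from the notes.
    print(classify_by_rerun(make_flaky_test({3})))  # flaky
    # Fails every time, so it is probably a real problem in the code.
    print(classify_by_rerun(make_flaky_test({1, 2, 3, 4, 5})))  # fail
```

A real setup would also want to record how often each test comes back "flaky", since rerunning hides the instability rather than fixing it.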
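
The hermetic testing notes above (a self-contained environment, with a fake standing in for the real backend) might look something like the following minimal Python sketch. Everything here is hypothetical and chosen only for illustration: the UserBackend interface, the FakeUserBackend class, and the greeting function are not from the book.

```python
from typing import Dict, Protocol


class UserBackend(Protocol):
    """Hypothetical interface that the real backend client would also satisfy."""

    def get_display_name(self, user_id: str) -> str: ...


class FakeUserBackend:
    """In-memory stand-in for the real service; makes no network calls."""

    def __init__(self, users: Dict[str, str]) -> None:
        self._users = dict(users)

    def get_display_name(self, user_id: str) -> str:
        return self._users[user_id]


def greeting(backend: UserBackend, user_id: str) -> str:
    """Code under test: it depends only on the backend interface."""
    return f"Hello, {backend.get_display_name(user_id)}!"


def test_greeting_is_hermetic() -> None:
    # The test controls all of its data, so a failure points at greeting()
    # rather than at a flaky production backend.
    backend = FakeUserBackend({"u1": "Ada"})
    assert greeting(backend, "u1") == "Hello, Ada!"


if __name__ == "__main__":
    test_greeting_is_hermetic()
    print("ok")
```

Because the fake is passed in rather than looked up globally, the same greeting function could be pointed at a real backend in a later, less hermetic test stage.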

Thank you for your time and attention.
Apply what you've learned here.
Enjoy it all.