Software Engineering at Google Chapter #14 - Larger Testing (3 of 3)

  • A/B diff regression testing:
    • Two environments, data is sampled or multiplexed from production
    • The most common form of large testing at Google
    • Sends the same traffic to both sides (the A side and the B side) to compare for unexpected behavior
    • ALL differing behavior must be understood and reconciled
    • A/A testing can be used to detect non-deterministic behavior, flakiness, noise, etc.
    • A/B/C testing is a thing (if you need it)
    • Limitation: A human must reconcile all differences (are they OK, or what do we do about them?)
    • Limitation: More noise means more investigation of the results
    • The setup of such environments is complex
  • Unit tests verify that code is "working as implemented" while User Acceptance Tests (UATs) verify the code is "working as intended"
  • Probers and canary analysis:
    • Probers are functional tests that run assertions against the production environment
    • Probers are mostly read-only actions; beware writes
    • They act as smoke tests that catch production issues early
    • Canary analysis is pushing a new deploy to a subset of production in order to observe it before a full rollout
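A prober, as described above, can be as small as a read-only check plus a pass/fail verdict. A minimal sketch, assuming the probe target is injected as a callable (the real version would wrap an HTTP client hitting a production endpoint, which is hypothetical here):

```python
from typing import Callable


def probe(fetch: Callable[[], int]) -> bool:
    """Run one read-only prober check.

    `fetch` is any callable returning an HTTP status code; a real prober
    would wrap an HTTP client call against a production endpoint
    (hypothetical here) and run on a schedule, alerting on failure.
    """
    try:
        # Read-only assertion: the endpoint answers successfully.
        return fetch() == 200
    except Exception:
        # Any transport error counts as a failed probe, not a crash.
        return False
```

Injecting `fetch` keeps the prober logic testable without touching production, which matters given the "beware writes" caution above.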
  • Chaos engineering and disaster recovery drills can be a part of testing
  • User evaluation:
    • Dogfooding: Rolling out changes to internal users and/or a limited subset of external users
    • Experimentation: Make something available to a subset of users without them knowing
    • Rater evaluation: A human determines something is good/bad/neutral or which is better and why
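The experimentation bullet above relies on assigning each user to an arm stably and invisibly. A common way to do that is deterministic hashing; a sketch (the function name and arm labels are my own, not from the book):

```python
import hashlib


def experiment_arm(user_id: str, experiment: str,
                   arms: tuple = ("control", "treatment")) -> str:
    """Deterministically assign a user to an experiment arm.

    Hashing (experiment, user_id) together keeps the assignment stable
    across requests for the same user, while remaining independent
    across different experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big")
    return arms[bucket % len(arms)]
```

Because the assignment is a pure function of the inputs, no per-user state needs to be stored, and the user never has to be told which arm they are in.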
  • Integrate large tests into the developer workflow; don't make them an afterthought
  • Some tests add too much friction to be run pre-submit
  • A/B tests are popular because they are relatively easy and have a low human cost at the verification step
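The core of an A/B diff run is small: replay the same traffic against both sides and collect every difference for triage. A sketch, assuming the two environments are reachable as callables (hypothetical stand-ins for real service clients):

```python
def ab_diff(requests, side_a, side_b):
    """Replay identical requests against both sides and collect differences.

    `side_a` and `side_b` stand in for clients of the two deployed
    environments (hypothetical here). Every entry in the returned list
    must be triaged by a human: intended change, bug, or nondeterminism
    (which an A/A run can help rule out).
    """
    diffs = []
    for req in requests:
        a, b = side_a(req), side_b(req)
        if a != b:
            diffs.append((req, a, b))
    return diffs
```

An A/A run is just `ab_diff(requests, side_a, side_a_replica)`: any nonempty result measures the noise floor rather than a behavior change.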
  • Don't have tests sleep; instead, poll for state changes, use an event handler, or subscribe to a notification system
  • Hard coded timeouts to wait for system setup (VMs to be provisioned, etc) should be easily visible and user configurable
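The two bullets above combine naturally into one helper: poll rather than sleep a fixed amount, with the overall timeout surfaced in one visible, user-configurable place. A sketch (the environment variable name is a hypothetical example):

```python
import os
import time

# Timeout is visible in one place and user-configurable via an
# environment variable (the variable name is a hypothetical example).
DEFAULT_TIMEOUT_SECS = float(os.environ.get("TEST_SETUP_TIMEOUT_SECS", "30"))


def wait_for(condition, timeout: float = DEFAULT_TIMEOUT_SECS,
             interval: float = 0.1) -> bool:
    """Poll `condition` until it returns True or `timeout` elapses.

    Returns as soon as the condition holds, instead of sleeping a
    fixed worst-case duration the way a hard-coded sleep would.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return False
```

A test would call something like `wait_for(lambda: vm.is_ready())` and fail fast with a clear message if it returns `False`, rather than sleeping for the worst case every run.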
  • Minimize flakiness by minimizing the scope of the test
  • If your system is designed to fail in a certain way for users, be sure to test that too
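Testing a designed failure mode means asserting that the failure actually occurs and is reported the way users will see it. A minimal sketch with a hypothetical quota API (the error type, function, and limit are illustrative, not from the book):

```python
class QuotaExceededError(Exception):
    """Hypothetical error a service is designed to raise when over quota."""


def reserve(units: int, quota: int = 10) -> int:
    """Hypothetical API: reserve `units`, failing loudly past the quota."""
    if units > quota:
        raise QuotaExceededError(f"requested {units}, quota is {quota}")
    return units


def test_reserve_rejects_over_quota():
    """The designed failure path gets its own test, not just the happy path."""
    try:
        reserve(11)
    except QuotaExceededError as e:
        # Assert the user-facing message is informative, per the bullet
        # below about clear failure messages.
        assert "quota" in str(e)
    else:
        raise AssertionError("expected QuotaExceededError, got success")
```

With pytest this is usually written as `with pytest.raises(QuotaExceededError):`, but the plain try/except form above needs no extra dependency.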
  • Test failure messages should be clear and should minimize mental effort needed to determine root cause
  • Large tests need documented owners or they will rot



Thank you for your time and attention.
Apply what you've learned here.
Enjoy it all.