Software Engineering at Google Chapter #25 - Compute as a Service (1 of 3)

  • Treating machines as "cattle, not pets" and using a flexible scheduling architecture have both achieved great success at Google
  • The "automation of toil" is what drove Google and others to create automated compute environments
  • Several pages cover the evolution of this automation, starting from older methods (manually spinning up VMs, using shell scripts to push the binaries, manually fixing VMs, etc.)
  • These solutions do not scale, require a great deal of human involvement, and are superseded by more modern approaches such as containers and Kubernetes / Mesos schedulers
  • Containers create a sandboxed environment for things to run in
  • You can set memory and CPU limits on containers and multiple containers can live on the same virtual machine
  • This allows for denser "bin packing" and more efficient use of compute resources
  • In a non-containerized multi-tenant setup you will run into contention issues as the size of the applications grows
  • These issues include swapping due to increased RAM usage and increased latency due to CPU contention
  • As such, you need an isolation mechanism such as containers
  • Virtual machines (VMs) also work for tenant isolation but they are slow to spin up and create a great deal of overhead
  • Also consider that limits on other resources (max file handles, number of PIDs, number of threads, etc.) can become constraints of their own
  • When it comes to rightsizing, letting the user choose and automatically choosing are both tricky to get right
  • Your organization will grow along the following three axes...
    • Size of the largest application
    • Number of copies of an application that need to be run
    • Number of different and unique applications that need to be run at the same time
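
To make the "bin packing" point above more concrete, here is a minimal sketch in Python. It is not how Borg or Kubernetes actually schedule work; it uses a simple first-fit-decreasing heuristic as a stand-in. The names Container, Machine, and first_fit_decreasing, along with the CPU and RAM figures, are hypothetical and chosen only for illustration. The idea is that each container declares a CPU and memory limit, and the scheduler places it on the first machine with enough free capacity, adding a new machine only when none fits.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    name: str
    cpu: int      # CPU cores requested (hypothetical limit)
    ram_gb: int   # memory requested in GB (hypothetical limit)


@dataclass
class Machine:
    cpu_free: int
    ram_free_gb: int
    placed: list = field(default_factory=list)

    def try_place(self, c: Container) -> bool:
        """Place the container if both its CPU and RAM limits fit."""
        if c.cpu <= self.cpu_free and c.ram_gb <= self.ram_free_gb:
            self.cpu_free -= c.cpu
            self.ram_free_gb -= c.ram_gb
            self.placed.append(c.name)
            return True
        return False


def first_fit_decreasing(containers, machine_cpu, machine_ram_gb):
    """Pack containers onto as few identical machines as this heuristic finds."""
    machines = []
    # Largest containers first, so the small ones can fill the leftover gaps.
    ordered = sorted(containers, key=lambda c: (c.cpu, c.ram_gb), reverse=True)
    for c in ordered:
        if c.cpu > machine_cpu or c.ram_gb > machine_ram_gb:
            raise ValueError(f"{c.name} does not fit on an empty machine")
        for m in machines:
            if m.try_place(c):
                break
        else:
            # No existing machine had room, so bring up a new one.
            m = Machine(machine_cpu, machine_ram_gb)
            m.try_place(c)
            machines.append(m)
    return machines


if __name__ == "__main__":
    workload = [
        Container("web", 2, 4),
        Container("db", 4, 16),
        Container("cache", 1, 8),
        Container("batch", 3, 2),
        Container("worker", 2, 4),
    ]
    for i, m in enumerate(first_fit_decreasing(workload, 8, 32)):
        print(i, m.placed, "free:", m.cpu_free, "CPU,", m.ram_free_gb, "GB")
```

Without declared limits the scheduler would have nothing to pack against, which is one reason the isolation mechanism and the scheduler go together. Real schedulers also weigh factors this sketch ignores, such as failure domains and priorities.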


