Mobile platforms (viz., iOS and Android) have enabled a new set of companies with multi-billion dollar valuations, such as Uber and WhatsApp. Similarly, the public cloud has created an ecosystem that helps solve new problems, and we are seeing new companies thrive in it.
In the public cloud world, performance tuning is “THE” biggest area to focus on. Cloud businesses meter “time” as a key metric for revenue generation, so system downtime or slow performance can be measured down to the second and translated into an objective revenue impact.
With that preamble, I recently happened to read the book “Systems Performance: Enterprise and the Cloud, Second Edition (2020)” by Brendan D. Gregg.
In short, it is an excellent book and a must-read for techies. I am picking out a small piece that is interesting to me.
Brendan talks about two high-level strategies for approaching performance tuning issues.
1). Resource Analysis (Bottom-Up Approach)
2). Workload Analysis (Top-Down Approach)
Resource analysis starts by investigating the fundamental building blocks of the system. Brendan proposes the USE (Utilization, Saturation, and Errors) method: for every resource, check its utilization, saturation, and errors, then iterate on the problem through controlled changes. Errors are usually checked first because they are quick to interpret, followed by the utilization and saturation of the resource of interest.
Examples of resources include:
CPUs
Memory
Network Interfaces
Storage
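To make the USE checklist concrete, here is a minimal Python sketch. It assumes the three metrics have already been collected for each resource by some monitoring tool; the `ResourceSample` type, the `use_check` function, the sample numbers, and the 80% utilization threshold are all illustrative assumptions, not something taken from the book.

```python
from dataclasses import dataclass


@dataclass
class ResourceSample:
    """One observation of a resource (hypothetical structure)."""
    name: str
    utilization: float  # fraction of time the resource was busy, 0.0 to 1.0
    saturation: float   # amount of queued work the resource could not service yet
    errors: int         # count of error events


# Assumed threshold for "high" utilization; tune it for your own system.
UTILIZATION_THRESHOLD = 0.8


def use_check(sample: ResourceSample) -> list[str]:
    """Return a list of USE findings for one resource (empty if it looks healthy)."""
    findings = []
    if sample.errors > 0:
        findings.append(f"{sample.name}: {sample.errors} errors")
    if sample.utilization >= UTILIZATION_THRESHOLD:
        findings.append(f"{sample.name}: utilization {sample.utilization:.0%}")
    if sample.saturation > 0:
        findings.append(f"{sample.name}: saturation {sample.saturation:g}")
    return findings


if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    samples = [
        ResourceSample("CPUs", utilization=0.95, saturation=3, errors=0),
        ResourceSample("Memory", utilization=0.40, saturation=0, errors=0),
        ResourceSample("Network Interfaces", utilization=0.10, saturation=0, errors=2),
        ResourceSample("Storage", utilization=0.55, saturation=0, errors=0),
    ]
    for sample in samples:
        for finding in use_check(sample):
            print(finding)
```

In this made-up data the CPUs would be flagged for both utilization and saturation and the network interfaces for errors, which is where the investigation would continue.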
Workload analysis takes the reverse approach: the investigation starts at the application level. This involves identifying the workload and its typical attributes. One method Brendan describes for this is “Thread State Analysis” (TSA).
Key metrics in workload analysis are:
Latency
Throughput
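As a rough illustration of these two metrics, the sketch below derives them from a list of request start and end timestamps. The `summarize` function, its input format, and the sample values are assumptions made for this example only.

```python
from statistics import mean


def summarize(requests: list[tuple[float, float]], window_seconds: float) -> dict[str, float]:
    """Compute throughput and latency from (start, end) timestamps in seconds.

    `window_seconds` is the length of the observation window the requests
    were collected in.
    """
    if not requests or window_seconds <= 0:
        raise ValueError("need at least one request and a positive window")
    latencies = [end - start for start, end in requests]
    return {
        "throughput_per_s": len(requests) / window_seconds,  # completed requests per second
        "mean_latency_s": mean(latencies),                    # average time per request
        "max_latency_s": max(latencies),                      # worst single request
    }


if __name__ == "__main__":
    # Three made-up requests observed over a 3-second window.
    observed = [(0.0, 0.5), (1.0, 1.25), (2.0, 2.75)]
    print(summarize(observed, window_seconds=3.0))
```

In practice a percentile (for example the 99th) is often more informative than the mean, but that is left out here to keep the sketch short.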
The primary monitoring entity here is the thread, as opposed to the resources in the USE method. A thread is a “runnable entity” in the operating system. In most operating systems a thread can be in one of six states, and analysing where the threads spend most of their time helps to locate the bottleneck.
The thread states are,
Executing (Latency State)
Runnable (Latency State)
Paging (Latency State)
Sleeping
Lock
Idle
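The idea of finding where threads spend their time can be sketched as follows. This assumes a thread's state has already been sampled at regular intervals by some tool; the sampling itself, the `state_breakdown` and `dominant_state` helpers, and the sample data are all hypothetical.

```python
from collections import Counter

# The six thread states listed above.
STATES = ("Executing", "Runnable", "Paging", "Sleeping", "Lock", "Idle")


def state_breakdown(samples: list[str]) -> dict[str, float]:
    """Return the fraction of samples observed in each state."""
    if not samples:
        raise ValueError("no samples given")
    counts = Counter(samples)
    return {state: counts.get(state, 0) / len(samples) for state in STATES}


def dominant_state(samples: list[str], ignore: tuple[str, ...] = ("Idle",)) -> str:
    """Return the state with the largest share of time, skipping ignored states.

    Idle is skipped by default (an assumption of this sketch), since a thread
    waiting for work is not usually the thing to optimize.
    """
    breakdown = state_breakdown(samples)
    candidates = {s: share for s, share in breakdown.items() if s not in ignore}
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    # Ten made-up samples of one thread.
    samples = ["Lock"] * 6 + ["Executing"] * 3 + ["Idle"] * 1
    for state, share in state_breakdown(samples).items():
        print(f"{state:<10} {share:.0%}")
    print("Investigate first:", dominant_state(samples))
```

With this sample data the thread spends most of its time waiting on locks, so lock contention would be the first thing to look into.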
These two methodologies give a good structure to how we think about performance tuning. The book also goes into detail on using specific tools to gain more insight into system behavior.