Cloud Performance Root Cause Analysis at Netflix
At Netflix, improving the performance of our cloud means happier customers and lower costs. It involves root cause analysis of applications, runtimes, operating systems, and hypervisors, in an environment of 150k cloud instances that undergo numerous production changes each week. Apart from the developers who regularly optimize their own code, we also have a dedicated performance team to help with any issue across the cloud and to build tooling to aid in this analysis. In this session we will summarize the Netflix environment, our procedures, and the tools we use and build to do root cause analysis on cloud performance issues. The analysis may be cloud-wide, using self-service GUIs such as our open source Atlas tool, or focused on individual instances, using our open source Vector tool, flame graphs, Java debuggers, and tooling built on Linux perf, ftrace, and bcc/eBPF. You can use these open source tools in the same way to find performance wins in your own environment.
Developers, technical leads and architects, programmers, testers, business analysts, and product owners