Rigorous Performance Testing - How We Got Here

Ten years in the web performance industry taught me one thing: measuring performance is almost as complicated as improving it. Folks are often surprised to hear that — especially when I shout it through a megaphone on stages and on blogs like this one. After all, stakeholders from businesses of all stripes receive regular load time reports in easy-to-consume sound bites. In fact, you have probably seen something like this more than once:

Response Time: 4.987 seconds

That seems simple enough. But, in truth, the figure above is trickier than Miles Per Gallon — and we know how reliable those ratings are. How was the data behind this figure collected? How was this figure calculated? What is a response time, exactly? Did any actual human being get a response in that time? What device were they using? What does that number really mean?

Rigorous Performance Testing on the Web - Grant Ellis, Instart Logic

Wanted: a measuring stick that tells the truth

There are two fundamental truths to performance testing:

  1. Methodology matters more than results.
  2. Statistics can (and sometimes do) lie.

In fact, it is easy to make great performance results look really poor, it is easy to make poor results look really great, and it is easy to do either of those things deliberately or accidentally.
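To make that concrete, here is a minimal sketch of how one set of measurements can yield very different headline numbers depending on which summary is reported. The sample values and the `summarize` helper are invented for illustration; they are not real test data or any tool's actual method.

```python
import statistics

# Hypothetical page load times in seconds (made up for illustration).
# Most loads are fast; two are very slow.
samples = [1.2, 1.3, 1.3, 1.4, 1.5, 1.5, 1.6, 1.8, 9.5, 28.0]


def summarize(times):
    """Return several common summaries of the same measurements."""
    ordered = sorted(times)
    n = len(ordered)
    # Nearest-rank 90th percentile: the smallest sample with at least
    # 90% of all samples at or below it (integer math avoids rounding issues).
    rank = max(1, (90 * n + 99) // 100)
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p90": ordered[rank - 1],
        "best": ordered[0],
    }


if __name__ == "__main__":
    for name, value in summarize(samples).items():
        print(f"{name}: {value:.2f} s")
```

With these made-up numbers, the median says the page loads in 1.5 seconds, the mean says nearly 5 seconds, and the 90th percentile says 9.5 seconds. Each figure is "the response time" and none of them is wrong, which is why the methodology behind the number matters more than the number itself.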

The data center bottleneck

Tim Berners-Lee is widely credited with the invention of the World Wide Web in 1989, but most of us had little exposure until 1995, 2000, or later. Roughly speaking, the Web became commercially viable in 2000. Right around that time, the challenges of performance were largely in the data center. Hardware was much less powerful, server software was less efficient and less reliable, and there were no tools for achieving performance at scale.

To address this problem, companies like Cisco, F5 Networks, and NetScaler (now owned by Citrix) designed hardware appliances (load balancers) to help their customers scale horizontally and meet heightened demand without incurring a performance penalty. Companies like Mercury offered commercial tools like LoadRunner (now owned by HP) to generate synthetic load and test performance under strain.

The first-mile bottleneck

F5 and NetScaler were (and still are) excellent tools for achieving scale inside the data center. In fact, they solved the problem so completely that the once-entrenched data center bottleneck was alleviated and a brand-new bottleneck was uncovered: the “first mile.” With load balancers in the arsenal, companies could scale their infrastructure far beyond what their Internet connections (first-mile connections) could accommodate. This was a short-term problem, however: major telecoms like AT&T were able to scale their networks and meet first-mile demand, and data center connections became more robust than ever before.

The middle-mile bottleneck

With the data center and first-mile bottlenecks soundly conquered, it became clear that performance on the Web was dramatically impacted by the mechanics of the Internet itself. Under-provisioned media and switches, fragile peering agreements between networks, inefficient routing, and tremendous traffic growth made the Internet slow and unreliable. Companies could control the performance of their web properties only as far as their front door; beyond that, the pitfalls of the Internet would take over. In essence, the bottleneck moved again, from the first mile to the middle mile.

Content Delivery Networks (CDNs) like Akamai, Limelight, Edgecast, and others sprang up to help bring order to the Internet. They provisioned servers on most of the major networks powering the Internet. Their reasoning was sound: if they had servers close to end users (on the “edge” of the Internet) and servers close to data centers (near the first mile), then they had some degree of control over how content was delivered in the middle (the middle mile). Companies opting to install a Content Delivery Network were, in essence, decentralizing their infrastructure, just as their users were decentralized.

Just as the bottleneck moved to the middle mile and companies utilized platforms residing in the middle mile, there was a gap in performance measurement. First-mile testing tools did not take the distributed nature of the Internet into account. A new crop of synthetic performance testing tools like those from Keynote Systems and from Gomez (now owned by Compuware) became available. Their key differentiator in this context was a distributed set of testing nodes. Keynote and Gomez customers could measure the performance of their web services, with their CDN in-line, from any number of cities across the globe.

The new bottleneck: last mile and browser mechanics

Content Delivery Networks were extremely effective at solving performance problems caused by the middle mile of the Internet. Companies of all stripes, especially those engaging in commerce, rapidly adopted CDNs. Simultaneously, Internet providers continued to improve their networks, replacing copper lines with fiber, laying new lines across the globe and across the Atlantic and Pacific basins, and augmenting peering points between networks. Cloud providers disrupted the traditional data center and encouraged site owners to decentralize their infrastructure and place it within the middle mile, where much greater bandwidth is available.

Latency replaces bandwidth as the big performance offender

More recently, web design patterns have been changing quite dramatically. Standards-compliant browsers coupled with new technologies like HTML5, CSS3, and modern JavaScript frameworks have enabled ever more robust, immersive, and dynamic sites. These sites are less cacheable than the prior crop, and require more “chattiness” with server infrastructure.

Modern web sites have significantly more page weight (KB or MB loaded) than the prior generation — but, more importantly, they have many more objects that need to be loaded in the browser, and the browser must work hard to process all those objects. It is common for a modern web site to have several hundred embedded objects, and they all must be requested, waited for (e.g. network and server latency), parsed, interpreted, executed, rendered — to say nothing of the huge interdependencies between components, which frequently cause blocking or screen repaints. Browsers have so many objects to download that network latency, not bandwidth, is the biggest performance offender.
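A rough back-of-the-envelope model shows why latency dominates once a page has hundreds of objects. The sketch below is a deliberate simplification: it assumes every object costs one network round trip, that the browser fetches a fixed number of objects in parallel (six is used here as an assumed figure), and that transfer time is simply bytes divided by bandwidth. It ignores DNS, connection setup, TCP slow start, server time, and rendering. The function name and all input numbers are hypothetical.

```python
import math


def estimate_load_seconds(num_objects, total_bytes, rtt_s, bandwidth_bps,
                          connections=6):
    """Crude load-time estimate: round-trip waits plus raw transfer time."""
    # Objects are fetched in batches of `connections`; each batch
    # costs one round trip of waiting.
    rounds = math.ceil(num_objects / connections)
    latency_cost = rounds * rtt_s
    # Time to push all the bytes through the pipe.
    transfer_cost = total_bytes * 8 / bandwidth_bps
    return latency_cost + transfer_cost


if __name__ == "__main__":
    # Hypothetical page: 300 objects, 2 MB total, 100 ms RTT, 10 Mbps link.
    baseline = estimate_load_seconds(300, 2_000_000, 0.100, 10_000_000)
    double_bandwidth = estimate_load_seconds(300, 2_000_000, 0.100, 20_000_000)
    half_latency = estimate_load_seconds(300, 2_000_000, 0.050, 10_000_000)
    print(f"baseline:         {baseline:.1f} s")
    print(f"double bandwidth: {double_bandwidth:.1f} s")
    print(f"half latency:     {half_latency:.1f} s")
```

Under these assumptions the baseline comes to roughly 6.6 seconds, of which about 5 seconds is spent waiting on round trips. Doubling the bandwidth saves under a second, while halving the round-trip time saves about two and a half.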

Underpowered computing devices and shaky wireless connections

Usage patterns have changed as well: more and more users are on mobile phones or tablets. Those devices have much less processing power available than a conventional computer, compounding the above challenges. Furthermore, phones, tablets, and laptops alike are frequently connected to the Internet via wireless technologies like 3G, 4G, LTE, or Wi-Fi. All of these connections, even Wi-Fi, have limited bandwidth, high latency, and high packet loss. In fact, most of the Web’s users are located in urban regions with high population density, so Wi-Fi access points tend to interfere with one another and become irregular and unreliable.

In short, the middle mile is no longer the bottleneck. Last-mile connectivity, wireless technologies, underpowered devices, and browser mechanics are causing the majority of today’s performance problems.

We know there’s a performance problem, because we all experience it. We know where it is: in the last mile. How do we measure it? In my next post, we’ll look at how to test performance effectively given the new bottleneck. Check back here for Part 2: Modern Testing Tools.