Testing the Limits of a Transactional Networked Service


One of the defining characteristics of a cloud service is scale, and with scale comes the question of performance and cost. How efficient are the software systems that we run? How many computing resources are required to meet our current demands, and how much more will be required in the future?

At Instart Logic, we have created a system called Lava that lets us measure and test the scalability limits of our systems. Lava targets transactional networked services: systems that serve independent requests sent over a network by a large number of clients. Examples include HTTP frontends, data caches and API endpoints.

Performance measurement is a deep topic with many facets. Lava addresses one specific slice of the problem: how can we quickly find the maximum load a service can handle? Many open source stress testing tools exist today, but we found most of them too inflexible and too slow for this purpose. That is a problem for us, because we have a large space of experimental parameters to explore during system stress tests.

Lava decomposes this problem into two pieces: 

  • a set of extensible, protocol-specific agents that generate a controllable amount of load on the system under test
  • a control function that uses feedback from the metrics generated by the stress test to find system limits.

While the ideas used in the Lava system are not novel, we feel that the particular combination of features used will be interesting to a broader audience.


The most important metrics for our Lava use cases are throughput and latency. Throughput is the rate at which requests are processed, and latency is the time from the start of a request to the reception of its response.
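The following minimal sketch makes the two definitions concrete by summarizing one window of completed requests. The `RequestRecord` type and the `summarize` function are illustrative assumptions and not part of Lava. So is the convention of counting a request in the window where its response arrives.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One completed request (hypothetical record type): when it was sent and
// when its response arrived, both in seconds on the same clock.
struct RequestRecord {
  double start_s;
  double end_s;
};

struct WindowStats {
  double throughput_rps;  // completed requests per second
  double mean_latency_s;  // average of (end - start)
};

// Summarizes the requests whose responses arrived in [window_start, window_end).
WindowStats summarize(const std::vector<RequestRecord>& records,
                      double window_start, double window_end) {
  assert(window_end > window_start);
  std::size_t completed = 0;
  double latency_sum = 0.0;
  for (const RequestRecord& r : records) {
    if (r.end_s >= window_start && r.end_s < window_end) {
      ++completed;
      latency_sum += r.end_s - r.start_s;  // start of request to response
    }
  }
  WindowStats stats;
  stats.throughput_rps = completed / (window_end - window_start);
  stats.mean_latency_s = completed ? latency_sum / completed : 0.0;
  return stats;
}
```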

Figure 1: Typical service response time versus request rate

Figure 1 graphs typical response time behavior as request volume increases. Service response time remains stable as the request rate increases, until the service reaches a "saturation point" at which it cannot keep up with the request ingress rate. Beyond the saturation point, internal queues overflow and service response times degrade past acceptable thresholds.
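Basic queueing theory predicts the knee in Figure 1. This example is an illustration only: a textbook M/M/1 queue, not a model of any Instart Logic service. For an M/M/1 queue with arrival rate λ and service rate μ, the mean response time is 1/(μ − λ). That value stays small until λ approaches μ and then grows without bound:

```cpp
#include <cassert>
#include <limits>

// Mean response time (seconds) of a textbook M/M/1 queue with Poisson
// arrivals at arrival_rps and exponential service at service_rps:
// 1 / (service_rps - arrival_rps). Illustrative model only; real services
// have many workers, bounded queues and timeouts.
double mm1_mean_response_s(double arrival_rps, double service_rps) {
  assert(arrival_rps >= 0.0 && service_rps > 0.0);
  if (arrival_rps >= service_rps) {
    // At or past the saturation point the queue grows without bound.
    return std::numeric_limits<double>::infinity();
  }
  return 1.0 / (service_rps - arrival_rps);
}
```

Take a service rate of 1,000 requests per second. The model gives a 2 ms mean response time at 500 requests per second, 10 ms at 900 and 100 ms at 990. The last value is fifty times the first, although the load has not quite doubled.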

It is important to know what the saturation point is for each of our services. In development, we use the results obtained from Lava to find performance regressions and guide our performance improvement efforts. In production, we use these results for capacity planning, as services need to be provisioned with enough headroom to absorb service failures and request spikes.
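A simple sizing rule can turn a measured saturation point into a provisioning estimate. The sketch below shows one hypothetical way to do this. At peak demand, provision enough instances that each runs at no more than a target fraction of its measured saturation throughput. Then add spare instances to absorb failures. The function name, its parameters and the rule itself are assumptions for illustration.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical sizing rule: enough instances to carry peak_rps while each
// runs at no more than target_utilization of its measured saturation
// throughput, plus `spare` instances to absorb failures.
int instances_needed(double peak_rps, double saturation_rps,
                     double target_utilization, int spare) {
  assert(saturation_rps > 0.0);
  assert(target_utilization > 0.0 && target_utilization <= 1.0);
  assert(spare >= 0);
  double usable_rps = saturation_rps * target_utilization;
  return static_cast<int>(std::ceil(peak_rps / usable_rps)) + spare;
}
```

Take a measured saturation point of 6,000 requests per second per instance, a 50% utilization target, a peak of 40,000 requests per second and two spare instances. With those inputs, the rule asks for 16 instances.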

There are many existing performance frameworks for network protocols such as HTTP, the main ones being Tsung, Apache Bench, Siege and JMeter. We encountered the following issues with these frameworks:

  • First, many of the available frameworks run a fixed workload without any feedback mechanism for load control. Our stress runs can be sensitive around the saturation point, where slightly too much load can cause high variance in the output and lead to unstable results.
  • The lack of feedback also meant that finding the saturation point required many runs of the stress tools, each probing a different load level. Even with a guided binary search, this was too slow to be viable for exploring large sets of experimental parameters.
  • Finally, while this is not a fundamental issue, the mechanisms Lava needs turned out to be simple enough that implementing them in our own framework did not incur undue engineering cost.
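For comparison, the open-loop approach described above amounts to a binary search over load levels, in which every probe is a complete fixed-load stress run. The sketch below assumes a hypothetical `meets_threshold` callback that performs one such run. It also assumes that the latency predicate is monotone in load.

```cpp
#include <cassert>
#include <functional>

struct SearchResult {
  int max_ok_rps;  // highest probed rate that met the latency threshold
  int probes;      // number of complete fixed-load runs performed
};

// Open-loop baseline: binary search over request rates in [lo, hi].
// meets_threshold(rate) stands in for one full stress run at that rate
// (hypothetical). Assumes meets_threshold(lo) holds and the predicate is
// monotone: once a rate fails, every higher rate fails too.
SearchResult binary_search_saturation(
    int lo, int hi, int resolution,
    const std::function<bool(int)>& meets_threshold) {
  assert(lo <= hi && resolution > 0);
  SearchResult result{lo, 0};
  while (hi - lo > resolution) {
    int mid = lo + (hi - lo) / 2;
    ++result.probes;
    if (meets_threshold(mid)) {
      lo = mid;  // saturation point is at or above mid
    } else {
      hi = mid;  // saturation point is below mid
    }
  }
  result.max_ok_rps = lo;
  return result;
}
```

Consider a search range of 0 to 100,000 requests per second, a resolution of 500 and a true limit of 37,000. This search takes 8 probes and reports 36,718. Each probe is a full run, so the total time is the number of probes multiplied by the run length. That cost recurs at every point in the experimental parameter space.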


Figure 2: The Lava system

The Lava system (Figure 2) consists of two main components:

  • A set of agents running on worker threads that generate application-specific load. Each agent is a state machine; in a stress test of an HTTP frontend, for example, each agent executes a sequence of HTTP request/response interactions. For saturation point measurement, each agent generates a constant number of requests per second, which makes the total load easy to control.
  • A control function component that receives real-time metrics aggregated from the agents and adjusts the parameters of the stress run. The control function manages the number of active agents and the overall state of the Lava system.

Each Lava run consists of three phases: ramp-up, search and measurement. During the ramp-up phase, the Lava control function steadily increases the number of active agents until a metric threshold has been exceeded. The ramp-up phase is not strictly necessary, but we have found it useful to keep it as a distinct phase for debugging purposes. Lava then transitions to the search phase, in which the number of agents is varied up and down around the saturation point to find the maximum load that still meets the threshold. When the search phase has stabilized, Lava transitions to the measurement phase, in which the number of agents is held constant for a configurable time period. All metrics should be stable during the measurement phase. High variance indicates that something is wrong with either the system under test or the test setup itself. Figure 3 shows the agent count and metric graphs for each of the phases.
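The phase transitions can be pictured as a small state machine driven by the output of the control function. The sketch below is a hypothetical reading of the description above, not Lava's code. It assumes that the control function emits `INCREASE`, `DECREASE` or `STABLE` once per interval and that the agent count changes by one per step. It also treats the search phase as stabilized after a configurable number of consecutive `STABLE` signals.

```cpp
#include <cassert>

enum class Signal { INCREASE, DECREASE, STABLE };
enum class Phase { RAMP_UP, SEARCH, MEASUREMENT };

// Hypothetical phase logic: ramp-up ends the first time the controller asks
// for less load (threshold exceeded); search ends after stable_needed
// consecutive STABLE signals; measurement holds the agent count fixed.
class PhaseTracker {
 public:
  explicit PhaseTracker(int stable_needed) : stable_needed_(stable_needed) {
    assert(stable_needed > 0);
  }

  Phase phase() const { return phase_; }

  // Feeds one controller signal; returns the agent-count change to apply.
  int step(Signal s) {
    switch (phase_) {
      case Phase::RAMP_UP:
        if (s == Signal::DECREASE) { phase_ = Phase::SEARCH; return -1; }
        return +1;  // keep adding agents until the threshold is exceeded
      case Phase::SEARCH:
        if (s == Signal::STABLE) {
          if (++stable_count_ >= stable_needed_) phase_ = Phase::MEASUREMENT;
          return 0;
        }
        stable_count_ = 0;  // any adjustment restarts the stability count
        return s == Signal::INCREASE ? +1 : -1;
      case Phase::MEASUREMENT:
        return 0;  // agent count is held constant
    }
    return 0;
  }

 private:
  int stable_needed_;
  int stable_count_ = 0;
  Phase phase_ = Phase::RAMP_UP;
};
```

The measurement duration is left to the caller's timer in this sketch. Requiring several consecutive `STABLE` signals is one simple way to keep a single lucky window from ending the search early.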

Each agent in Lava simulates a constant-rate workload from a client. By increasing or decreasing the number of active agents, Lava adjusts the load placed on the system under test. Each agent has (modulo code transformations to facilitate non-blocking I/O) the following inner loop:

void Agent::run() {
  while (true) {
    Operation* op = create_next_operation();
    // Issue op and record its latency; pace the loop to a constant rate.
  }
}

Agents can be implemented as C++ extensions or in the Lua scripting language. In addition to agents for exploring system limits, we have also implemented agents that replay request traces captured in production.
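A common way to implement a constant-rate agent is to compute each send time from a fixed schedule (start + k × interval) instead of from the previous send. With this approach, a slow response delays individual requests without accumulating drift in the agent's long-run rate. The `ConstantRatePacer` class below is a hypothetical sketch of that idea, not Lava's implementation:

```cpp
#include <cassert>
#include <chrono>

// Hypothetical pacer for one constant-rate agent. Send times follow a fixed
// schedule, start + k * interval, so late sends do not shift later slots.
class ConstantRatePacer {
 public:
  using Clock = std::chrono::steady_clock;

  ConstantRatePacer(double requests_per_second, Clock::time_point start)
      : start_(start) {
    assert(requests_per_second > 0.0);
    interval_ = std::chrono::duration_cast<Clock::duration>(
        std::chrono::duration<double>(1.0 / requests_per_second));
  }

  // Time at which the next request should be issued; advances the schedule.
  Clock::time_point next_send_time() { return start_ + interval_ * count_++; }

 private:
  Clock::time_point start_;
  Clock::duration interval_{};
  Clock::rep count_ = 0;
};
```

A blocking agent could pass each returned time to `std::this_thread::sleep_until` before issuing the next operation. An agent built on non-blocking I/O would instead arm a timer for that time.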

Metrics and Control Functions

We track an extensible set of metrics from the active agents and feed them to a control function that determines how to adjust the load. Metrics are tracked by each agent and aggregated by the central control function component.
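As an example of aggregation, the sketch below merges per-agent latency samples from one window and computes a percentile with the nearest-rank method. The function name is an assumption, and so are the sample layout (one vector per agent) and the choice of nearest rank. How Lava actually represents its metrics is not described here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Merges per-agent latency samples for the current window and returns the
// q-th percentile by nearest rank: the smallest sample with at least q% of
// all samples at or below it. Hypothetical helper.
double window_percentile(const std::vector<std::vector<double>>& per_agent,
                         double q) {
  assert(q > 0.0 && q <= 100.0);
  std::vector<double> all;
  for (const auto& samples : per_agent) {
    all.insert(all.end(), samples.begin(), samples.end());
  }
  assert(!all.empty());
  std::size_t rank = static_cast<std::size_t>(std::ceil(q * all.size() / 100.0));
  // Partial selection: only the element at rank - 1 is put in sorted position.
  auto nth = all.begin() + static_cast<std::ptrdiff_t>(rank - 1);
  std::nth_element(all.begin(), nth, all.end());
  return *nth;
}
```

Suppose one agent reports latencies of 1 to 10 ms and another reports 11 to 20 ms. The 95th percentile of the 20 merged samples is then 19 ms. Using `std::nth_element` avoids fully sorting the merged samples.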

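class Controller {
 public:
  enum Signal { INCREASE, DECREASE, STABLE };  // requested change in load
  virtual ~Controller() = default;
  virtual Signal update(const Metrics* metrics) = 0;
};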

For most applications, we have found that a simple linear controller tracking a moving window of 95th/99th percentile operation latency suffices:

Controller::Signal LinearController::update(
    const Metrics* metrics) {
  // limit: latency threshold; epsilon: dead band around the threshold.
  double delta = metrics->p95_latency() - limit;
  if (delta > epsilon) { return DECREASE; }
  if (delta < -epsilon) { return INCREASE; }
  return STABLE;
}

More sophisticated control functions with faster convergence are possible, but we have not yet explored them.
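To see how the dead-band logic of `LinearController` settles, the sketch below closes the loop against a toy service model. Apart from the comparison logic, everything here is hypothetical, including the latency curve, the one-agent step size and the parameter values. They were chosen only for illustration.

```cpp
#include <cassert>

enum class Signal { INCREASE, DECREASE, STABLE };

// Same comparison as LinearController: p95 latency versus limit_ms, with a
// dead band of +/- epsilon_ms in which no change is requested.
struct DeadBandController {
  double limit_ms;
  double epsilon_ms;
  Signal update(double p95_ms) const {
    double delta = p95_ms - limit_ms;
    if (delta > epsilon_ms) return Signal::DECREASE;
    if (delta < -epsilon_ms) return Signal::INCREASE;
    return Signal::STABLE;
  }
};

// Toy service (hypothetical): p95 latency is 1 ms up to 40 agents, then
// grows by 0.1 ms per additional agent.
double toy_p95_ms(int agents) {
  return agents <= 40 ? 1.0 : 1.0 + 0.1 * (agents - 40);
}

// Runs the feedback loop for `steps` control intervals, changing the active
// agent count by one per interval, and returns the final count.
int run_feedback(const DeadBandController& c, int agents, int steps) {
  for (int i = 0; i < steps; ++i) {
    switch (c.update(toy_p95_ms(agents))) {
      case Signal::INCREASE: ++agents; break;
      case Signal::DECREASE: if (agents > 1) --agents; break;
      case Signal::STABLE: break;
    }
  }
  return agents;
}
```

With a 2 ms limit and a 0.15 ms dead band, a loop that starts from 1 agent settles at 49 agents, and one that starts from 60 settles at 51. The controller accepts any count whose latency falls inside the band, so `epsilon` trades precision for stability. If `epsilon` is smaller than half the latency change caused by one agent, no agent count may fall inside the band, and the loop can oscillate.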


Figure 4: Results of a sample Lava run

Figure 4 shows a sample result from a Lava run testing an HTTP-based service. In this run, we used the linear control function with a 95th percentile latency threshold of 2 milliseconds. The top graph shows the throughput we are getting from the system. The middle graph shows the sliding-window metrics we are measuring; note that these can vary due to inherent system variability and randomness. The bottom graph shows the number of active agents throughout the run, from which we can see Lava transition through the ramp-up, search and measurement phases.
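The measurement phase relies on metrics staying stable. One simple check, assumed here and not taken from Lava, compares the coefficient of variation (standard deviation divided by mean) of a windowed metric against a threshold such as 5%:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical stability check for the measurement phase: computes the
// coefficient of variation of a metric sampled once per window and returns
// false when it exceeds max_cv (e.g. 0.05 for 5%).
bool measurement_is_stable(const std::vector<double>& samples, double max_cv) {
  assert(samples.size() >= 2 && max_cv >= 0.0);
  double mean = 0.0;
  for (double x : samples) mean += x;
  mean /= samples.size();
  assert(mean > 0.0);
  double var = 0.0;
  for (double x : samples) var += (x - mean) * (x - mean);
  var /= samples.size() - 1;  // sample variance
  return std::sqrt(var) / mean <= max_cv;
}
```

A failed check would point to the kinds of problems described for the measurement phase: something wrong with either the system under test or the test setup.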


Lava is currently used to stress test all major systems at Instart Logic and has replaced all third-party stress test frameworks. Adopting Lava has reduced the time taken for a single stress test experiment by an order of magnitude. For example, our HTTP-based stress tests using Tsung and binary search took around twenty minutes to converge, while a similar run using Lava can converge in under five minutes.

We are in the process of open-sourcing our Lava software as we feel the feedback-control-based stress test framework is widely applicable and useful. 
