Web service performance testing - tips and tools for getting started : Assertible

10/2/2017 Cody Reichert

Performance testing is an important aspect of running an API or web application, but where do you start? This post will go over the basics of performance testing, present best practices, and provide tools to get started.

Performance is more than just response times, and the implications of bad performance are more than just a slow application. Testing and monitoring your web application's performance is an effective way to find bottlenecks that can cause users to become dissatisfied or leave altogether, money lost on over-used server resources, and bugs that are hard to track down.

With a solid performance testing framework in place, you can get a head start on providing a consistently high-quality service. In this post, I'll illustrate common goals for testing API and website performance, outline tips for creating your tests, and discuss which data points to measure:

Types of performance testing
Determining important metrics for your API
Creating and running performance tests
Automated performance monitoring
Performance testing resources and tools

Types of performance testing

There are many different types of performance testing to be aware of. That said, as with all types of QA, creativity and an understanding of your domain will go a long way toward setting up a valuable performance testing pipeline.

Different applications and business domains may require very different tests and metrics, so it's important to know which tests will give you the most useful data.

The Microsoft Developer Network has a thorough breakdown of the different types of performance testing, which you should read for a more comprehensive list of their advantages and disadvantages. Here's what it says about the importance of this knowledge:

It is important to understand the different performance test types in order to reduce risks, minimize cost, and know when to apply the appropriate test over the course of a given performance-testing project.

[MSDN](https://msdn.microsoft.com) in [Performance Testing Guidance for Web Applications](https://msdn.microsoft.com/en-us/library/bb924357.aspx)

The first three below are the most common types of tests you'll hear about in the software world. The fourth, though, is perhaps the most useful:

Load testing
Stress testing
Endurance testing
Custom test scenarios

Load testing

Load testing is used to determine how an application performs under a given volume of users. Generally, load tests increase the volume of requests over the duration of the test, but they can be used to gather performance data at any given load, small or large.
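To make the idea of stepping up load concrete, here is a minimal Python sketch (standard library only) that runs the same request at increasing concurrency levels and records the average latency and failure count at each level. It assumes you supply `send_request`, a zero-argument callable that performs one request and returns a truthy value on success; `run_load_test`, `timed_call`, and the localhost URL in `hit_homepage` are illustrative names, not part of any tool mentioned in this post.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from urllib.request import urlopen


def timed_call(send_request):
    """Call send_request() once and return (elapsed_seconds, succeeded)."""
    start = time.perf_counter()
    try:
        ok = bool(send_request())
    except Exception:
        ok = False  # any exception (timeout, HTTP error, ...) counts as a failure
    return time.perf_counter() - start, ok


def run_load_test(send_request, user_levels, requests_per_user):
    """Run one load step per concurrency level, from light to heavy.

    user_levels: increasing numbers of concurrent simulated users (each >= 1).
    Returns one summary dict per level.
    """
    results = []
    for users in user_levels:
        total = users * requests_per_user
        # One worker thread per simulated user; all requests for this level
        # are spread across those workers.
        with ThreadPoolExecutor(max_workers=users) as pool:
            samples = list(pool.map(lambda _: timed_call(send_request), range(total)))
        results.append({
            "users": users,
            "requests": total,
            "avg_latency_s": mean(elapsed for elapsed, _ in samples),
            "failures": sum(1 for _, ok in samples if not ok),
        })
    return results


def hit_homepage():
    """Hypothetical request function; replace the URL with your own service."""
    with urlopen("http://localhost:8000/", timeout=5) as resp:
        return resp.status == 200
```

For example, `run_load_test(hit_homepage, [1, 10, 50], requests_per_user=20)` would send 20, 200, and 1,000 requests at the three levels. Dedicated tools like the ones listed at the end of this post handle ramp-up, reporting, and distributed load far better; a thread pool like this is only suited to small local experiments.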

Stress testing

Stress testing is very similar to load testing, except that stress tests are specifically designed to test the system's performance at, or beyond, its maximum capacity of requests and jobs. In other words, stress testing determines the point at which a system breaks down, and verifies that the system continues to function at maximum capacity.
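A stress test can be framed as a loop that keeps adding load until a failure criterion trips. The sketch below assumes a hypothetical `error_rate_at(users)` function that runs a short load step at that concurrency and returns the failed fraction of requests; the 5% error-rate limit is chosen purely for illustration.

```python
def find_breaking_point(error_rate_at, start_users, step, max_users, max_error_rate=0.05):
    """Raise load step by step until the error rate exceeds max_error_rate.

    error_rate_at(users) is assumed to run one load step at that
    concurrency and return the failed fraction of requests (0.0 to 1.0).
    Returns the first user count that broke the system, or None if it
    stayed within the limit all the way up to max_users.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    users = start_users
    while users <= max_users:
        if error_rate_at(users) > max_error_rate:
            return users  # first load level past the acceptable error rate
        users += step
    return None  # survived every level we tried
```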

Endurance testing

Endurance testing (also known as soak testing) runs your application under a typical production load over a prolonged period of time to determine how the system behaves, and how well it endures, during normal use.
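One simple signal to pull out of a soak test is drift: whether response times at the end of the run are noticeably worse than at the start. This Python sketch compares the average of the first and last windows of samples; the function name and the windowing approach are assumptions for illustration, and real soak analysis usually also tracks memory, connection counts, and error rates over time.

```python
from statistics import mean


def latency_drift(samples, window):
    """Ratio of late-run to early-run average latency in a soak test.

    samples: chronologically ordered response times from the test.
    window: how many samples to average at each end of the run.
    A ratio well above 1.0 suggests the system degrades under sustained load.
    """
    if window <= 0 or len(samples) < 2 * window:
        raise ValueError("need at least two full windows of samples")
    first = mean(samples[:window])
    last = mean(samples[-window:])
    return last / first
```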

Custom test scenarios

The three types of tests mentioned above do not cover everything in performance testing; software testing is as much an art as it is a science. Creating automated scenarios that mimic real-user behavior and gather data points like response time and latency is another great way to measure how real users are affected under various conditions.
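A custom scenario is often just a scripted user journey with each step timed. In this sketch, each step is a hypothetical `(name, action)` pair you provide, such as logging in and then loading a dashboard; the runner records per-step durations and stops at the first failure, roughly the way a real user would give up.

```python
import time


def run_scenario(steps):
    """Run a scripted user journey and time each step.

    steps: ordered list of (name, action) pairs, where each action is a
    zero-argument callable performing one user interaction.
    Returns (timings, failed_step): timings maps step name to seconds,
    failed_step is the name of the step that raised, or None.
    """
    timings = {}
    for name, action in steps:
        start = time.perf_counter()
        try:
            action()
        except Exception:
            timings[name] = time.perf_counter() - start
            return timings, name  # stop here, later steps never run
        timings[name] = time.perf_counter() - start
    return timings, None
```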

Knowing the various types of performance testing is important, but they are effectively worthless if you are not gathering actionable data. It hardly matters how much load your system can handle if you do not have a baseline for what it should handle.

In the next section, I'll discuss some common metrics and how to find ones important to your application.

Determining important metrics for your API

One of the most critical aspects of performance testing (all testing, really) is finding the metrics that are important and actionable for your organization. In addition, you need to determine what thresholds are considered passing, failing, and critical, and create tests that are consistent and reliable in measuring these metrics. Not least, you need a way for your team and stakeholders to review and act on these results.

This part needs a lot of thought. Every web service has a story. It has context that needs to be considered when reviewing metrics. Ali Hill, an automation engineer, states the importance of this very succinctly:

[...] As a beginner to performance testing, the stats I could be providing the stakeholders had the potential to be misleading

[Ali Hill](https://edinburghtester.wordpress.com) in [How Ministry of Testing Started My Performance Testing Journey](https://dojo.ministryoftesting.com/lessons/how-ministry-of-testing-started-my-performance-testing-journey)

Measuring and tracking the wrong metrics in your performance tests can lead to misleading information, wasted time, and confused stakeholders. The data gathered by performance tests can be extremely valuable when it leads to new insights into bugs and bottlenecks in your application.

I read a lot of articles and papers to find the most common metrics people gather, and there are some good baseline data points you can track. The ones I see mentioned most are:

Average response time
Peak response time
Error rates
CPU utilization
Memory utilization

Average response time

Average response time is a good metric to start gathering data on in your performance tests. Tracking it allows you to see how your app's performance fluctuates with more or less load. It gives you an idea of the average user experience, and it provides insight into regressions when something changes.

It's also important to know your ideal baseline response time. In other words, at what point should slowness be investigated or considered critical? Set up alerts and dashboards so the whole team can see when something changes.
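Checking an average against team-chosen thresholds can be as small as the Python sketch below. The `warn_ms` and `critical_ms` values are assumptions you would set from your own baseline; the numbers in any example are placeholders, not recommendations.

```python
from statistics import mean


def classify_average(response_times_ms, warn_ms, critical_ms):
    """Return (average_ms, status) with status 'ok', 'warn', or 'critical'.

    warn_ms and critical_ms are hypothetical, team-defined baselines.
    """
    avg = mean(response_times_ms)
    if avg >= critical_ms:
        return avg, "critical"
    if avg >= warn_ms:
        return avg, "warn"
    return avg, "ok"
```

A status of `warn` or `critical` is what you would wire into the alerts and dashboards mentioned above.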

Peak response time

Peak response time shows you the performance of the slowest requests, generally by taking the 90th percentile of all response times. This gives a different view than an overall average. With peak response times, you can find specific queries that may be problematic and learn what the users with the worst experience are seeing.
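Percentiles are easy to get subtly wrong, so here is a small nearest-rank implementation for illustration. Nearest-rank is one of several percentile definitions (others interpolate between samples), so results may differ slightly from what a given tool reports.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct% of all samples are less than or equal to it."""
    if not values:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]
```

For example, with response times of 80, 110, 120, 250, and 900 ms, `percentile(times, 90)` returns 900, while the average is 292 ms: one slow request dominates the peak view but is diluted in the mean.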

Having a view into the slowest queries can make it easier to track down specific operations with latency, whereas the average response time is more general, and only gives you an idea of the entire system.

Error rates

If errors occur, you'll want to know about them. Simply measuring the ratio of failed requests to total requests (where a failed request is anything that matches criteria you define) gives you insight into which components start failing under a given load.

Generally speaking, when there's an issue in your application (whether it's high load or a bug), it may not cause every request to fail. An error rate gives you a better idea of when things start failing. However, you may also want to capture more fine-grained data about each error. For example, does /login fail when a large number of users log in at the same time?
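Because the definition of a failed request is up to you, it helps to keep the criterion pluggable and to break error rates down per endpoint, so a question like whether /login fails under load has a direct answer. The `(endpoint, status_code, latency_ms)` tuple format and the `default_failure` rule (any 5xx, or slower than 2 seconds) are assumptions for the sake of the example.

```python
from collections import defaultdict


def error_rates_by_endpoint(responses, is_failure):
    """Compute failed / total requests for each endpoint.

    responses: iterable of (endpoint, status_code, latency_ms) tuples.
    is_failure: team-defined criterion, called as is_failure(status, latency_ms).
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for endpoint, status, latency_ms in responses:
        totals[endpoint] += 1
        if is_failure(status, latency_ms):
            failures[endpoint] += 1
    return {ep: failures[ep] / totals[ep] for ep in totals}


def default_failure(status, latency_ms):
    """Hypothetical criterion: server errors or very slow responses fail."""
    return status >= 500 or latency_ms > 2000
```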

CPU utilization

It's expected that CPU utilization will go up with more users and down with fewer, but do you know what happens when your CPU hits 100%? It's bound to happen. Being able to act preventatively under these conditions can help you build in redundancy and auto-scaling systems to keep your app available.
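Short CPU spikes are normal; what usually deserves an alert or a scale-up is CPU that stays pinned. This sketch scans CPU samples for a sustained run at or above a threshold. It assumes the samples come from whatever monitoring agent you already use, collected at a fixed interval, and the 90% / 5-sample defaults are placeholders.

```python
def sustained_saturation(cpu_samples, threshold=90.0, min_consecutive=5):
    """Return the index where CPU first stays >= threshold percent for
    min_consecutive consecutive samples, or None if that never happens."""
    if min_consecutive < 1:
        raise ValueError("min_consecutive must be at least 1")
    run = 0
    for i, value in enumerate(cpu_samples):
        run = run + 1 if value >= threshold else 0  # reset on any dip
        if run == min_consecutive:
            return i - min_consecutive + 1  # start of the saturated run
    return None
```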

Memory utilization

Measuring memory usage during performance tests captures the amount of memory the server uses to process requests in different scenarios, like heavy load. This metric in particular is a good way to find memory leaks in your application, and in some cases it can be correlated with peak response times to identify slow database queries.
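One rough way to spot a leak is to hold load constant and fit a line to memory usage over time: a slope that stays positive and never levels off is a warning sign. This ordinary least-squares sketch assumes `(elapsed_minutes, memory_mb)` samples gathered from your own monitoring; it is a heuristic, not a leak detector.

```python
def memory_growth_per_minute(samples):
    """Least-squares slope (MB per minute) of memory usage over time.

    samples: list of (elapsed_minutes, memory_mb) pairs collected while
    the load on the server is held constant.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("samples need distinct timestamps")
    return cov / var
```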

Creating and running performance tests

When you've identified a few important metrics and are ready to start creating and running performance tests, your plan should include determining when and where to run the tests.

Test performance constantly and continuously
Run tests in a production-like environment

Test performance constantly and continuously

With the trend toward continuous integration and delivery, unit tests and other functional tests are run as frequently as possible throughout the development and release cycle. Likewise, performance testing should be done as early and as frequently in the process as possible.

Some types of tests, like stress tests or other long-running test jobs, may be deferred until near the end of a release cycle, but again: the earlier and more frequently tests can be run, the better. Finding and resolving bottlenecks early in the development cycle saves developer time.

Our friend Noga Cohen from Blazemeter wrote a great post on performance testing challenges in CI, and here's what she has to say about long-running tests:

Make short tests - 1-5 minute load tests can reveal many insights and show trends and changes. Don’t put spike and endurance tests in CI. You can use Jenkins as an automation platform, but don’t put them in the CI cycle for every build.

[Noga Cohen](https://twitter.com/NogaCohen7) in [Overcoming performance testing challenges in continuous delivery pipelines](https://www.blazemeter.com/blog/how-overcome-challenges-when-performance-testing-continuous-delivery-pipelines)
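Short CI tests are most useful when they can fail a build automatically. Below is a minimal sketch of such a gate, assuming your load tool can report a p90 response time for the current run and that you keep a stored baseline; the 10% tolerance and the `main(argv)` command-line shape are hypothetical choices.

```python
def regression_gate(current_p90_ms, baseline_p90_ms, tolerance=0.10):
    """True if the current p90 is no more than `tolerance` above baseline."""
    return current_p90_ms <= baseline_p90_ms * (1 + tolerance)


def main(argv):
    """Hypothetical CI entry point: argv = [current_p90_ms, baseline_p90_ms].

    Returns a process exit code: 0 on pass, 1 on regression.
    """
    current, baseline = float(argv[0]), float(argv[1])
    if regression_gate(current, baseline):
        print("p90 within tolerance")
        return 0
    print(f"p90 regressed: {current:.0f} ms vs baseline {baseline:.0f} ms")
    return 1
```

A CI step could call it with `sys.exit(main(sys.argv[1:]))` so that a regression produces a non-zero exit code and fails the build.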

Next, let's discuss the environment in which performance tests are run.

Run tests in a production-like environment

The results of performance tests are rendered useless if they're not run in an environment that closely resembles, or is an exact replica of, the production environment. If your tests are run on a machine with half the CPU and memory of your production server, the results won't tell you anything about the actual performance of your production application.

Stackify has a great introduction to performance testing, and this sums up their thoughts on a proper testing environment nicely:

Conducting performance testing in a test environment that is similar to the production environment is a performance testing best practice for a reason. The differences between the elements can significantly affect system performance. It may not be possible to conduct performance testing in the exact production environment, but try to match hardware components, operating system and settings, applications used on the system, and databases.

[Stackify](https://stackify.com) in [Ultimate Guide to Performance Testing and Software Testing](https://stackify.com/ultimate-guide-performance-testing-and-software-testing/)

All of the details here matter. A slightly different amount of memory or a differently sized CPU in a testing environment can skew results so much that they don't reflect how production will behave. Spending a little more time to get this part as close as you can is worth it in the long run.
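Because small hardware differences skew results, a cheap pre-flight check before each run can flag a mismatched environment instead of letting it silently distort the numbers. The keys and the dict format in this sketch are hypothetical; use whatever your infrastructure inventory actually records.

```python
def environment_mismatches(production, test_env,
                           keys=("cpu_cores", "memory_gb", "os", "db_version")):
    """List (key, production_value, test_value) for every setting that differs.

    production and test_env are plain dicts describing each environment;
    the default key names are illustrative only.
    """
    return [
        (key, production.get(key), test_env.get(key))
        for key in keys
        if production.get(key) != test_env.get(key)
    ]
```

A non-empty result could abort the run, or at least be attached to the report so reviewers know the numbers came from a mismatched environment.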

Creating a testing environment that closely resembles production also benefits every other type of testing and QA, and produces better, more actionable results.

Automated performance monitoring

Earlier I mentioned staying away from long-running tests as part of your CI pipeline. That leaves the question of when to actually run those tests. My answer to that is two-fold:

Automated performance testing in a QA environment
Performance monitoring in production

A thorough process will include automation for regularly scheduled jobs that execute your performance tests in a QA or testing environment, in addition to basic production monitoring to measure performance.

Automated performance testing in a QA environment

Without compromising CI wait time, you should try to run as many performance tests within your CI/CD pipeline as possible. This way, you can find and resolve issues quickly without needing to wait on performance tests to be run at a specific time. However, CI builds should be fast, so don't make the whole team wait on 30-minute performance tests after every commit.

For tests that are too long or resource-intensive to run in CI, the best practice is to have a dedicated job that runs on a schedule. This may be through Jenkins or a cron script, but in any case, execution and reporting should be as automated as possible.
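Deciding which tests go in CI and which go to the scheduled job can be made explicit with a time budget. This sketch assumes you know a rough expected duration for each test; it fills the CI budget shortest-first and sends the rest to the scheduled run. The test names and budget used below are made up for illustration.

```python
def split_suite(tests, ci_budget_s):
    """Split performance tests between CI and a scheduled (e.g. nightly) job.

    tests: list of (name, expected_duration_s) pairs.
    Returns (ci_names, scheduled_names).
    """
    ci, scheduled = [], []
    used = 0
    # Shortest first, so CI gets as many quick tests as the budget allows.
    for name, duration in sorted(tests, key=lambda t: t[1]):
        if used + duration <= ci_budget_s:
            ci.append(name)
            used += duration
        else:
            scheduled.append(name)
    return ci, scheduled
```

For example, with a 5-minute budget, a 2-minute smoke load test and a 3-minute API test would run in CI, while a 30-minute spike test and a 4-hour soak test would move to the scheduled job.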

Performance monitoring in production

Testing in production is becoming more and more common. The benefits are clear, and the same rules apply to performance testing. It goes without saying that you don't want to stress test your production environment while real users are active, but there are many other things you can monitor and measure in production that are precursors to bigger issues and errors.

For a basic example, we run 1-minute, 5-minute, and hourly tests on the API at Assertible. Primarily we measure response times and error rates, but we also watch these metrics:

CPU utilization
Memory utilization
Average response times
Error rates
Database latency
Network throughput
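For the production side, a synthetic check can be as simple as one timed GET request, run on a schedule by cron, Jenkins, or a monitoring service. This standard-library sketch returns success, status code, and elapsed time; it is not how Assertible implements its checks, and scheduling and alerting are left out.

```python
import time
import urllib.error
import urllib.request


def synthetic_check(url, timeout_s=10.0):
    """Make one GET request and return (ok, status, elapsed_ms).

    status is None when no HTTP response was received at all.
    """
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code  # 4xx/5xx responses still carry a status code
    except (urllib.error.URLError, OSError):
        status = None  # DNS failure, refused connection, timeout, ...
    elapsed_ms = (time.perf_counter() - start) * 1000
    ok = status is not None and 200 <= status < 400
    return ok, status, elapsed_ms
```

Each result can be appended to a time series, which is where the average response times and error rates listed above come from.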

Knowing the average performance of production is a way to find baseline goals for your tests. Having an idea of the average response time during a certain scenario lets you set thresholds on your performance tests accordingly, so they're not arbitrary and you can effectively measure regressions.
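Turning production measurements into a test threshold might look like the sketch below: take a percentile of observed response times and add some headroom so normal variation doesn't fail every run. Both the 90th percentile (nearest-rank) and the 20% headroom are assumptions to adjust for your own service.

```python
import math


def threshold_from_baseline(production_ms, pct=90, headroom=0.20):
    """Derive a test threshold (ms) from observed production response times."""
    if not production_ms:
        raise ValueError("no production samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(production_ms)
    rank = math.ceil(pct * len(ordered) / 100)  # nearest-rank, 1-based
    return ordered[rank - 1] * (1 + headroom)
```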

Conclusion

Performance testing can be hugely beneficial when it's implemented correctly. It will give you data and insight you can't get through normal functional testing, and it can help you save money and, more importantly, users.

In this post, I've gone over a few things to consider when embarking on performance testing, but there's no better experience than just getting started. I collected a few resources while putting together this post:

Resources and reading

Tools and frameworks

Apache JMeter: An open source Java application designed to load test functional behavior and measure performance.
HPE LoadRunner: A tool used to test applications and measure system performance and behavior under load.
StormRunner Load: A cloud-based load testing tool to design and create web and mobile performance tests.
Locust: Run load tests distributed over multiple machines and simulate millions of users.
ApacheBench: A benchmarking tool designed to give you an impression of how many requests per second your Apache installation is capable of serving.
WebLoad: A tool for load, performance, and stress testing web applications on web and mobile.
WAPT Pro: This tool allows you to test the performance of web applications under load, monitor server and database performance, and much more, all through a convenient user interface.

Did I miss one you think should be mentioned? Let me know!

Hopefully this post gives you a few things to think about as you start to set up performance testing for your application. Do you have any tips or best practices I should add to the article? Or do you just want to share your thoughts or experience with performance testing? Find me on Twitter or reach out any time!

:: Cody Reichert
