New to API monitoring? Here are 5 tips to detect downtime before your users : Assertible

07/11/2017 Featured Christopher Reichert

Behind every great API is a reliable uptime monitoring system. In today's internet world filled with SaaS apps, there are many monitoring tools to choose from. If you're new to API monitoring, you might be confused about exactly what you should be monitoring or the right way to do it.

In this post I'll outline some tips to help you decide what to monitor in your API and how to monitor it. These tips can help you build out API monitoring automation that works well for your needs:

Monitor multiple endpoints
Functional checks
Eliminate flaky checks
Actionable alert notifications
Monitor testing and staging environments

Stripe status page reflects API monitoring status

_I may be a little biased since I work on [Assertible](/), which is an API monitoring tool. That said, this list of tips is intended to help beginners create reliable API monitoring to detect downtime. I can't help that Assertible is really good at monitoring and satisfies these criteria!_

1. Monitor multiple endpoints

When you first start monitoring an API, you will probably set up a simple GET request to a specific endpoint (/ or /health) and check that it returns the HTTP status 200 OK. This is the perfect way to start, but it quickly becomes limiting for any growing API.

An API can suffer downtime for many reasons. Downtime can mean the entire system going offline because of server issues, but it can also be limited to a specific endpoint or API transaction that is broken by a bug in your application.

The first thing you should do to avoid these types of errors is monitor multiple critical endpoints. In an ideal world, each unique endpoint would have a simple monitor to check that a basic HTTP request returns the 200 OK status.
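As a rough illustration of checking several endpoints for 200 OK, here is a minimal Python sketch using only the standard library. The function names, the base URL, and the paths used with it are hypothetical placeholders, not part of any particular monitoring tool.

```python
from typing import Callable, Dict, Iterable
from urllib import error, request


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for a GET request, or 0 if unreachable."""
    try:
        with request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except error.HTTPError as exc:
        return exc.code  # the server answered, but with a 4xx/5xx status
    except (error.URLError, OSError):
        return 0  # DNS failure, refused connection, timeout, and so on


def check_endpoints(
    base_url: str,
    paths: Iterable[str],
    fetch: Callable[[str], int] = fetch_status,
) -> Dict[str, bool]:
    """Map each path to True if its GET request returned 200."""
    root = base_url.rstrip("/")
    return {path: fetch(root + path) == 200 for path in paths}
```

Calling `check_endpoints("https://api.example.com", ["/health", "/users"])` would return one pass/fail result per path. The `fetch` parameter is injectable so the checking logic can be exercised without touching the network.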

Tip: Start with one endpoint, then expand your monitoring to multiple endpoints.

Tip: To detect errors which you may not know exist (and thus aren't monitoring), use an error reporting service like Rollbar or Sentry from within your application.

2. Functional checks

Basic uptime checks that monitor multiple API endpoints are a great start. But now what? As you continue to work on your API and deploy new versions, there are many scenarios where a simple 200 OK check on an endpoint is not sufficient.

Functional API checks are a great way to model real-world user scenarios and ensure the availability of a specific transaction on one or more endpoints. For example, your functional test can:

Monitor CRUD operations using methods like POST, PUT, and DELETE
Validate payloads using JSON Schema validation
Check payload data (using JSON Path or XPath)
Identify latency by checking API response times
Check for status codes other than HTTP 200 OK to confirm that API transactions that should fail actually do fail
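To make a few of these checks concrete, here is a minimal Python sketch of assertions on status code, payload fields, and response time. The `ApiResponse` and `Expectation` types are hypothetical, and the field check is a simplified stand-in for real JSON Schema validation rather than an implementation of it.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ApiResponse:
    """Only the parts of an HTTP response this sketch inspects."""
    status: int
    body: Dict[str, Any]
    elapsed_ms: float


@dataclass
class Expectation:
    """What a passing response should look like (hypothetical shape)."""
    status: int = 200
    required_fields: Dict[str, type] = field(default_factory=dict)
    max_latency_ms: float = 1000.0


def run_assertions(resp: ApiResponse, expect: Expectation) -> List[str]:
    """Return human-readable failures; an empty list means the check passed."""
    failures = []
    if resp.status != expect.status:
        failures.append(f"expected status {expect.status}, got {resp.status}")
    for name, expected_type in expect.required_fields.items():
        if name not in resp.body:
            failures.append(f"missing field '{name}'")
        elif not isinstance(resp.body[name], expected_type):
            failures.append(
                f"field '{name}' is not of type {expected_type.__name__}"
            )
    if resp.elapsed_ms > expect.max_latency_ms:
        failures.append(
            f"response took {resp.elapsed_ms:.0f} ms, "
            f"limit is {expect.max_latency_ms:.0f} ms"
        )
    return failures
```

Setting `Expectation(status=401)` covers the last case in the list: a request that should be rejected passes the check only when it really is rejected.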

Tip: After you have some simple monitoring checks set up, consider your API's most important functional requirements and set up tests to monitor those actions.

3. Eliminate flaky checks

One of the most important aspects of using an API monitoring solution is to vigilantly eliminate flaky checks. A flaky test (or check) is one that triggers an alert or downtime notification when nothing is wrong with your API. This can happen because of unexpected non-deterministic behavior, because a test has too many steps, or because a test is otherwise too complicated.

Checks that are unreliable or that often fail with false positives create too much noise, distract you and your team, and can lead team members to ignore important downtime alerts.

Tip: Don't allow a flaky check to stay in your system. Refactor a flaky test immediately, or replace it with a simpler test that does not raise false positives, if at all possible.
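One possible way to spot candidates for cleanup is to look at each check's pass/fail history. The sketch below is a heuristic I am assuming for illustration, not a method from any specific tool: it counts failures that are immediately preceded and followed by a passing run, since those often turn out to be false positives. A short real outage would look the same, so treat the result as a list to review, not a verdict.

```python
from typing import Dict, List


def isolated_failures(history: List[bool]) -> int:
    """Count failed runs that sit directly between two passing runs."""
    return sum(
        1
        for prev, cur, nxt in zip(history, history[1:], history[2:])
        if prev and not cur and nxt
    )


def find_flaky_checks(
    histories: Dict[str, List[bool]], threshold: int = 2
) -> List[str]:
    """Return names of checks with at least `threshold` isolated failures."""
    return sorted(
        name
        for name, runs in histories.items()
        if isolated_failures(runs) >= threshold
    )
```

Here `histories` maps a check name to its recent results in chronological order, with `True` meaning the run passed. The threshold of 2 is an arbitrary default.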

4. Actionable alert notifications

When you receive an alert that your API is down, it's critical that the notification tells you the most vital information immediately so you can take action. API downtime alerts that require you to open a link before you can see the primary parts of the failure are a step in the wrong direction.

API downtime notifications should be immediately actionable; otherwise, you will waste time opening the dashboard of another web app instead of responding to your web service's issue.

In practice, this means your alert should give you at least basic information like the HTTP status code and precise information about why the check failed. For example, a downtime notification from Assertible's Slack integration looks like this:

Assertible API testing Slack integration

We've written a more extensive blog post about this topic: Improving downtime alerts by comparing Pingdom and Assertible

Tip: Ensure that your downtime notifications give you enough critical information to respond to your API without navigating to an external website to view the failure first.
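As a small sketch of what "enough critical information" could look like, the following Python function builds a plain-text alert body that carries the request, the status code, and each failed assertion. The message layout and the function name are my own assumptions and do not reproduce Assertible's actual notification format.

```python
from typing import List


def format_alert(
    check_name: str,
    method: str,
    url: str,
    status: int,
    failures: List[str],
) -> str:
    """Build an alert body that can be acted on without opening a dashboard."""
    lines = [
        f"FAILED: {check_name}",
        f"Request: {method} {url}",
        f"HTTP status: {status}",
        "Failed assertions:",
    ]
    lines.extend(f"  - {reason}" for reason in failures)
    return "\n".join(lines)
```

The returned string could then be sent through whatever channel your team already watches, such as chat or email.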

5. Monitor testing and staging environments

Lots of hosting providers make it easy to set up staging and testing versions of your app so that you can test it live before deploying to production. For example, Heroku pipelines let you have a staging environment and temporary review apps (when using Heroku Review Apps).

I highly recommend monitoring non-production environments because it allows you to catch API errors before they hit production. A good API monitoring tool will allow you to reuse the same tests to monitor each unique environment.
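The idea of reusing the same tests across environments can be sketched in Python as one list of checks applied to several base URLs. Everything named here (the `Check` type, the example URLs, the injected `fetch` callable that returns a status code) is a hypothetical placeholder for illustration.

```python
from typing import Callable, Dict, List, NamedTuple


class Check(NamedTuple):
    path: str
    expected_status: int


def run_checks(
    base_url: str, checks: List[Check], fetch: Callable[[str], int]
) -> List[str]:
    """Run every check against one environment and collect failures."""
    failures = []
    for check in checks:
        got = fetch(base_url.rstrip("/") + check.path)
        if got != check.expected_status:
            failures.append(
                f"{check.path}: expected {check.expected_status}, got {got}"
            )
    return failures


def run_all_environments(
    environments: Dict[str, str],
    checks: List[Check],
    fetch: Callable[[str], int],
) -> Dict[str, List[str]]:
    """Apply the same checks to each environment's base URL."""
    return {
        name: run_checks(url, checks, fetch)
        for name, url in environments.items()
    }
```

Because the checks hold only paths and expectations, adding a QA or review environment means adding one more entry to the `environments` mapping instead of writing new tests.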

Some monitoring tools also have features that allow you to smoke-test your API after it's deployed to a staging or dev environment. In Assertible, it's called the Deployments API. Automating smoke-tests after a deployment is the best way to identify errors in a new application version immediately and allows you to potentially run a large set of integration tests against your live API.

Tip: Monitor staging, QA, and testing versions of your application to further reduce the chances that a bug will land in production.

Tip: Automate comprehensive smoke-tests when your web service is deployed.

Conclusions

The tips I've outlined in this post should help you start monitoring your API effectively. No team is immune to bugs, so the most important thing is to practice continuous testing and iteratively build out better tests at every stage of your development process (remember, vigilantly remove flaky tests!).

Tools like Assertible make API monitoring trivial and reduce bugs in your API so users don't just leave. We've spent a lot of time ensuring that Assertible meets the requirements I've outlined in this post. If you're keen to test Assertible, I'd love to hear your feedback. Send me a message or reach out on Twitter and let's talk testing!


:: Christopher Reichert
