Intro
Today I want to show you how to implement a retry mechanism for external API requests in Python.
Before we jump straight to the code, let me explain what a retry mechanism is.
In software engineering, a retry mechanism is a way to automatically repeat an action that has failed, in the hopes that the action will succeed on a subsequent attempt. This is often used when dealing with unreliable or flaky systems, where it’s possible that a request or operation might fail due to transient issues such as network timeouts, server errors, or other temporary glitches.
Instead of immediately giving up and reporting an error to the user, the software will retry the failed action a certain number of times, with a short delay between each attempt. If the action eventually succeeds, the software will continue as normal; if the action continues to fail after all the retries, it will report an error to the user.
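To make the idea concrete, here is a minimal hand-rolled sketch of that loop in plain Python. The names `call_with_retry`, `max_attempts`, `delay` and `flaky_action` are made up for this illustration; it is not the code used later in this post.

```python
import time


def call_with_retry(action, max_attempts=3, delay=0.5, sleep=time.sleep):
    """Call `action` until it succeeds or `max_attempts` calls have failed."""
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: report the error to the caller
            sleep(delay)  # short pause before the next attempt


# Usage: an action that fails twice with a transient error, then succeeds.
calls = {"count": 0}


def flaky_action():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary glitch")
    return "ok"


result = call_with_retry(flaky_action, delay=0.01)
print(result, calls["count"])  # ok 3
```

Catching every `Exception` keeps the sketch short; real code would usually retry only on errors that are known to be transient.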
That covers the retry mechanism itself.
To complete the picture, I have to mention a few related terms:
- Retry Limit - This is the maximum number of times that the action will be retried before giving up and reporting an error. The retry limit is often configurable and can be set based on the specific needs of the software and the system it’s interacting with.
- Backoff - This refers to the delay between each retry attempt. The idea behind backoff is to introduce a progressively longer delay between retries, to avoid overwhelming the system with repeated requests. For example, the backoff might start with a short delay of a few seconds, and then double with each subsequent retry.
- Backoff Strategy - This refers to the specific algorithm used to determine the delay between each retry attempt. There are several backoff strategies, including fixed, linear, exponential, and jittered backoff. Each strategy has its own strengths and weaknesses and is suitable for different use cases.
- Backoff Rate - This is the factor by which the delay between retries is multiplied in each iteration of the backoff. For example, if the initial delay is 1 second and the backoff rate is 2, the delay between the first and second retries will be 2 seconds, the delay between the second and third retries will be 4 seconds, and so on.
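The terms above can be sketched as a small function that computes the delay before each retry. The function name `backoff_delays` and the exact formulas are my own assumptions for illustration (the jittered variant here picks a random delay between zero and the exponential delay, which is only one of several ways to add jitter).

```python
import random


def backoff_delays(strategy, retries, initial=1.0, rate=2.0, rng=None):
    """Return the delay (in seconds) to wait before each of `retries` retries."""
    rng = rng or random.Random()
    delays = []
    for n in range(retries):
        if strategy == "fixed":
            delay = initial                    # same delay every time
        elif strategy == "linear":
            delay = initial * (n + 1)          # grows by `initial` each retry
        elif strategy == "exponential":
            delay = initial * rate ** n        # multiplied by `rate` each retry
        elif strategy == "jittered":
            delay = rng.uniform(0, initial * rate ** n)  # randomised exponential
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        delays.append(delay)
    return delays


print(backoff_delays("fixed", retries=4))        # [1.0, 1.0, 1.0, 1.0]
print(backoff_delays("linear", retries=4))       # [1.0, 2.0, 3.0, 4.0]
print(backoff_delays("exponential", retries=4))  # [1.0, 2.0, 4.0, 8.0]
```

With an initial delay of 1 second and a backoff rate of 2, the exponential strategy reproduces the 1, 2, 4 second sequence from the Backoff Rate example.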
For the purposes of this blog post, we will add a retry mechanism to HTTP API requests made to an external system. In the previous blog post I described how to test API requests to an external system. We will build on that code to implement the retry mechanism.
Code
I’m using the tenacity library to implement the retry mechanism.
I recommend this library. I have used it in production, and it works very well.
It has a lot of options, and it is very easy to configure.
Basically, all we need to do is add the @retry decorator to the functions.
And here are the tests to check that our retry works as expected:
At this stage, I want to mention one more thing. Using tenacity is not the only option. You can also implement a retry mechanism using the urllib3 Retry object. I didn’t show you this in the code example for one simple reason: it is very hard to write a unit test for this solution.
In one of my previous projects, I tried to write unit tests for retry implemented using the urllib3 retry object. I failed 😔.
First of all, it is not possible to use libraries like responses or requests-mock - they don’t have support for it.
The only option is some dirty hacks with @patch - you can read more about it here.
If you know how to achieve it using clean methods, without @patch, please let me know.
I would love to see the solution 🙂.
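For reference, the urllib3 approach mentioned above is usually wired in through a requests session. This is only a sketch under my own assumptions: the helper name `build_session`, the retry count, the backoff factor and the list of status codes are illustrative, not values from a real project.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def build_session() -> requests.Session:
    """Hypothetical helper: a session that retries failed requests."""
    retry_policy = Retry(
        total=3,                                # retry limit
        backoff_factor=1,                       # growing delay between attempts
        status_forcelist=[500, 502, 503, 504],  # also retry on these responses
    )
    adapter = HTTPAdapter(max_retries=retry_policy)
    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session
```

Here the retries happen inside urllib3, below the level where the request is sent, which is what makes them awkward to exercise from a unit test.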
Summary
Overall, the “retry” mechanism is a useful tool in software engineering for dealing with unreliable systems and ensuring that actions that can’t be completed on the first try are still able to eventually succeed.
It is a very popular and common practice, especially in microservices architectures. I hope you know the first of the fallacies of distributed computing: “The network is reliable”. With that in mind, we as developers need to put in as much effort as we can to minimize the risk and make communication between our systems robust.
That’s all, I hope you enjoyed it 🙂. Let me know what your opinion about the retry mechanism is. Did you implement it? Have you encountered some problems during implementation? I would like to see your perspective 🙂