Build Resilient Microservices Using Spring Retry and Circuit Breaker Pattern

“Everything fails all the time” — Werner Vogels
This is sad but true: everything fails, especially in a microservice architecture with many external dependencies. Modern applications have tens of microservices that communicate with each other over REST. Any of them may go down at any time, causing an entire operation to fail. Broadly, we can classify these failures into two categories:

  • Transient: the application heals itself within seconds, for example after a network glitch.
  • Non-transient: the application suffers for a longer period (minutes or hours), for example because of database connection problems, or unavailability caused by high traffic or a throttling limit.

To improve the resilience of our microservice architecture, we should consider the following two patterns:


  1. Retry
  2. Circuit Breaker

For transient failures, we don’t want to fail the request immediately; we would rather retry a few times. There may be a temporary network glitch, and the next attempt may succeed. When implementing the Retry pattern, be careful about how many retries you allow; for example, you might limit each REST call to 3 retries. But if the failure is not transient and you keep retrying 3 times for each REST call, you will soon do further damage to a microservice that is already suffering. You should stop sending requests to the service after a certain number of failures and resume sending them after a while. You are right, I am talking about the Circuit Breaker pattern.
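To make the bounded-retry idea concrete, here is a minimal sketch in plain Java. It is not Spring Retry code: the `SimpleRetry` class, the `callWithRetry` method, and the fixed delay between attempts are assumptions made for illustration only.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of Spring Retry: tries a call a bounded number of times.
class SimpleRetry {

    // Runs the call up to maxAttempts times, sleeping delayMillis between failed attempts.
    // If every attempt fails, the last exception is rethrown to the caller.
    static <T> T callWithRetry(Supplier<T> call, int maxAttempts, long delayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();                 // success: stop retrying
            } catch (RuntimeException e) {
                last = e;                          // remember the failure
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(delayMillis); // give a transient glitch time to clear
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        throw new IllegalStateException("interrupted while waiting to retry", ie);
                    }
                }
            }
        }
        throw last;                                // attempts exhausted
    }
}
```

A call such as `SimpleRetry.callWithRetry(() -> fetchName(), 3, 200)` (where `fetchName` is a placeholder for the REST call) would give up after the third failed attempt. Note that a helper like this, used alone, keeps hitting a service that is down for every single request, which is exactly the problem the Circuit Breaker pattern addresses.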
I have been after this for a while and recently implemented these two patterns in a Spring Boot microservice using Spring Retry.
The concept is very simple: microservice A makes a REST call to microservice B. If the call fails, it is automatically retried 3 times. If Service B is still unable to process the request, a fallback method is called. After the fallback method has executed a certain number of times within a given time frame, the circuit opens. As a result, Service A makes no further REST calls to Service B, which reduces cloud resource and network bandwidth usage. Once the reset time is over, the circuit closes automatically, allowing REST calls to Service B again.
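The flow described above can be modelled in a few lines of plain Java. This is only a simplified sketch of the behaviour, not Spring Retry's implementation (Spring Retry exposes this kind of behaviour through annotations such as `@CircuitBreaker` and `@Recover`, which are not used here). The class name `SimpleCircuitBreaker`, its constructor parameters, and the injectable clock are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical, simplified model of retry + fallback + circuit breaker.
class SimpleCircuitBreaker<T> {
    private final int attemptsPerRequest; // tries per request before falling back
    private final int failuresToOpen;     // failed requests within the window that open the circuit
    private final long windowMillis;      // time frame in which failed requests are counted
    private final long resetMillis;       // how long the circuit stays open
    private final LongSupplier clock;     // injectable time source in milliseconds
    private final Deque<Long> failureTimes = new ArrayDeque<>();
    private boolean open = false;
    private long openedAt = 0L;

    SimpleCircuitBreaker(int attemptsPerRequest, int failuresToOpen,
                         long windowMillis, long resetMillis, LongSupplier clock) {
        this.attemptsPerRequest = attemptsPerRequest;
        this.failuresToOpen = failuresToOpen;
        this.windowMillis = windowMillis;
        this.resetMillis = resetMillis;
        this.clock = clock;
    }

    synchronized T call(Supplier<T> remoteCall, Supplier<T> fallback) {
        long now = clock.getAsLong();
        if (open) {
            if (now - openedAt < resetMillis) {
                return fallback.get();    // circuit open: skip the remote call entirely
            }
            open = false;                 // reset time is over: close the circuit
            failureTimes.clear();
        }
        for (int attempt = 1; attempt <= attemptsPerRequest; attempt++) {
            try {
                return remoteCall.get();
            } catch (RuntimeException e) {
                // ignore and try again until the attempts are exhausted
            }
        }
        // Every attempt failed: record it and forget failures older than the window.
        failureTimes.addLast(now);
        while (!failureTimes.isEmpty() && now - failureTimes.peekFirst() > windowMillis) {
            failureTimes.removeFirst();
        }
        if (failureTimes.size() >= failuresToOpen) {
            open = true;
            openedAt = now;
        }
        return fallback.get();
    }

    synchronized boolean isOpen() {
        return open && clock.getAsLong() - openedAt < resetMillis;
    }
}
```

With `new SimpleCircuitBreaker<String>(3, 3, 15_000, 30_000, System::currentTimeMillis)`, three failed requests within 15 seconds open the circuit, and later requests go straight to the fallback until 30 seconds have passed. The 30-second reset time is an assumed value chosen for the example. Passing the clock in as a parameter is a design choice that makes the timing behaviour easy to test without sleeping.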


The log above shows that, for each request, our service retried 3 times (“called ShakyExternalService api/customer/name”) before executing the fallback method (“returning name from fallback method”). Once the fallback method had been called 3 times within 15 seconds, the circuit opened and further requests to the API were served directly from the fallback without attempting the API call. The last 3 lines of the log show the fallback being executed directly.

Check out my article on Media ( ) for source code and further details.