Wednesday, March 11, 2020


Introduction

Launched in December 2016, AWS X-Ray provides an end-to-end view of requests as they flow through an application, using service maps and traces to make distributed applications running on AWS easier to analyze and debug.



This service is designed for various types of microservices applications and works with a wide variety of AWS services, such as Lambda, EC2, ECS, Elastic Beanstalk, and EKS.

In this article, we’ll explore how X-Ray distributed tracing works, its architecture, initial setup, and some simple use cases. We’ll conclude by taking a quick tour of some of the available alternatives.

How AWS X-Ray works

X-Ray first collects data from each of the services and combines it into a single unit known as a trace. Then, using this data, it builds a service map that shows each traced request along with its latency, HTTP status code, and other metadata. All of this information can be crucial to developers and architects trying to debug performance issues in development and production alike.

For example, if we consider a simple microservice-based application consisting of AWS API Gateway, Lambda, and DynamoDB, we can watch how each transaction is performing, starting from API Gateway all the way through to DynamoDB.

source: AWS

AWS X-Ray and distributed tracing

Distributed tracing is a must-have in a microservice architecture. In a typical microservice design, services are hosted separately and communicate with each other to serve a single request. Tracing a request that spans multiple AWS services can become highly tedious. If we use only CloudWatch Logs to debug an issue, it takes significant time to go through each log group and log stream to find the request. That’s why we need a distributed tracing service like X-Ray, which can generate an end-to-end view of each request and provide the related information needed to debug issues.

AWS X-Ray architecture

AWS X-Ray has five major components.
source: AWS


X-Ray daemon
The daemon is a software application that listens on UDP port 2000. It gathers raw segment data and relays it to the X-Ray API. The X-Ray console eventually uses this data to group everything together and display service maps and traces. The daemon works in conjunction with the X-Ray SDK to collect the data.
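To make the daemon's role concrete, here is a minimal Java sketch of what an SDK does under the hood: it prefixes a JSON segment document with the daemon's header line and sends the result as a single UDP datagram to port 2000. The class name, the helper methods, and the choice of fields are illustrative assumptions; in practice the X-Ray SDK builds and sends these documents for us.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, not part of any AWS library.
class DaemonDatagramSketch {
    // Header line the daemon expects in front of each JSON segment document.
    static final String HEADER = "{\"format\": \"json\", \"version\": 1}\n";

    // Builds a minimal completed segment; every value is supplied by the caller.
    static String segmentJson(String name, String segmentId, String traceId,
                              double startEpochSeconds, double endEpochSeconds) {
        return String.format(Locale.ROOT,
                "{\"name\":\"%s\",\"id\":\"%s\",\"trace_id\":\"%s\","
                        + "\"start_time\":%.3f,\"end_time\":%.3f}",
                name, segmentId, traceId, startEpochSeconds, endEpochSeconds);
    }

    // Header plus segment document, encoded as the datagram body.
    static byte[] datagramPayload(String segmentJson) {
        return (HEADER + segmentJson).getBytes(StandardCharsets.UTF_8);
    }

    // Fire-and-forget send to a daemon assumed to run on the local host.
    static void send(byte[] payload) throws Exception {
        try (DatagramSocket socket = new DatagramSocket()) {
            InetAddress daemon = InetAddress.getByName("127.0.0.1");
            socket.send(new DatagramPacket(payload, payload.length, daemon, 2000));
        }
    }
}
```

Because UDP is connectionless, the application does not wait for the daemon to acknowledge anything, which keeps the tracing overhead on the request path small.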

X-Ray API

AWS X-Ray provides many APIs for managing and retrieving traces and service maps. They also help to retrieve other telemetry data created by processing those traces. These APIs can be accessed through the AWS SDK and AWS CLI.

X-Ray console

The console provides us a view of service maps and traces of application requests. The service map provides a visualization of the JSON service graph. The JSON consists of the trace data generated by applications.

X-Ray SDK

AWS provides X-Ray SDKs for many languages, including Java, Node.js, and so on. The SDKs provide classes and methods for generating and sending trace data to the X-Ray daemon.

Other Clients


These include the likes of the AWS SDK and AWS CLI, which provide features to interact with the X-Ray API and daemon.

AWS X-Ray Setup

There are various AWS services for which X-Ray can be set up. We will limit this article to only the Lambda function setup (we are serverless-first, after all). Let’s explore the various channels through which X-Ray can be set up for a Lambda function:


1. AWS Console - Once a Lambda function is developed and deployed to the Lambda service, X-Ray can be enabled through the AWS console. The function's IAM execution role must have permission to send trace data to the X-Ray service.
2. AWS CLI - It can be used to interact with the X-Ray APIs to send and retrieve traces.
3. Serverless Framework - Serverless Framework is one of the most commonly used frameworks for building and deploying production-grade serverless applications. X-Ray tracing can be enabled by putting the following configuration in your serverless.yaml file.

Serverless.yaml -
---
# tracing and iamRoleStatements both belong under the provider section
provider:
  tracing:
    apiGateway: true
    lambda: true
  iamRoleStatements:
    - Effect: Allow
      Action:
        - dynamodb:*
        - logs:*
        - xray:PutTraceSegments
        - xray:PutTelemetryRecords
      Resource: "*"
---

Building a simple use case

Ok. Let’s build a Lambda function and enable the X-Ray service for it and see how it helps to debug issues:
Step 1 - We will use the AWS Cloud9 service to create and deploy the Lambda function.

Step 2 - We have created a simple Lambda Function using Node.js, and deployed it to the Lambda service:

Step 3 - Now, let’s enable the X-Ray tracing through the AWS Console. It will also add a permission to the execution role of the Lambda function to send data to the X-Ray service:

Step 4 - Let's hit the Test button a couple of times on the top right side of the screen to generate some traces. Now, when we go to the X-Ray console we can see that there are two nodes, one for the Lambda invocation and another for Lambda execution.

Step 5 - We can click on the ‘View traces’ button on the bottom right side of the screen. From here we can view the details of how much time a particular request took for invocation/execution and the HTTP status code:

Step 6 - Now, let’s try to add some more code to our Lambda function. We will add code that tries to access the S3 service and call the S3 API method to return the list of buckets.



Step 7 - Again, let’s deploy this code and hit the ‘Test’ button a couple of times to generate a few traces. The service map now shows one more node for the S3 segment, outlined in yellow rather than green because the calls to S3 returned errors.



Step 8 - Let’s go to the trace to debug this error. We will observe a red alert icon for the S3 record. When we click on this it will show error details. As we didn’t assign any S3 access-related IAM policy to the role assigned to this function, it is not able to access the S3 buckets:



While this example may be simple, there are many complicated issues that can be debugged using X-Ray, especially performance-related issues.

We can also define annotations and metadata for requests by creating sub-segments using the X-Ray SDKs for the various supported languages. Annotations are indexed and can be used to search for particular traces; metadata is stored with the trace but is not searchable.


Request Sampling

Request sampling is one of X-Ray's most important but least discussed features. X-Ray may be relatively cheap, but if we trace every request of our application, the cost adds up very quickly. Sampling lets us trace only a subset of requests based on specified criteria. Using custom sampling rules, we can control sampling behavior without modifying the code. For example, we can configure a rule so that only POST calls to the API are traced, but not GET, PUT and DELETE.
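The idea behind a sampling rule can be sketched in a few lines of Java. Each rule here has a reservoir (requests traced per second before anything else applies) and a fixed rate (the fraction of further requests to trace), and the first rule matching the HTTP method decides. The class, its fields, and the matching on method only are simplifying assumptions for illustration; real X-Ray rules match on more attributes and are evaluated by the service and SDK, not by code like this. The sketch is also not thread-safe.

```java
import java.util.List;
import java.util.function.DoubleSupplier;

// Illustrative model of reservoir-plus-rate sampling; not the X-Ray SDK.
class SamplingSketch {
    static final class Rule {
        final String httpMethod; // "*" matches any method
        final int reservoir;     // requests traced per second before the rate applies
        final double rate;       // fraction of remaining requests to trace (0.0 to 1.0)
        private long second = Long.MIN_VALUE;
        private int used = 0;

        Rule(String httpMethod, int reservoir, double rate) {
            this.httpMethod = httpMethod;
            this.reservoir = reservoir;
            this.rate = rate;
        }

        boolean matches(String method) {
            return httpMethod.equals("*") || httpMethod.equalsIgnoreCase(method);
        }

        boolean sample(long nowEpochSecond, DoubleSupplier random) {
            if (nowEpochSecond != second) { // a new second refills the reservoir
                second = nowEpochSecond;
                used = 0;
            }
            if (used < reservoir) {
                used++;
                return true;
            }
            return random.getAsDouble() < rate;
        }
    }

    // Rules are checked in priority order; the first match decides.
    static boolean shouldTrace(List<Rule> rulesByPriority, String method,
                               long nowEpochSecond, DoubleSupplier random) {
        for (Rule rule : rulesByPriority) {
            if (rule.matches(method)) {
                return rule.sample(nowEpochSecond, random);
            }
        }
        return false;
    }
}
```

With a "POST" rule followed by a catch-all rule whose reservoir and rate are both zero, only POST requests are ever traced, which mirrors the example in the paragraph above.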


Limitations of AWS X-Ray
Even though we had an error accessing the S3 service from our Lambda function, X-Ray shows “OK” for all the traces, as it’s getting a 200 response. This is a bit confusing for developers as we won’t know if there are errors until we go inside each trace.

In a Lambda function, annotations and metadata can be added only to sub-segments, not to root segments, and there is no UI feature to add them at the segment level. That means we always need to go into our code and use the X-Ray SDK to wrap each functional code block we want to trace in a sub-segment.

AWS X-Ray Pricing


Like many other AWS services, X-Ray doesn’t have any upfront fees. We pay for what we use: the number of traces recorded, retrieved and scanned.
Free tier: The first 100,000 traces recorded each month are free. The first 1,000,000 traces retrieved or scanned each month are free.
Charges beyond the free tier: $5.00 per 1 million traces recorded, $0.50 per 1 million traces retrieved, and $0.50 per 1 million traces scanned.
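As a worked example of these numbers, the following Java sketch estimates a monthly bill from the prices and free-tier limits quoted above. The class and method names are made up for illustration, and the figures are hard-coded from this article, so check the current AWS pricing page before relying on them.

```java
// Rough estimate only; prices and free-tier limits are copied from the text above.
class XRayCostSketch {
    static double monthlyCostUsd(long recorded, long retrieved, long scanned) {
        // First 100,000 recorded traces are free.
        long billableRecorded = Math.max(0, recorded - 100_000L);
        // First 1,000,000 retrieved or scanned traces (combined) are free.
        long billableRead = Math.max(0, retrieved + scanned - 1_000_000L);
        return billableRecorded / 1_000_000.0 * 5.00
                + billableRead / 1_000_000.0 * 0.50;
    }
}
```

For instance, recording 2,100,000 traces and retrieving 3,000,000 in a month leaves 2,000,000 billable in each category, which comes to $10.00 + $1.00 = $11.00.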


Alternatives to AWS X-Ray

There are generally two types of distributed tracing solutions available: open source and commercial tools. Let’s look at how these tools stack up as alternatives to X-Ray.

Open source tools - these are free to use. However, they need to be set up, monitored and managed by teams. Examples include Jaeger, OpenCensus and Zipkin.

Commercial tools - These are easy to set up and use, compared to the open-source tools. These are generally charged based on the number of invocations.

Jaeger 

Jaeger is an open-source tracing system used for microservice applications. It is built on OpenTracing standards. It provides an all-in-one executable that can be used to run Jaeger for quick local testing. It is a CNCF project, which makes it a popular choice for Kubernetes environments.

OpenCensus

OpenCensus is another Apache-licensed open-source system that originated at Google, growing out of its internal Census libraries. It provides a set of libraries that are used to automatically capture traces and metrics from services.

Zipkin 

Zipkin was developed by Twitter and is now supported by a dedicated open source community. It was one of the first distributed tracing systems to gain popularity. Applications need to be instrumented to send trace data to the Zipkin server over a transport such as HTTP or Kafka.


Summary

Distributed tracing is the need of the hour as more and more applications are built using microservice patterns.

Monitoring and debugging have proven to be among the more challenging aspects of serverless development, so investing in the right system for your needs is a must. AWS X-Ray is just one of several options available, so the choice is yours to make, based on your needs, budget and other constraints.




Monday, January 27, 2020

What is Lambda Function and Destinations


AWS Lambda is a very well-known service for building serverless applications. Since its launch in 2014, it has been widely adopted by developers and architects.

An AWS serverless application comprises several components, with the Lambda function as its backbone. Other AWS services interact with each other through the function, completing the serverless workflow.


1. Event Source: - A Lambda function doesn’t have a REST API endpoint of its own, so outside clients cannot call it directly. It needs an event trigger to be invoked. An event can be:

- Data record change of a DynamoDB table
- Request to a REST endpoint through API Gateway
- Change in the state of resource e.g. file uploaded to the S3 Bucket

It can also be triggered through CloudWatch Events. As Lambda has been evolving, it’s constantly adding many other types of events.



2. Function: - All the functional logic goes here. It supports many languages, including but not limited to Java, Go, Node.js and Python. The event source triggers it, and it then connects to other AWS services to create a serverless pattern.



3. Services: - A service can be either an AWS service or an external service. These services are needed to persist data to or retrieve data from other systems. Some services, such as SQS and SNS, complement the serverless flow with event-driven architecture.

In November 2019, just ahead of re:Invent, AWS announced a new Lambda feature known as Lambda Destinations. With Destinations, you can route asynchronous function results as an execution record to a destination resource without writing additional code. An execution record contains details about the request and response in JSON format, including version, timestamp, request context, request payload, response context, and response payload. For each execution status, such as Success or Failure, you can choose one of four destinations: another Lambda function, SNS, SQS, or EventBridge. Lambda can also be configured to route different execution results to different destinations.

Asynchronous Function Execution Result




What is Step Functions 

AWS Step Functions is an orchestrator that helps to design and implement complex workflows. When we need to build a workflow or have multiple tasks that need orchestration, Step Functions coordinates between those tasks. This makes it simple to build multi-step systems.

Step Functions is built on two main concepts: tasks and state machines.

All work in the state machine is done by tasks. A task performs work by using an activity or an AWS Lambda function or passing parameters to the API actions of other services.

A state machine is defined using the JSON-based Amazon States Language. When an AWS Step Functions state machine is created, it stitches the components together and shows the developers their system and how it is being configured. Have a look at a simple example:





Lambda Destinations vs Step Functions


Topic: Sync vs Async Invocation
  Lambda Destinations: Supports only asynchronous and stream invocations.
  Step Functions: Supports both sync and async flows. In either flow, the StartExecution API must be called.

Topic: Services Supported
  Lambda Destinations: For async invocations, supports only four destination services: Lambda, SQS, SNS, and EventBridge (as of Jan 17, 2020). For stream invocations, supports only SQS and SNS.
  Step Functions: Supports many more services beyond those four, such as AWS Fargate, SageMaker, ECS, non-AWS workloads and many others.

Topic: Branching - Parallel Processing
  Lambda Destinations: Doesn't support parallel branching of execution.
  Step Functions: Supports dynamic parallelism, so you can optimize the performance and efficiency of application workflows such as data processing and task automation. By running identical tasks in parallel, you can achieve consistent execution durations and improve the utilization of resources to save on operating costs. Step Functions automatically scales resources in response to your input.

Topic: Branching - Choice
  Lambda Destinations: For async invocations, supports only success and failure conditions. For stream invocations, supports only the failure condition.
  Step Functions: Lets you define any type of choice:
    "Choices": [
      {
        "Variable": "$.foo",
        "NumericEquals": 1,
        "Next": "FirstMatchState"
      },
      {
        "Variable": "$.foo",
        "NumericEquals": 2,
        "Next": "SecondMatchState"
      }
    ]

Topic: Retry
  Lambda Destinations: Lambda retries a failed asynchronous invocation before routing the result to a destination, but there is no option to configure a BackoffRate.
  Step Functions: Has a Retry policy to handle Lambda failures:
    "Retry": [
      {
        "ErrorEquals": ["CustomError"],
        "IntervalSeconds": 1,
        "MaxAttempts": 2,
        "BackoffRate": 2.0
      }
    ]


Conclusion

To summarize, both services can co-exist, as each has features suited to different scenarios. I would recommend starting with Lambda Destinations and seeing if it suffices for your purpose, as it will be much cheaper and require less maintenance for small use cases. If it doesn't fulfill your needs, you can move to Step Functions.

Wednesday, January 1, 2020

Overview


The Spring Cloud Hystrix project was built as a wrapper on top of the Netflix Hystrix library. Since then, it has been adopted by many enterprises and developers to implement the Circuit Breaker pattern.
In November 2018, Netflix announced that it was putting Hystrix into maintenance mode, which prompted Spring Cloud to announce the same. Since then, the Netflix library has received no further enhancements. At SpringOne 2019, Spring announced that Hystrix Dashboard will be removed from Spring Cloud 3.1, which makes it officially dead.
As the Circuit Breaker pattern has been advertised so heavily, many developers have either used it or want to use it, and now need a replacement. Resilience4j was introduced to fill this gap and provide a migration path for Hystrix users.

Resilience4j

Resilience4j has been inspired by Netflix Hystrix but is designed for Java 8 and functional programming. It is lightweight compared to Hystrix as it has the Vavr library as its only dependency. Netflix Hystrix, by contrast, has a dependency on Archaius which has several other external library dependencies such as Guava and Apache Commons.
A new library always has one advantage over a previous library - it can learn from the mistakes of its predecessor. Resilience4j also comes with many new features:

CircuitBreaker

When a service invokes another service, there is always a possibility that it may be down or having high latency. This may lead to exhaustion of the threads as they might be waiting for other requests to complete. The CircuitBreaker pattern functions in a similar fashion to an electrical Circuit Breaker:
  • When the number of consecutive failures crosses the defined threshold, the Circuit Breaker trips.
  • For the duration of the timeout period, all requests invoking the remote service will fail immediately.
  • After the timeout expires the Circuit Breaker allows a limited number of test requests to pass through.
  • If those requests succeed the Circuit Breaker resumes normal operation.
  • Otherwise, if there is a failure the timeout period begins again.
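The state transitions listed above can be sketched as a small Java class. This is a simplified illustration that counts consecutive failures and lets a single probe request through after the timeout; it is not how Resilience4j implements its CircuitBreaker (which tracks a failure rate over a buffer of calls), and the class name and the explicit `nowMillis` parameter are assumptions made to keep the example self-contained.

```java
import java.util.function.Supplier;

// Minimal consecutive-failure circuit breaker; a teaching sketch only.
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long openTimeoutMillis;
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0;

    SimpleCircuitBreaker(int failureThreshold, long openTimeoutMillis) {
        this.failureThreshold = failureThreshold;
        this.openTimeoutMillis = openTimeoutMillis;
    }

    synchronized State state() {
        return state;
    }

    // The caller passes the current time so the behaviour is easy to follow and test.
    synchronized <T> T call(Supplier<T> remoteCall, Supplier<T> fallback, long nowMillis) {
        if (state == State.OPEN) {
            if (nowMillis - openedAt < openTimeoutMillis) {
                return fallback.get(); // fail fast while the timeout is running
            }
            state = State.HALF_OPEN;   // timeout expired: allow a test request
        }
        try {
            T result = remoteCall.get();
            consecutiveFailures = 0;
            state = State.CLOSED;      // success resumes normal operation
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;    // trip, or restart the timeout period
                openedAt = nowMillis;
            }
            return fallback.get();
        }
    }
}
```

Holding the lock for the whole call keeps the sketch short but serializes all calls, which a production implementation would avoid.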

RateLimiter

The Rate Limiting pattern ensures that a service accepts only a defined maximum number of requests during a time window. This ensures that underlying resources are used within their limits and are not exhausted.
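One simple way to picture this is a fixed-window counter, sketched below in Java. The class name and the explicit clock parameter are assumptions for illustration, and the algorithm is a generic fixed-window limiter rather than a description of Resilience4j's internals.

```java
// Fixed-window rate limiter: at most limitForPeriod permits per window.
class FixedWindowRateLimiter {
    private final int limitForPeriod;
    private final long periodMillis;
    private long windowStart = 0;
    private int used = 0;

    FixedWindowRateLimiter(int limitForPeriod, long periodMillis) {
        this.limitForPeriod = limitForPeriod;
        this.periodMillis = periodMillis;
    }

    // Returns true if the request may proceed, false if it should be throttled.
    synchronized boolean tryAcquire(long nowMillis) {
        if (nowMillis - windowStart >= periodMillis) {
            windowStart = nowMillis; // start a fresh window and reset the count
            used = 0;
        }
        if (used < limitForPeriod) {
            used++;
            return true;
        }
        return false;
    }
}
```

A caller would typically invoke its fallback logic whenever `tryAcquire` returns false.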

Retry

The Retry pattern enables an application to handle transient failures when calling external services. It retries operations on external resources a set number of times. If the operation doesn't succeed after all the retry attempts, it should fail, and the application should handle the response gracefully.
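In plain Java, the pattern boils down to a loop with a bounded number of attempts, a pause between them, and a fallback at the end. The helper below is a hypothetical sketch of that idea, not the Resilience4j API.

```java
import java.util.function.Supplier;

// Bounded retry with a fixed wait between attempts and a final fallback.
class RetrySketch {
    static <T> T callWithRetry(Supplier<T> operation, int maxAttempts,
                               long waitMillis, Supplier<T> fallback) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.get(); // success ends the loop immediately
            } catch (RuntimeException e) {
                if (attempt == maxAttempts) {
                    break;              // out of attempts: fall through to fallback
                }
                try {
                    Thread.sleep(waitMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve the interrupt
                    break;
                }
            }
        }
        return fallback.get();
    }
}
```

Retries only make sense for failures that are likely to be transient; retrying a request that can never succeed just delays the fallback.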

Bulkhead

Bulkhead ensures that a failure in one part of the system doesn't bring the whole system down. It controls the number of concurrent calls a component can take. This way, the number of resources waiting for the response from that component is limited. There are two types of bulkhead implementation:
  • The semaphore isolation approach limits the number of concurrent requests to the service. It rejects requests immediately once the limit is hit.
  • The thread pool isolation approach uses a thread pool to separate the service from the caller and contain it to a subset of system resources.
The thread pool approach also provides a waiting queue, rejecting requests only when both the pool and queue are full. Thread pool management adds some overhead, which slightly reduces performance compared to using a semaphore, but allows hanging threads to time out.             
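The semaphore variant is small enough to sketch directly in Java using `java.util.concurrent.Semaphore`. The class name is made up for illustration, and this version rejects immediately rather than waiting for a permit, which is a simplifying assumption.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Semaphore-based bulkhead: caps the number of calls in flight at once.
class SemaphoreBulkheadSketch {
    private final Semaphore permits;

    SemaphoreBulkheadSketch(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    <T> T call(Supplier<T> operation, Supplier<T> fallback) {
        if (!permits.tryAcquire()) {
            return fallback.get(); // limit reached: reject without waiting
        }
        try {
            return operation.get();
        } finally {
            permits.release();     // always free the slot, even on failure
        }
    }
}
```

The calling thread does the work itself here, so a hanging downstream call still ties up that thread; that is the trade-off against the thread pool approach described above.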

Build a Spring Boot Application with Resilience4j

In this article, we will build 2 services - Book Management and Library Management.
In this system, Library Management calls Book Management. We will need to bring the Book Management service up and down to simulate different scenarios for the CircuitBreaker, RateLimiter, Retry and Bulkhead features.

Prerequisites

  1. JDK 8
  2. Spring Boot 2.1.x
  3. resilience4j 1.1.x (latest version of resilience4j is 1.3 but resilience4j-spring-boot2 has latest version 1.1.x only)
  4. IDE like Eclipse, VSC or IntelliJ (I prefer VSC as it is very lightweight. I like it more compared to Eclipse and IntelliJ)
  5. Gradle
  6. NewRelic APM tool ( you can use Prometheus with Grafana also)

Book Management service
  1. Gradle Dependency   
This service is a simple REST-based API and needs standard spring-boot starter jars for web and test dependencies. We will also enable swagger to test the API:
dependencies {
    //REST
    implementation 'org.springframework.boot:spring-boot-starter-web'
    //swagger
    compile group: 'io.springfox', name: 'springfox-swagger2', version: '2.9.2'
    implementation group: 'io.springfox', name: 'springfox-swagger-ui', version: '2.9.2'
    testImplementation 'org.springframework.boot:spring-boot-starter-test'
}
  2. Configuration
The only configuration needed is the server port:
server:
    port: 8083

  3. Service Implementation
It has two methods addBook and retrieveBookList. Just for demo purposes, we are using an ArrayList object to store the book information:
@Service
public class BookServiceImpl implements BookService {

    List<Book> bookList = new ArrayList<>();

    @Override
    public String addBook(Book book) {
        String message  =   "";
        boolean status  =   bookList.add(book);
        if(status){
            message=    "Book is added successfully to the library.";
        }
        else{
             message=    "Book could not be added in library due to some technical issue. Please try later!";
        }
        return message;
    }

    @Override
    public List<Book> retrieveBookList() {
        return bookList;
    }
}

  4. Controller
Rest Controller has exposed two APIs - one is POST for adding book and the other is GET for retrieving book details:
@RestController
@RequestMapping("/books")
public class BookController {

    @Autowired
    private BookService bookService  ;

    @PostMapping
    public String addBook(@RequestBody Book book){
        return bookService.addBook(book);
    }

    @GetMapping
    public List<Book> retrieveBookList(){
        return bookService.retrieveBookList();
    }
}
  5. Test Book Management Service
Build and start the application by using the below commands:
//build application
gradlew build

//start application
java -jar build/libs/bookmanangement-0.0.1-SNAPSHOT.jar

//endpoint url
http://localhost:8083/books
Now we can test the application using Swagger UI - http://localhost:8083/swagger-ui.html
Ensure the service is up and running before moving to build the Library Management service.

Library Management service

In this service, we will be enabling all of the Resilience4j features.
  1. Gradle Dependency   
This service is also a simple REST-based API and also needs standard spring-boot starter jars for web and test dependencies. To enable CircuitBreaker and other resilience4j features in the API, we have added a couple of other dependencies like - resilience4j-spring-boot2, spring-boot-starter-actuator, spring-boot-starter-aop. We need to also add micrometer dependencies (micrometer-registry-prometheus, micrometer-registry-new-relic) to enable the metrics for monitoring. And lastly, we enable swagger to test the API:
dependencies {
    
    compile 'org.springframework.boot:spring-boot-starter-web'
    
    //resilience
    compile "io.github.resilience4j:resilience4j-spring-boot2:${resilience4jVersion}"
    compile 'org.springframework.boot:spring-boot-starter-actuator'
    compile('org.springframework.boot:spring-boot-starter-aop')
 
    //swagger
    compile group: 'io.springfox', name: 'springfox-swagger2', version: '2.9.2'
    implementation group: 'io.springfox', name: 'springfox-swagger-ui', version: '2.9.2'

    // monitoring
    compile "io.micrometer:micrometer-registry-prometheus:${resilience4jVersion}"
    compile 'io.micrometer:micrometer-registry-new-relic:latest.release'

    testImplementation 'org.springframework.boot:spring-boot-starter-test'
}
  2. Configuration
Here, we need to do a couple of configurations -
-  By default, the CircuitBreaker and RateLimiter actuator APIs are disabled in Spring Boot 2.1.x. We need to enable them using management properties. Refer to those properties in the source code link shared at the end of the article. We also need to add the following properties:
-  Configure NewRelic Insight API key and account id
management:
   metrics:
    export:
      newrelic:
        api-key: xxxxxxxxxxxxxxxxxxxxx
        account-id: xxxxx
        step: 1m
-  Configure resilience4j CircuitBreaker properties for "add" and "get" service APIs.
resilience4j.circuitbreaker:
  instances:
    add:
      registerHealthIndicator: true
      ringBufferSizeInClosedState: 5
      ringBufferSizeInHalfOpenState: 3
      waitDurationInOpenState: 10s
      failureRateThreshold: 50
      recordExceptions:
        - org.springframework.web.client.HttpServerErrorException
        - java.io.IOException
        - java.util.concurrent.TimeoutException
        - org.springframework.web.client.ResourceAccessException
        - org.springframework.web.client.HttpClientErrorException
      ignoreExceptions:
-  Configure resilience4j RateLimiter properties for "add" service API.
resilience4j.ratelimiter:
  instances:
    add:
      limitForPeriod: 5
      limitRefreshPeriod: 10s
      timeoutDuration: 1000ms
-  Configure resilience4j Retry properties for "get" service API.
resilience4j.retry:
  instances:
    get:
      maxRetryAttempts: 3
      waitDuration: 5000
-  Configure resilience4j Bulkhead properties for "get" service API.
resilience4j.bulkhead:
  instances:
    get:
      maxConcurrentCalls: 10
      maxWaitDuration: 10ms
Now, we will create a LibraryConfig class that defines a RestTemplate bean for calling the Book Management service. We have also hardcoded the endpoint URL of the Book Management service here. This is not a good idea for a production-like application, but the purpose of this demo is only to showcase the resilience4j features. For a production app, we may want to use a service-discovery service.
@Configuration
public class LibraryConfig {
    private static final Logger logger = LoggerFactory.getLogger(LibraryConfig.class);
    private static final String baseUrl = "https://bookmanagement-service.apps.np.sdppcf.com";

    @Bean
    RestTemplate restTemplate(RestTemplateBuilder builder) {
        UriTemplateHandler uriTemplateHandler = new RootUriTemplateHandler(baseUrl);
        return builder
                .uriTemplateHandler(uriTemplateHandler)
                .build();
   }
 
}
  3. Service
The service implementation has methods wrapped with the @CircuitBreaker, @RateLimiter, @Retry and @Bulkhead annotations. All of these annotations support a fallbackMethod attribute, which redirects the call to a fallback function when the pattern detects a failure. We need to define the implementation of these fallback methods.
This method is protected by the @CircuitBreaker annotation. If the /books endpoint fails to return a response, it calls the fallbackForaddBook() method.
    @Override
    @CircuitBreaker(name = "add", fallbackMethod = "fallbackForaddBook")
    public String addBook(Book book){
        logger.error("Inside addbook call book service. ");
        String response = restTemplate.postForObject("/books", book, String.class);
        return response;
    }
This method is protected by the @RateLimiter annotation. If calls to it exceed the limit defined in the configuration above, it calls the fallbackForRatelimitBook() method.
    @Override
    @RateLimiter(name = "add", fallbackMethod = "fallbackForRatelimitBook")
    public String addBookwithRateLimit(Book book){
        String response = restTemplate.postForObject("/books", book, String.class);
        logger.error("Inside addbook, cause ");
        return response;
    }
This method is protected by the @Retry annotation. If the call to the /books endpoint still fails after the number of retry attempts defined in the configuration above, it calls the fallbackRetry() method.
    @Override
    @Retry(name = "get", fallbackMethod = "fallbackRetry")
    public List<Book> getBookList(){
        return restTemplate.getForObject("/books", List.class);
    }
This method is protected by the @Bulkhead annotation. If the number of concurrent calls exceeds the limit defined in the configuration above, it calls the fallbackBulkhead() method.
    @Override
    @Bulkhead(name = "get", type = Bulkhead.Type.SEMAPHORE, fallbackMethod = "fallbackBulkhead")
    public List<Book> getBookListBulkhead() {
        logger.error("Inside getBookList bulk head");
        try {
            Thread.sleep(100000);
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can see the interruption
            Thread.currentThread().interrupt();
        }
        return restTemplate.getForObject("/books", List.class);
    }
Once the service layer is set up, we need to expose the corresponding REST APIs for each of the methods so that we can test them. For that, we need to create the RestController class.
  4. Controller
The Rest Controller exposes 4 APIs -
   - The first one is a POST for adding a book.
   - The second is again a POST for adding a book, but it is used to demo the Rate-Limit feature.
   - The third one is a GET for retrieving book details.
   - The fourth one is a GET for retrieving book details, but with the bulkhead feature enabled.
@RestController
@RequestMapping("/library")
public class LibrarymanagementController {

    @Autowired
    private LibrarymanagementService librarymanagementService;
    @PostMapping
    public String addBook(@RequestBody Book book){
        return librarymanagementService.addBook(book);
    }

    @PostMapping ("/ratelimit")
    public String addBookwithRateLimit(@RequestBody Book book){
        return librarymanagementService.addBookwithRateLimit(book);
    }

    @GetMapping
    public List<Book> getSellersList() {
        return librarymanagementService.getBookList();
    }
    @GetMapping ("/bulkhead")
    public List<Book> getSellersListBulkhead() {
        return librarymanagementService.getBookListBulkhead();
    }
}
Now the code is ready. We have to build it and bring it up.
  5. Build and Test Library Management Service
Build and start the application by using the below commands:
//Build
gradlew build

//Start the application
java -jar build/libs/librarymanangement-0.0.1-SNAPSHOT.jar

//Endpoint Url
http://localhost:8084/library
Now we can test the application using Swagger UI - http://localhost:8084/swagger-ui.html

Run Test Scenarios for CircuitBreaker RateLimiter Retry and Bulkhead

CircuitBreaker - The circuit breaker has been applied to the addBook API. To test that it works, we will stop the Book Management service.
  • First, observe the health of the application by hitting the http://localhost:8084/actuator/health URL.
  • Now stop the Book Management service and hit the addBook API of the Library Management service using Swagger UI.
After the first step, the health endpoint should show the circuit breaker state as "CLOSED". This state is also published as a Prometheus metric, which we enabled through the micrometer dependency.
After we execute the second step, calls start failing and are redirected to the fallback method.
Once the ring buffer of 5 calls is full and the failure rate exceeds the 50% threshold, the circuit trips. Each call after that goes directly to the fallback method without attempting to hit the Book Management service. (You can verify this by checking the logger statements in the logs.) Now we can observe the /health endpoint showing the CircuitBreaker state as "OPEN".
{
    "status": "DOWN",
    "details": {
        "circuitBreakers": {
            "status": "DOWN",
            "details": {
                "add": {
                    "status": "DOWN",
                    "details": {
                        "failureRate": "100.0%",
                        "failureRateThreshold": "50.0%",
                        "slowCallRate": "-1.0%",
                        "slowCallRateThreshold": "100.0%",
                        "bufferedCalls": 5,
                        "slowCalls": 0,
                        "slowFailedCalls": 0,
                        "failedCalls": 5,
                        "notPermittedCalls": 0,
                        "state": "OPEN"
                    }                        
                }
            }
        }
    }
}
We have deployed the same code to PCF (Pivotal Cloud Foundry) so that we can integrate it with NewRelic to create the dashboard for this metric. We have used micrometer-registry-new-relic dependency for that purpose.

Image 2 - NewRelic Insight CircuitBreaker Closed Graph
Rate Limiter - We have created a separate API (http://localhost:8084/library/ratelimit) with the same addBook functionality but with the rate-limit feature enabled. In this case, the Book Management service needs to be up and running. With the current rate limit configuration, we can have a maximum of 5 requests per 10 seconds.
Once we hit the API 5 times within 10 seconds, it reaches the threshold and further calls are throttled. Throttled calls go to the fallback method and respond based on the logic implemented there. The graph below shows that it has reached the threshold limit 3 times in the last hour:

Retry - The retry feature makes the API retry a failed call until the maximum configured number of attempts is reached. If a call succeeds, the attempt count resets to zero. If the maximum is reached, the call is redirected to the fallback method. To emulate this, hit the GET API (http://localhost:8084/library) while the Book Management service is down. In the logs, we will see the response from the fallback method implementation being printed.
Bulkhead - In this example, we have used the semaphore implementation of the bulkhead. To emulate concurrent calls, we have used JMeter and set up 30 user calls in the thread group.
We will be hitting the GET API enabled with the @Bulkhead annotation (the /bulkhead mapping in the controller). We have also put some sleep time in this API so that we can hit the limit of concurrent executions. We can observe in the logs that some of the thread calls go to the fallback method. Below is the graph of the available concurrent calls for the API:
Summary
In this article, we saw various features that are now a must in a microservice architecture, which can be implemented using one single library resilience4j. Using Prometheus with Grafana or NewRelic, we can create dashboards around these metrics and increase the stability of the systems.
As usual, the code can be found on GitHub -  spring-boot-resilience4j
