Running Batch Application in PCF

1. Overview

Most developers build microservices these days and deploy them to cloud platforms. Pivotal Cloud Foundry (PCF) is one of the well-known cloud platforms. Applications deployed on PCF are mostly long-running processes that never end, such as web applications, SPAs, and REST-based services. PCF monitors these long-running instances and, if one goes down, spins up a new instance to replace the failed one. This works well where the process is expected to run continuously, but for a batch process it is overkill: the container runs all the time with almost no CPU usage and adds to the cost. Many developers mistakenly believe that PCF cannot run a batch application that is started only on request. That is not correct.

Spring Batch lets us create batch applications and provides many out-of-the-box features that reduce boilerplate code. More recently, Spring Cloud Task was added to the list of Spring projects for creating short-running processes. With either option, we can create microservices, deploy them on PCF, and then stop them so that PCF doesn't try to self-heal them. With the help of PCF Scheduler, we can then schedule the task to run at a certain time of day. Let's see in this article how to do that in very few steps.

2. Prerequisites

  • JDK 1.8
  • Spring Boot knowledge
  • Gradle
  • IDE (Eclipse, VSC, etc.)
  • PCF instance

3. Develop the Spring Batch Application

Let's develop a small Spring Batch application (spring-batch-master) that reads a file with employee data and then adds a department name to each employee record.

3.1 BatchConfiguration

Let's start with the BatchConfiguration file. I have added two Jobs here, and each shows a different way of implementing a batch process. The first uses Spring Batch chunks and sets up the Job flow with steps; each step has a reader, a processor, and a writer configured. The second uses a Tasklet:

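import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.core.launch.support.SimpleJobLauncher;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class BatchConfiguration {
    private static final Log logger = LogFactory.getLog(BatchConfiguration.class);
    // Provided by @EnableBatchProcessing, assumed to be on the application class.
    @Autowired
    public JobBuilderFactory jobBuilderFactory;
    @Autowired
    public StepBuilderFactory stepBuilderFactory;
    @Bean
    public JobLauncher jobLauncher(JobRepository jobRepo) {
        SimpleJobLauncher simpleJobLauncher = new SimpleJobLauncher();
        simpleJobLauncher.setJobRepository(jobRepo);
        return simpleJobLauncher;
    }
    // Job 1: chunk-oriented flow with a single step.
    @Bean
    public Job departmentProcessingJob() {
        return jobBuilderFactory.get("departmentProcessingJob")
                .flow(step1())
                .end()
                .build();
    }
    @Bean
    public Step step1() {
        return stepBuilderFactory.get("step1")
                .<Employee, Employee>chunk(1)
                .reader(reader())
                .processor(processor())
                .writer(writer())
                .build();
    }
    // The three beans below are assumed wiring; the article does not show them.
    @Bean
    public FlatFileItemReader<Employee> reader() {
        return new DepartmentReader().reader();
    }
    @Bean
    public DepartmentProcessor processor() {
        return new DepartmentProcessor();
    }
    @Bean
    public DepartmentWriter writer() {
        return new DepartmentWriter();
    }
    // Job 2: a single Tasklet step.
    @Bean
    public Job job2() {
        return this.jobBuilderFactory.get("job2")
                .start(this.stepBuilderFactory.get("job2step1")
                        .tasklet(new Tasklet() {
                            @Override
                            public RepeatStatus execute(StepContribution contribution,
                                    ChunkContext chunkContext) throws Exception {
                                logger.info("Job2 was run");
                                return RepeatStatus.FINISHED;
                            }
                        })
                        .build())
                .build();
    }
}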

Let's discuss the first job, departmentProcessingJob, in a little more detail. This job has a single step, step1, which reads the file, processes it, and then prints it.
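To make the reader, processor, and writer flow concrete, here is a minimal plain-Java sketch of the chunk idea. It is not Spring Batch code: ChunkSketch and runStep are hypothetical names, and the sample lines are made-up data. It simply reads up to one chunk's worth of items, processes each of them, and hands the whole chunk to the writer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Plain-Java illustration of chunk-oriented processing (hypothetical, not Spring Batch).
class ChunkSketch {

    // Reads up to chunkSize items, processes each one, then passes the chunk
    // to the writer. Returns how many times the writer was called.
    static <I, O> int runStep(Iterator<I> reader,
                              Function<I, O> processor,
                              Consumer<List<O>> writer,
                              int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be at least 1");
        }
        int writes = 0;
        while (reader.hasNext()) {
            // Read phase: collect at most chunkSize items.
            List<I> items = new ArrayList<>();
            while (items.size() < chunkSize && reader.hasNext()) {
                items.add(reader.next());
            }
            // Process phase: transform every item of the chunk.
            List<O> processed = new ArrayList<>();
            for (I item : items) {
                processed.add(processor.apply(item));
            }
            // Write phase: one writer call per chunk.
            writer.accept(processed);
            writes++;
        }
        return writes;
    }

    public static void main(String[] args) {
        // Made-up sample records in the id,employeenumber,salary layout.
        List<String> lines = Arrays.asList("1,1001,5000", "2,1002,6000", "3,1003,7000");
        Function<String, String> processor = line -> line + ",processed";
        Consumer<List<String>> writer = chunk -> System.out.println("write " + chunk);
        int writes = runStep(lines.iterator(), processor, writer, 2);
        System.out.println("writer calls: " + writes);
    }
}
```

With chunk(1), as in step1 above, the writer is called once per record; a larger chunk size groups several records into each write.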

3.2 DepartmentReader

This code has logic to read the employee data from a file as part of the first step:

public class DepartmentReader {
    public FlatFileItemReader<Employee> reader() {
        FlatFileItemReader<Employee> reader = new FlatFileItemReader<Employee>();
        reader.setResource(new ClassPathResource("employee_data.txt"));
        reader.setLineMapper(new DefaultLineMapper<Employee>() {{
            setLineTokenizer(new DelimitedLineTokenizer() {{
                setNames(new String[]{"id", "employeenumber", "salary"});
            }});
            setFieldSetMapper(new BeanWrapperFieldSetMapper<Employee>() {{
                setTargetType(Employee.class);
            }});
        }});
        return reader;
    }
}
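The Employee class itself is not shown in this article. Here is a minimal sketch, assuming all four fields are plain Strings (the processor and writer code treats them that way) and that the bean follows the usual JavaBean conventions: a no-argument constructor plus getters and setters, which BeanWrapperFieldSetMapper relies on to populate the object. As far as I know, that mapper also tolerates close name matches, which would be why the "employeenumber" column can land in an employeeNumber property, but treat that as an assumption and check it against your Spring Batch version.

```java
// Hypothetical Employee bean; field names and types are assumptions
// derived from how the reader, processor, and writer use the class.
class Employee {
    private String id;
    private String employeeNumber;
    private String salary;
    private String department;   // filled in later by the processor

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }

    public String getEmployeeNumber() { return employeeNumber; }
    public void setEmployeeNumber(String employeeNumber) { this.employeeNumber = employeeNumber; }

    public String getSalary() { return salary; }
    public void setSalary(String salary) { this.salary = salary; }

    public String getDepartment() { return department; }
    public void setDepartment(String department) { this.department = department; }

    @Override
    public String toString() {
        return "Employee{id=" + id + ", employeeNumber=" + employeeNumber
                + ", salary=" + salary + ", department=" + department + "}";
    }

    public static void main(String[] args) {
        // Made-up sample record, populated the way the field-set mapper would.
        Employee e = new Employee();
        e.setId("1");
        e.setEmployeeNumber("1001");
        e.setSalary("5000");
        System.out.println(e);
    }
}
```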

3.3 DepartmentProcessor

This code has the logic to add a department name to each employee record, based on the employee number:

public class DepartmentProcessor implements ItemProcessor<Employee, Employee> {
    @Override
    public Employee process(Employee item) throws Exception {
        if ("1001".equalsIgnoreCase(item.getEmployeeNumber())) {
            item.setDepartment("Sales");
        } else if ("1002".equalsIgnoreCase(item.getEmployeeNumber())) {
            item.setDepartment("IT");
        } else {
            item.setDepartment("Staff");
        }
        System.out.println("Employee Details --> " + item.toString());
        return item;
    }
}

3.4 DepartmentWriter

This code has logic to print the Employee records with appended Department name:

public class DepartmentWriter implements ItemWriter<Employee> {
    @Override
    public void write(List<? extends Employee> items) throws Exception {
        List<String> employeeList = new ArrayList<>();
        items.forEach(item -> {
            String enrichedTxn = String.join(",", item.getId(), 
                    item.getEmployeeNumber(), item.getSalary(),
                    item.getDepartment());
            employeeList.add(enrichedTxn);
        });
        employeeList.forEach(System.out::println);
    }
}

3.5 BatchCommandLineRunner

Now, let's bootstrap the application logic to run as a batch process. Spring Boot provides CommandLineRunner for this:

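import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.stereotype.Component;
@Component
public class BatchCommandLineRunner implements CommandLineRunner {
    @Autowired
    JobLauncher jobLauncher;
    @Autowired
    Job departmentProcessingJob;
    @Autowired
    Job job2;
    @Override
    public void run(String... args) throws Exception {
        // A timestamp-based JobID gives every launch a distinct set of parameters.
        JobParameters param = new JobParametersBuilder()
                .addString("JobID", String.valueOf(System.currentTimeMillis()))
                .toJobParameters();
        jobLauncher.run(departmentProcessingJob, param);
        jobLauncher.run(job2, param);
    }
}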
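Why pass a timestamp as JobID? In Spring Batch, a job instance is identified by the job name together with its identifying parameters, and relaunching an instance that has already completed is rejected. The toy model below illustrates that rule in plain Java. JobInstanceSketch and launch are hypothetical names, and the real framework signals the rejection with an exception rather than a boolean, so read this as a sketch of the idea and not as Spring Batch's implementation.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model of job-instance identity (hypothetical, not Spring Batch code).
class JobInstanceSketch {
    // Keys of job instances that have already run to completion.
    private final Set<String> completed = new HashSet<>();

    // Returns true if the launch is accepted, false if the same job name
    // and parameters were already used for a completed run.
    boolean launch(String jobName, Map<String, String> params) {
        // TreeMap sorts the parameters so the key does not depend on insertion order.
        String key = jobName + "|" + new TreeMap<>(params);
        return completed.add(key);
    }

    public static void main(String[] args) {
        JobInstanceSketch repo = new JobInstanceSketch();
        Map<String, String> fixed = Collections.singletonMap("JobID", "1");
        System.out.println(repo.launch("departmentProcessingJob", fixed)); // first run
        System.out.println(repo.launch("departmentProcessingJob", fixed)); // same parameters again
        System.out.println(repo.launch("job2", fixed));                    // different job name

        // A fresh timestamp yields new parameters, so the launch is accepted.
        Map<String, String> fresh = Collections.singletonMap(
                "JobID", String.valueOf(System.currentTimeMillis()));
        System.out.println(repo.launch("departmentProcessingJob", fresh));
    }
}
```

This is also why the same param object can be reused for both jobs in the runner: the job names differ, so the two launches do not collide.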

4. Deploy Application on PCF

So far we have created a Spring Batch Job. Now let's deploy it to PCF. You just need to package the code and cf push it to PCF with a manifest file.

manifest.yaml:
---
applications:
- name: batch-example
  memory: 1G
  random-route: true
  path: build/libs/spring-batch-master-0.0.1-SNAPSHOT.jar
  no-hostname: true
  no-route: true
  health-check-type: none

We will observe that the application starts, executes the logic, and then exits. The application will be shown as crashed in PCF Apps Manager.

In PCF, applications run with process type web by default. PCF expects such an application to keep running and listening on a web port. For a batch application, that is not the case. So let's see how to handle that:
  • Stop the application manually.
  • Run the Job either manually with the cf run-task command or on a schedule using PCF Scheduler.

5. Start Batch Application with CF CLI

To run the Spring Batch (Boot) application on PCF, we need to run the following command (with cf CLI v7 or later, pass the quoted command through the --command flag):

cf run-task <APP Name> ".java-buildpack/open_jdk_jre/bin/java org.springframework.boot.loader.JarLauncher"

Now, you can use this command in a Bamboo/Jenkins pipeline to trigger the application on a cron schedule.

6. Schedule Batch Job with PCF Scheduler

To schedule the batch Job, we can use PCF Scheduler. PCF Scheduler lets you create tasks and schedule them using cron expressions. In Apps Manager, go to the application, open Tasks, and click Enable Scheduling to bind the application to PCF Scheduler. Now you can create a Job, as shown in the picture below. For more details on how to use PCF Scheduler, you can read this blog.


Now, if we run this code, it should execute both Jobs and give the results below:

7. Batch Application with Spring Cloud Task

Spring has come up with a new project called Spring Cloud Task (SCT). Its purpose is to create short-lived microservices on cloud platforms. We just need to add the @EnableTask annotation. This registers a TaskRepository and creates a TaskExecution, and the Jobs defined in the application are then picked up and executed one by one.

By default, TaskRepository uses an in-memory database; however, it can also support most persistent databases, such as Oracle, MySQL, and PostgreSQL.

So let's develop one more small application (sct-batch-job) with SCT.

Put @EnableTask in Spring Boot Main Class:
@SpringBootApplication
@EnableTask
@EnableBatchProcessing
public class EmployeeProcessingBatch {
    public static void main(String[] args) {
        SpringApplication.run(EmployeeProcessingBatch.class, args);
    }
}

Add two Jobs in SCTJobConfiguration file:
@Configuration
public class SCTJobConfiguration {
    private static final Log logger = LogFactory.getLog(SCTJobConfiguration.class);
    @Autowired
    public JobBuilderFactory jobBuilderFactory;
    @Autowired
    public StepBuilderFactory stepBuilderFactory;
    @Bean
    public Job job1() {
        return this.jobBuilderFactory.get("job1")
                .start(this.stepBuilderFactory.get("job1step1")
                        .tasklet(new Tasklet() {
                            @Override
                            public RepeatStatus execute(StepContribution contribution,
                                    ChunkContext chunkContext) throws Exception {
                                logger.info("Job1 ran successfully");
                                return RepeatStatus.FINISHED;
                            }
                        })
                        .build())
                .build();
    }
    @Bean
    public Job job2() {
        return this.jobBuilderFactory.get("job2")
                .start(this.stepBuilderFactory.get("job2step1")
                        .tasklet(new Tasklet() {
                            @Override
                            public RepeatStatus execute(StepContribution contribution,
                                    ChunkContext chunkContext) throws Exception {
                                logger.info("Job2 ran successfully");
                                return RepeatStatus.FINISHED;
                            }
                        })
                        .build())
                .build();
    }
}


That's it. As you may have noticed, I have used @EnableBatchProcessing to integrate SCT with Spring Batch. That way we can run a Spring Batch application as a Task. We can now push this application to PCF as we did with the earlier one, and then either run it manually or schedule it using PCF Scheduler.

8. NodeJS Batch Job Scheduling with PCF Scheduler

Similarly, if we have a batch Job written in Node.js, we can first stop the application and then create a task with the command that starts the Node.js application.


9. Conclusion

In this article, we have talked about how a Spring Batch application can be run on PCF. We can also use Spring Cloud Task to run short-lived microservices. Spring Cloud Task also integrates with Spring Batch, so you can get the full benefits of both.

As usual, the code can be found on GitHub: Spring Batch App, Spring Cloud Task App
