Java Batch Tutorial

In today’s world internet has changed the way we live our lives and one of the major reasons for that is the usage of internet for most of the daily chores. This lead to huge amount of data available for processing.

Some of the examples where huge data is involved are processing payslips, bank statements, interest calculation, etc. So imagine if all these jobs had to be done manually, it will take ages to finish these jobs.

How is it done in current age? The answer is Batch Processing.

1. Introduction

Batch processing is performed on bulk data, without manual intervention, and long-running. It might be data or computation-intensive. Batch jobs can be run on predefined schedule or can be initiated on demand. Also, since batch jobs are usually long-running jobs, constant checks and restarting from a certain failure are common features found in batch jobs.

1.1 History of Java Batch Processing

Batch processing for Java Platform was introduced as part of JSR 352 specification, part of the Java EE 7 platform, defines the programming model for batch applications plus a runtime to run and manage batch jobs.

1.2 Architecture of Java Batch

Below diagram shows the basic components for batch processing.

Architecture for Java Batch Processing

The architecture for batch applications solves batch processing concerns like jobs, steps, repositories, reader processor writer patterns, chunks, checkpoints, parallel processing, flow, retries, sequencing, partitioning, etc.

Let’s understand the flow of architecture.

1.3 Batch Processing Components.

We will now try to understand the batch processing components in detail.

The following section will help in understanding some common characters of a batch architecture.

1.3 Steps in Job

A Step is an independent phase of a Job. As discussed above, there are two types of steps in a Job. We will try to understand both the types in detail below.

1.3.1 Chunk-Oriented Steps

Chunk steps will read and process one item at a time and group the results into a chunk. The results are then stored when the chunk reaches a predefined size. Chunk-oriented processing makes storing results more efficient when the data set is huge. It contains three parts.

As part of chunk steps, there are checkpoints which provide information to the framework for the completion of chunks. If there is an error during a chunk processing the process can restart based on the last checkpoint.

1.3.2 Task-Oriented Steps

It executes task other than processing items from a data source. Which includes creation or removal of directories, moving files, creating or deletion of database tables, etc. Task steps are not usually long-running compared to chunk steps.

In a normal scenario, task-oriented steps are used after chunk-oriented steps where there is a clean-up needed. For example, we get log files as an output of an application. The chunk steps are used for processing the data and get meaningful information out of the log files.

The task step is then used to clean-up older log files which are no more needed.

1.3.3 Parallel Processing

Batch jobs often perform expensive computational operations and process large amounts of data. Batch applications can benefit from parallel processing in two scenarios.

Batch processing helps to finish tasks and perform operations faster for huge data.

2. Tools and Technologies

Let us look at the technologies and tool used for building the program.

3. Project Structure

The project structure will look as shown in the image below.

Project Structure for Java Batch

The above project structure is using Gradle. This project can also be created using maven and the build.gralde will get replaced with pom.xml file. The structure of the project will defer slightly with the usage of Maven for the build.

4. An objective of the Program

As part of the program, we will try to create a simple java batch application using spring boot. This application will perform the following tasks.

  1. Read: – Read employee data from a CSV file.
  2. Process the data: – Convert the employee data into all upper case.
  3. Write: – Write the processed employee data back in the database.

4.1 Gradle build

We are using Gradle for the build as part of the program. The build.gradle file will look as shown below.

build.gradle

buildscript {
    repositories {
        mavenCentral()
    }
    dependencies {
        classpath("org.springframework.boot:spring-boot-gradle-plugin:2.0.1.RELEASE")
    }
}
apply plugin: 'java'
apply plugin: 'eclipse'
apply plugin: 'idea'
apply plugin: 'org.springframework.boot'
apply plugin: 'io.spring.dependency-management'
bootJar {
    baseName = 'java-batch'
    version =  '1.0'
}
repositories {
    mavenCentral()
}
sourceCompatibility = 1.8
targetCompatibility = 1.8
dependencies {
    compile("org.springframework.boot:spring-boot-starter-batch")
    compile("org.hsqldb:hsqldb")
    testCompile("junit:junit")
}

In the above build.gradle file apply plugin: 'java' tells us the plugin that needs to be set. For us, it is Java plugin.
repositories{} lets us know the repository from which the dependency should be pulled. We have chosen mavenCentral to pull the dependency jars. We can use jcenter also for pulling the respective dependency jars.

dependencies {} tag is used to provide necessary jar file details that should be pulled for the project. apply plugin: 'org.springframework.boot' this plugin is used for specifying a spring-boot project. boot jar{} will specify the properties of the jar that will get generated from the build.

4.2 Sample data file

In order to provide data for the read phase, we will use a CSV file containing employee data.

The file will look as shown below.

Sample CSV file

John,Foster
Joe,Toy
Justin,Taylor
Jane,Clark
John,Steve

The sample data file contains the first and last name of the employee. We will use the same data for processing and then insertion in the database.

4.3 SQL scripts

We are using HSQL database which is a memory based database. The script will look as shown below.

SQL script

DROP TABLE employee IF EXISTS;
CREATE TABLE employee  (
    person_id BIGINT IDENTITY NOT NULL PRIMARY KEY,
    first_name VARCHAR(20),
    last_name VARCHAR(20)
);

Spring Boot runs schema-@@platform@@.sql automatically when it starts. -all is the default for all platforms. So the table creation will happen on its own when the application starts and it will be available until the application is up and running.

4.4 Model Class

We are going to create an Employee.java class as the model class. The class will look as shown below.

Model Class for the Program

package com.batch;
public class Employee {
	    private String lastName;
	    private String firstName;
	    public Employee() {
	    }
	    public Employee(String firstName, String lastName) {
	        this.firstName = firstName;
	        this.lastName = lastName;
	    }
	    public void setFirstName(String firstName) {
	        this.firstName = firstName;
	    }
	    public String getFirstName() {
	        return firstName;
	    }
	    public String getLastName() {
	        return lastName;
	    }
	    public void setLastName(String lastName) {
	        this.lastName = lastName;
	    }
	    @Override
	    public String toString() {
	        return "firstName: " + firstName + ", lastName: " + lastName;
	    }
	}

@Override is used for overriding the default implementation of the toString() method.

4.5 Configuration Class

We will create a BatchConfiguration.java class which will be the configuration class for batch processing. The java file will look as shown below.

BatchConfiguration.java

package com.batch.config;
import javax.sql.DataSource;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecutionListener;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.batch.item.database.BeanPropertyItemSqlParameterSourceProvider;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;
import org.springframework.jdbc.core.JdbcTemplate;
import com.batch.Employee;
import com.batch.processor.EmployeeItemProcessor;
@Configuration
@EnableBatchProcessing
public class BatchConfiguration {
    @Autowired
    public JobBuilderFactory jobBuilderFactory;
    @Autowired
    public StepBuilderFactory stepBuilderFactory;
    // tag::readerwriterprocessor[]
    @Bean
    public FlatFileItemReader reader() {
        return new FlatFileItemReaderBuilder()
            .name("EmployeeItemReader")
            .resource(new ClassPathResource("sample-data.csv"))
            .delimited()
            .names(new String[]{"firstName", "lastName"})
            .fieldSetMapper(new BeanWrapperFieldSetMapper() {{
                setTargetType(Employee.class);
            }})
            .build();
    }
    @Bean
    public EmployeeItemProcessor processor() {
        return new EmployeeItemProcessor();
    }
    @Bean
    public JdbcBatchItemWriter writer(DataSource dataSource) {
        return new JdbcBatchItemWriterBuilder()
            .itemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>())
            .sql("INSERT INTO employee (first_name, last_name) VALUES (:firstName, :lastName)")
            .dataSource(dataSource)
            .build();
    }
    // end::readerwriterprocessor[]
    // tag::jobstep[]
    @Bean
    public Job importUserJob(JobCompletionNotificationListener listener, Step step1) {
        return jobBuilderFactory.get("importUserJob")
            .incrementer(new RunIdIncrementer())
            .listener(listener)
            .flow(step1)
            .end()
            .build();
    }
    @Bean
    public Step step1(JdbcBatchItemWriter writer) {
        return stepBuilderFactory.get("step1")
            .<Employee, Employee> chunk(10)
            .reader(reader())
            .processor(processor())
            .writer(writer)
            .build();
    }
    // end::jobstep[]
}

@EnableBatchProcessing annotation is used for enabling batch processing.
JobBuilderFactory is the factory which is used for building a job.
StepBuilderFactory is used for step creation.
The method step1() has a property chunk(). This is the property used for chunking the input into a defined size. For us, the size is 10.

4.6 Item Processor

Item processor is an interface which will be responsible for processing the data. We will implement the interface in EmployeeItemProcessor.java. The java class will look as shown below.

EmployeeItemProcessor.java

package com.batch.processor;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.item.ItemProcessor;
import com.batch.Employee;
public class EmployeeItemProcessor implements ItemProcessor<Employee, Employee> {
    private static final Logger log = LoggerFactory.getLogger(EmployeeItemProcessor.class);
    @Override
    public Employee process(Employee emp) throws Exception {
        final String firstName = emp.getFirstName().toUpperCase();
        final String lastName = emp.getLastName().toUpperCase();
        final Employee transformedEmployee = new Employee(firstName, lastName);
        log.info("Converting (" + emp + ") into (" + transformedEmployee + ")");
        return transformedEmployee;
    }
}

In the process() method we will be getting the data and we will be transforming it into the uppercase name.

4.7 JobExecutionSupportListener class

JobExecutionListenerSupport is the interface that will notify when the job is completed. As part of the interface, we have afterJob method. This method is used to post the completion of the job.

JobCompletionNotificationListener.java

package com.batch.config;
import java.util.List;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.listener.JobExecutionListenerSupport;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.core.RowMapper;
import org.springframework.stereotype.Component;
import com.batch.Employee;
@Component
public class JobCompletionNotificationListener extends JobExecutionListenerSupport {
	private static final Logger log = LoggerFactory.getLogger(JobCompletionNotificationListener.class);
	private final JdbcTemplate jdbcTemplate;
	@Autowired
	public JobCompletionNotificationListener(JdbcTemplate jdbcTemplate) {
		this.jdbcTemplate = jdbcTemplate;
	}
	@Override
	public void afterJob(JobExecution jobExecution) {
		RowMapper rowMapper = (rs, rowNum) -> {
			Employee e = new Employee();
			e.setFirstName(rs.getString(1));
			e.setLastName(rs.getString(2));
			return e;
		};
		if(jobExecution.getStatus() == BatchStatus.COMPLETED) {
			log.info("!!! JOB FINISHED! Time to verify the results");
		List empList= jdbcTemplate.query("SELECT first_name, last_name FROM employee",rowMapper);
		log.info("Size of List "+empList.size());
		for (Employee emp: empList) {
			log.info("Found: "+emp.getFirstName()+" "+emp.getLastName());
			
		}
		}
	}
}

In this method, we are getting the data from the database post completion of the job and we are printing the result on the console to verify the processing that was performed on the data.

4.8 Application class

We will create an application class which will contain the main method responsible for triggering the java batch program. The class will look as shown below.

Application.java

package com.batch;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
@SpringBootApplication
public class Application {
    public static void main(String[] args) throws Exception {
        SpringApplication.run(Application.class, args);
    }
}

@SpringBootApplication is the annotation used for specifying a program as a spring boot program.

5. Output

Let’s execute the application as a Java application. We will get the following output on the console.

The output of JavaBatch program

The workflow of the batch program is very clearly available in the output. The Job starts with importUserJob, then step-1 execution starts where it converts the read data into uppercase.

Post-processing of step, we can see the uppercase result on the console.

6. Summary

In this tutorial, we learnt the following things:

  1. Java batch contains Jobs which can contain multiple steps.
  2. Every step is a combination of reading, process, write.
  3. We can chunk out the data into different size for processing.

7. Download the Eclipse project

This was a tutorial for JavaBatch with SpringBoot.

You can download the full source code of this example here: JavaBatch.zip
Exit mobile version