Java Batch Tutorial

Anand Kumar, May 15th, 2018 (Last Updated: May 15th, 2018)


The internet has changed the way we live, largely because we now use it for most of our daily chores. This has led to a huge amount of data that needs to be processed.

Examples where huge amounts of data are involved include processing payslips, generating bank statements and calculating interest. If all these jobs had to be done manually, they would take ages to finish.

How is it done today? The answer is batch processing.

1. Introduction

Batch processing is performed on bulk data, runs without manual intervention, and is typically long-running. It may be data-intensive or computation-intensive. Batch jobs can run on a predefined schedule or be initiated on demand. Because batch jobs usually run for a long time, checkpointing and restarting from the point of failure are common features of batch jobs.

1.1 History of Java Batch Processing

Batch processing for the Java platform was introduced by the JSR 352 specification, which is part of the Java EE 7 platform. It defines the programming model for batch applications plus a runtime to run and manage batch jobs.

1.2 Architecture of Java Batch

The diagram below shows the basic components of batch processing.

The architecture for batch applications addresses batch processing concerns such as jobs, steps, repositories, reader-processor-writer patterns, chunks, checkpoints, parallel processing, flow, retries, sequencing and partitioning.

Let's understand the flow of the architecture.

The job repository contains the jobs that need to be run.
The JobLauncher pulls a job out of the job repository.
Every job contains steps. A step is made up of an ItemReader, an ItemProcessor and an ItemWriter.
The ItemReader reads the data.
The ItemProcessor processes the data based on business logic.
The ItemWriter writes the data to the defined destination.

1.3 Batch Processing Components

We will now try to understand the batch processing components in detail.

Job: A job comprises the entire batch process. It contains one or more steps. A job is put together using a Job Specification Language (JSL) that specifies the order in which the steps must be executed. In JSR 352, JSL is specified in an XML file known as the job XML file. A job is basically a container holding steps.
Step: A step is a domain object that contains an independent, sequential phase of the job. A step contains all the necessary logic and data to perform the actual processing. The batch specification keeps the definition of a step vague because the content of a step is purely application-specific and can be as complex or as simple as the developer wants. There are two kinds of steps: chunk-oriented and task-oriented.
Job Operator: It provides an interface to manage all aspects of job processing, which includes operational commands, such as start, restart, and stop, as well as job repository commands, like retrieval of job and step executions.
Job Repository: It contains information about the jobs currently running and historical data about past jobs. JobOperator provides APIs to access this repository. A JobRepository could be implemented using a database or a file system.
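
To make the relationship between these components concrete, here is a minimal plain-Java sketch. It is not the JSR 352 API: JobSketch, Step, Job and InMemoryJobRepository are hypothetical names chosen for illustration. It only shows the idea that a job is a container of ordered steps and that a repository records what happened to each step.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch only: these types are hypothetical and are not the JSR 352 API.
class JobSketch {

    /** A step is one independent, sequential phase of a job. */
    interface Step {
        String name();
        void execute() throws Exception;
    }

    /** In-memory stand-in for a job repository: remembers the outcome of every step. */
    static class InMemoryJobRepository {
        private final Map<String, String> stepStatus = new LinkedHashMap<>();

        void record(String stepName, String status) {
            stepStatus.put(stepName, status);
        }

        Map<String, String> history() {
            return stepStatus;
        }
    }

    /** A job is a container that runs its steps in the declared order. */
    static class Job {
        private final List<Step> steps = new ArrayList<>();

        Job add(Step step) {
            steps.add(step);
            return this;
        }

        /** Runs the steps in order, stops at the first failure, returns true if all completed. */
        boolean run(InMemoryJobRepository repository) {
            for (Step step : steps) {
                try {
                    step.execute();
                    repository.record(step.name(), "COMPLETED");
                } catch (Exception e) {
                    repository.record(step.name(), "FAILED");
                    return false;
                }
            }
            return true;
        }
    }

    /** Small helper that builds a named step from a Runnable. */
    static Step step(String name, Runnable body) {
        return new Step() {
            public String name() { return name; }
            public void execute() { body.run(); }
        };
    }

    public static void main(String[] args) {
        InMemoryJobRepository repository = new InMemoryJobRepository();
        Job job = new Job()
            .add(step("load", () -> System.out.println("loading data")))
            .add(step("cleanup", () -> System.out.println("cleaning up")));
        boolean completed = job.run(repository);
        System.out.println(completed + " " + repository.history());
    }
}
```

A real runtime would also persist this history and expose start, stop and restart operations through the job operator. The sketch keeps everything in memory to stay short.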

The following section will help in understanding some common characteristics of a batch architecture.

1.3 Steps in Job

A Step is an independent phase of a Job. As discussed above, there are two types of steps in a Job. We will look at both types in detail below.

1.3.1 Chunk-Oriented Steps

Chunk steps will read and process one item at a time and group the results into a chunk. The results are then stored when the chunk reaches a predefined size. Chunk-oriented processing makes storing results more efficient when the data set is huge. It contains three parts.

The item reader reads the input one item at a time from a data source, which can be a database, a flat file, a log file, etc.
The processor processes the items one by one based on the business logic defined.
The writer writes the data in chunks. The size of the chunk is predefined and configurable.

Chunk steps also maintain checkpoints, which tell the framework which chunks have been completed. If an error occurs while a chunk is being processed, the step can restart from the last checkpoint.
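
The chunk and checkpoint behaviour described above can be sketched in plain Java. This is an illustration under simplifying assumptions, not a framework API: ChunkSketch is a hypothetical class, the "data store" is an in-memory list, and the checkpoint is just a counter of committed items.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Plain-Java illustration only; ChunkSketch is a hypothetical class, not a framework API.
class ChunkSketch {
    private final int chunkSize;
    private final List<String> written = new ArrayList<>(); // stand-in for the target data store
    private int checkpoint = 0;                             // number of input items already committed

    ChunkSketch(int chunkSize) {
        this.chunkSize = chunkSize;
    }

    /** Resumes at the last checkpoint and returns the number of chunks written in this run. */
    int run(List<String> input, Function<String, String> processor) {
        int chunksWritten = 0;
        List<String> chunk = new ArrayList<>();
        for (int i = checkpoint; i < input.size(); i++) {
            chunk.add(processor.apply(input.get(i)));   // read and process one item at a time
            boolean lastItem = (i == input.size() - 1);
            if (chunk.size() == chunkSize || lastItem) {
                written.addAll(chunk);                  // write the whole chunk in one go
                checkpoint = i + 1;                     // record the checkpoint after the write
                chunk.clear();
                chunksWritten++;
            }
        }
        return chunksWritten;
    }

    int checkpoint() { return checkpoint; }

    List<String> written() { return written; }

    public static void main(String[] args) {
        ChunkSketch step = new ChunkSketch(2);
        int chunks = step.run(Arrays.asList("a", "b", "c", "d", "e"), String::toUpperCase);
        // Five items with a chunk size of 2 are written as three chunks.
        System.out.println(chunks + " chunks, written = " + step.written());
    }
}
```

If the processor throws in the middle of a chunk, the items of that unfinished chunk are discarded and the checkpoint stays at the last successful write, so calling run() again picks up from there instead of starting over.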

1.3.2 Task-Oriented Steps

A task-oriented step executes a task other than processing items from a data source, such as creating or removing directories, moving files, or creating or dropping database tables. Task steps are usually short-running compared to chunk steps.

Typically, task-oriented steps are used after chunk-oriented steps when clean-up is needed. For example, suppose an application produces log files as output. Chunk steps process the log data and extract meaningful information from it.

A task step is then used to clean up older log files that are no longer needed.
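
As a rough illustration of such a clean-up task, the following plain-Java sketch deletes log files older than a given age. The class name LogCleanupTask, the "*.log" pattern and the seven-day retention used in main are assumptions made for this example only.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.time.Duration;
import java.time.Instant;

// Hypothetical task-style step: one unit of work, no reader/processor/writer involved.
class LogCleanupTask {

    /** Deletes *.log files in dir that were last modified before (now - maxAge); returns the count. */
    static int deleteOldLogs(Path dir, Duration maxAge) throws IOException {
        Instant cutoff = Instant.now().minus(maxAge);
        int deleted = 0;
        try (DirectoryStream<Path> logs = Files.newDirectoryStream(dir, "*.log")) {
            for (Path log : logs) {
                if (Files.getLastModifiedTime(log).toInstant().isBefore(cutoff)) {
                    Files.delete(log);
                    deleted++;
                }
            }
        }
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        // Set up a throwaway directory with one old and one fresh log file.
        Path dir = Files.createTempDirectory("batch-logs");
        Path oldLog = Files.createFile(dir.resolve("old.log"));
        Files.setLastModifiedTime(oldLog, FileTime.from(Instant.now().minus(Duration.ofDays(30))));
        Files.createFile(dir.resolve("new.log"));

        System.out.println("deleted " + deleteOldLogs(dir, Duration.ofDays(7)) + " file(s)");
    }
}
```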

1.3.3 Parallel Processing

Batch jobs often perform expensive computational operations and process large amounts of data. Batch applications can benefit from parallel processing in two scenarios.

Steps that are independent in nature can run on different threads.
Chunk-oriented steps where the processing of each item is independent of the results of processing previous items can run on more than one thread.

Used this way, batch processing finishes tasks over huge data sets faster.
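
The second scenario, items that can be processed independently of each other, can be sketched with a fixed thread pool from java.util.concurrent. This is a simplified illustration and not how a batch runtime partitions work internally. ParallelSketch and processInParallel are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper: processes independent items on several threads.
class ParallelSketch {

    /** Applies processor to every item using a pool of the given size; results keep the input order. */
    static <I, O> List<O> processInParallel(List<I> items, Function<I, O> processor, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<O>> futures = new ArrayList<>();
            for (I item : items) {
                Callable<O> task = () -> processor.apply(item); // each item is its own task
                futures.add(pool.submit(task));
            }
            List<O> results = new ArrayList<>();
            for (Future<O> future : futures) {
                results.add(future.get());                      // wait for each result in input order
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> names = Arrays.asList("john", "jane", "joe");
        List<String> upper = ParallelSketch.<String, String>processInParallel(
            names, s -> s.toUpperCase(), 2);
        System.out.println(upper);
    }
}
```

This only works safely because no item depends on the result of another. If processing one item needed the output of the previous one, the items would have to stay on a single thread.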

2. Tools and Technologies

Let us look at the technologies and tools used for building the program.

Eclipse Oxygen.2 Release (4.7.2)
Java – version 9.0.4
Gradle – 4.3
Spring Boot – 2.0.1.RELEASE
HSQL Database

3. Project Structure

The project structure will look as shown in the image below.

The above project structure uses Gradle. The project can also be created using Maven, in which case build.gradle is replaced with a pom.xml file. The structure of the project will differ slightly when Maven is used for the build.

4. Objective of the Program

As part of the program, we will create a simple Java batch application using Spring Boot. This application will perform the following tasks.

Read: Read employee data from a CSV file.
Process: Convert the employee data to upper case.
Write: Write the processed employee data to the database.

4.1 Gradle build

We are using Gradle for the build as part of the program. The build.gradle file will look as shown below.

build.gradle

buildscript {
    repositories {
        mavenCentral()
    }
    dependencies {
        classpath("org.springframework.boot:spring-boot-gradle-plugin:2.0.1.RELEASE")
    }
}

apply plugin: 'java'
apply plugin: 'eclipse'
apply plugin: 'idea'
apply plugin: 'org.springframework.boot'
apply plugin: 'io.spring.dependency-management'

bootJar {
    baseName = 'java-batch'
    version =  '1.0'
}

repositories {
    mavenCentral()
}

sourceCompatibility = 1.8
targetCompatibility = 1.8

dependencies {
    compile("org.springframework.boot:spring-boot-starter-batch")
    compile("org.hsqldb:hsqldb")
    testCompile("junit:junit")
}

In the above build.gradle file, apply plugin: 'java' tells Gradle which plugin to apply. For us, it is the Java plugin.
repositories {} specifies the repository from which the dependencies should be pulled. We have chosen mavenCentral to pull the dependency jars. jcenter could also be used for this purpose.

dependencies {} lists the jar files that should be pulled for the project. apply plugin: 'org.springframework.boot' applies the plugin that marks this as a Spring Boot project. bootJar {} specifies the properties of the jar that the build generates.

4.2 Sample data file

In order to provide data for the read phase, we will use a CSV file containing employee data.

The file will look as shown below.

Sample CSV file

John,Foster
Joe,Toy
Justin,Taylor
Jane,Clark
John,Steve

The sample data file contains the first name and last name of each employee, one employee per line. We will process this data and then insert it into the database.
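
To make the file layout concrete, here is a small plain-Java sketch that splits one line of this file into its two fields. It is only an illustration of the format: CsvSketch and parseLine are hypothetical names, and the sketch assumes simple comma-separated values without quoting or escaped commas.

```java
// Hypothetical helper illustrating the "firstName,lastName" layout of the sample file.
class CsvSketch {

    /** Splits one "first,last" line into a two-element array {firstName, lastName}. */
    static String[] parseLine(String line) {
        String[] fields = line.split(",", -1);   // -1 keeps trailing empty fields
        if (fields.length != 2) {
            throw new IllegalArgumentException(
                "Expected 2 fields but got " + fields.length + ": " + line);
        }
        return new String[] { fields[0].trim(), fields[1].trim() };
    }

    public static void main(String[] args) {
        String[] names = parseLine("John,Foster");
        System.out.println("firstName=" + names[0] + ", lastName=" + names[1]);
    }
}
```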

4.3 SQL scripts

We are using the HSQL database, which is an in-memory database. The script will look as shown below.

SQL script

DROP TABLE employee IF EXISTS;

CREATE TABLE employee  (
    person_id BIGINT IDENTITY NOT NULL PRIMARY KEY,
    first_name VARCHAR(20),
    last_name VARCHAR(20)
);

Spring Boot runs schema-@@platform@@.sql automatically during startup, where @@platform@@ stands for the database platform and -all is the default for all platforms. The table is therefore created on its own when the application starts and, because the database is in memory, it is available only while the application is up and running.

4.4 Model Class

We are going to create an Employee.java class as the model class. The class will look as shown below.

Model Class for the Program

package com.batch;

public class Employee {
    private String lastName;
    private String firstName;

    public Employee() {
    }

    public Employee(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    public void setFirstName(String firstName) {
        this.firstName = firstName;
    }

    public String getFirstName() {
        return firstName;
    }

    public String getLastName() {
        return lastName;
    }

    public void setLastName(String lastName) {
        this.lastName = lastName;
    }

    @Override
    public String toString() {
        return "firstName: " + firstName + ", lastName: " + lastName;
    }

}

The @Override annotation indicates that toString() overrides the default implementation inherited from Object.

4.5 Configuration Class

We will create a BatchConfiguration.java class which will be the configuration class for batch processing. The java file will look as shown below.

BatchConfiguration.java

package com.batch.config;

import javax.sql.DataSource;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecutionListener;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.batch.item.database.BeanPropertyItemSqlParameterSourceProvider;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;
import org.springframework.jdbc.core.JdbcTemplate;

import com.batch.Employee;
import com.batch.processor.EmployeeItemProcessor;

@Configuration
@EnableBatchProcessing
public class BatchConfiguration {

    @Autowired
    public JobBuilderFactory jobBuilderFactory;

    @Autowired
    public StepBuilderFactory stepBuilderFactory;

    // tag::readerwriterprocessor[]
    @Bean
    public FlatFileItemReader<Employee> reader() {
        // Reads sample-data.csv from the classpath and maps each line to an Employee
        return new FlatFileItemReaderBuilder<Employee>()
            .name("EmployeeItemReader")
            .resource(new ClassPathResource("sample-data.csv"))
            .delimited()
            .names(new String[]{"firstName", "lastName"})
            .fieldSetMapper(new BeanWrapperFieldSetMapper<Employee>() {{
                setTargetType(Employee.class);
            }})
            .build();
    }

    @Bean
    public EmployeeItemProcessor processor() {
        return new EmployeeItemProcessor();
    }

    @Bean
    public JdbcBatchItemWriter<Employee> writer(DataSource dataSource) {
        return new JdbcBatchItemWriterBuilder<Employee>()
            .itemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>())
            .sql("INSERT INTO employee (first_name, last_name) VALUES (:firstName, :lastName)")
            .dataSource(dataSource)
            .build();
    }
    // end::readerwriterprocessor[]

    // tag::jobstep[]
    @Bean
    public Job importUserJob(JobCompletionNotificationListener listener, Step step1) {
        return jobBuilderFactory.get("importUserJob")
            .incrementer(new RunIdIncrementer())
            .listener(listener)
            .flow(step1)
            .end()
            .build();
    }

    @Bean
    public Step step1(JdbcBatchItemWriter<Employee> writer) {
        return stepBuilderFactory.get("step1")
            .<Employee, Employee> chunk(10)
            .reader(reader())
            .processor(processor())
            .writer(writer)
            .build();
    }
    // end::jobstep[]
}

@EnableBatchProcessing annotation is used for enabling batch processing.
JobBuilderFactory is the factory which is used for building a job.
StepBuilderFactory is used for step creation.
The step1() method calls chunk(), which sets the chunk size, i.e. how many items are read and processed before they are written together. For us, the size is 10.

4.6 Item Processor

ItemProcessor is the interface responsible for processing the data. We will implement it in EmployeeItemProcessor.java. The Java class will look as shown below.

EmployeeItemProcessor.java

package com.batch.processor;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.item.ItemProcessor;

import com.batch.Employee;

public class EmployeeItemProcessor implements ItemProcessor<Employee, Employee> {

    private static final Logger log = LoggerFactory.getLogger(EmployeeItemProcessor.class);

    @Override
    public Employee process(Employee emp) throws Exception {
        final String firstName = emp.getFirstName().toUpperCase();
        final String lastName = emp.getLastName().toUpperCase();

        final Employee transformedEmployee = new Employee(firstName, lastName);

        log.info("Converting (" + emp + ") into (" + transformedEmployee + ")");

        return transformedEmployee;
    }

}

In the process() method, we receive each employee and return a new Employee whose first name and last name are converted to uppercase.
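
One detail worth knowing about this transformation: String.toUpperCase() without arguments uses the JVM's default locale, so the result can differ between machines (for example, under a Turkish locale "i" becomes "İ"). The following sketch is an optional variation, not part of the tutorial's project. It shows a locale-independent conversion, and UpperCaseSketch is a hypothetical helper name.

```java
import java.util.Locale;

// Hypothetical helper: locale-independent uppercase conversion with a null guard.
class UpperCaseSketch {

    /** Returns the value in uppercase using Locale.ROOT, or null if the value is null. */
    static String upper(String value) {
        return value == null ? null : value.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(upper("Justin") + " " + upper("Taylor"));
    }
}
```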

4.7 JobExecutionListenerSupport class

JobExecutionListenerSupport is a convenience class that implements the JobExecutionListener interface, which is notified when a job starts and completes. We override its afterJob method, which is called once the job has finished.

JobCompletionNotificationListener.java

package com.batch.config;
import java.util.List;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.listener.JobExecutionListenerSupport;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.core.RowMapper;
import org.springframework.stereotype.Component;

import com.batch.Employee;

@Component
public class JobCompletionNotificationListener extends JobExecutionListenerSupport {

	private static final Logger log = LoggerFactory.getLogger(JobCompletionNotificationListener.class);

	private final JdbcTemplate jdbcTemplate;

	@Autowired
	public JobCompletionNotificationListener(JdbcTemplate jdbcTemplate) {
		this.jdbcTemplate = jdbcTemplate;
	}

	@Override
	public void afterJob(JobExecution jobExecution) {
		// Maps each result row (first_name, last_name) to an Employee
		RowMapper<Employee> rowMapper = (rs, rowNum) -> {
			Employee e = new Employee();
			e.setFirstName(rs.getString(1));
			e.setLastName(rs.getString(2));
			return e;
		};
		if (jobExecution.getStatus() == BatchStatus.COMPLETED) {
			log.info("!!! JOB FINISHED! Time to verify the results");

			List<Employee> empList = jdbcTemplate.query("SELECT first_name, last_name FROM employee", rowMapper);
			log.info("Size of List " + empList.size());
			for (Employee emp : empList) {
				log.info("Found: " + emp.getFirstName() + " " + emp.getLastName());
			}
		}
	}
}

In this method, we fetch the data from the database after the job has completed and log the result to the console to verify the processing that was performed on the data.

4.8 Application class

We will create an application class containing the main method, which is responsible for triggering the Java batch program. The class will look as shown below.

Application.java

package com.batch;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class Application {

    public static void main(String[] args) throws Exception {
        SpringApplication.run(Application.class, args);
    }
}

@SpringBootApplication is the annotation that marks the class as a Spring Boot application.

5. Output

Let’s execute the application as a Java application. We will get the following output on the console.

The workflow of the batch program is clearly visible in the output. The job starts with importUserJob, then the execution of step1 starts, in which the data that was read is converted to uppercase.

After the step has been processed, we can see the uppercase result on the console.

6. Summary

In this tutorial, we learnt the following things:

A Java batch application contains jobs, and each job can contain multiple steps.
Every chunk-oriented step is a combination of read, process and write.
We can split the data into chunks of a configurable size for processing.

7. Download the Eclipse project

This was a tutorial for Java Batch with Spring Boot.

You can download the full source code of this example here: JavaBatch.zip
