Spring Batch Technology Snapshot

A cloud-ready, scalable batch processing solution for the Java platform. Spring Batch is a comprehensive, lightweight batch framework designed to enable the development of robust batch applications vital for the daily operations of enterprise systems.

Spring Batch’s layered architecture supports ease of use for end-user developers and leverages existing Java / Java EE tools and frameworks, making it simple to take advantage of advanced enterprise services when necessary. As an open source solution, Spring Batch also allows advanced customization of your systems, including batch scheduling and Admin Console security features.

Spring Batch removes much of the hassle of solving the technical issues that surround enterprise batch processing. See all of our tutorial blog posts here.

Benefits

  • Provides functionality for processing large volumes of data with pre-built solutions to read and write from a multitude of data sources.
  • Since Spring Batch is built on the Spring Framework, developers receive all of the benefits of Spring, such as dependency injection and bean management based upon simple POJOs.
  • The majority of the technical aspects surrounding the creation of batch applications have already been solved, leaving developers free to spend more time solving business needs.
  • Leverages existing Java / Java EE tools and frameworks. For example: by leveraging the additional functionality of Spring Integration, you can further increase the scalability of more distributed processes.
  • As an open source solution, Spring Batch allows advanced customization of your systems.

Features Snapshot:

  • Repeat Operations: an abstraction for grouping repeated operations together and moving the iteration logic into the framework
  • Retry Operations: an abstraction for automatic retries
  • Execution contexts at both the Job and Step level for sharing information between components
  • Late binding of environment properties, job parameters and execution context values into a Step when it starts
  • Persistence of Job meta data for management and reporting purposes that record stats for every component of the Job
  • Remote chunking of steps
  • Configurable exception handling strategies allowing fault tolerance and record skipping
  • Concurrent execution of chunks
  • Partitioning: steps execute concurrently and optionally in separate processes
  • OSGi support for deploying the Spring Batch framework as a set of OSGi services. Deploy individual jobs or groups of jobs as additional bundles that depend on the core
  • Non-sequential models for Job control configuration (branching and decision support of Step flow)

For a high-level overview, please see our tutorial Introducing Spring Batch and our other blog posts.

Usage Scenarios

    • Conversion – Convert transaction records into format required for further processing
    • Validation – Ensure input/output records are correct and consistent
    • Extract – Read from database or input file, select based on rule, and write to output
    • Extract/Update – Read from input, make changes to DB or output file
    • Output/Format – Read input, restructure data to another format, produce output file for transmission to another program or system
    • Reporting – Read large amounts of data, process to produce formatted document for printing, etc.

Architecture

Spring Batch was designed as a three-layered architecture that consists of the Batch Application, Batch Core and Batch Infrastructure.

The Batch Application layer contains all of the batch jobs and custom code written by developers who are implementing job processes using Spring Batch.

The Batch Core layer implements all of the necessary runtime classes needed to launch, control and record statistics about batch jobs.

The Batch Infrastructure contains the common readers, writers and services used by both application developers creating jobs and the core batch framework itself.
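
As a rough, purely illustrative sketch of how the layers fit together (the job, step and bean names are hypothetical, and the example assumes Spring Batch 3/4-style Java configuration), the configuration class below lives in the Batch Application layer, the builder factories and runtime it uses come from the Batch Core layer, and the injected reader and writer would typically be Batch Infrastructure implementations such as FlatFileItemReader or JdbcBatchItemWriter:

    import org.springframework.batch.core.Job;
    import org.springframework.batch.core.Step;
    import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
    import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
    import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
    import org.springframework.batch.item.ItemReader;
    import org.springframework.batch.item.ItemWriter;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    // Batch Application layer: our own configuration and business code.
    @Configuration
    @EnableBatchProcessing
    public class EndOfDayJobConfig {

        // Batch Core layer: the builder factories assemble the runtime Step and Job,
        // and @EnableBatchProcessing supplies the job repository and launcher behind the scenes.
        @Bean
        public Step loadTransactionsStep(StepBuilderFactory steps,
                                         ItemReader<String> reader,   // Batch Infrastructure beans,
                                         ItemWriter<String> writer) { // e.g. FlatFileItemReader / JdbcBatchItemWriter
            return steps.get("loadTransactionsStep")
                    .<String, String>chunk(100) // read/process 100 items, then write and commit
                    .reader(reader)
                    .writer(writer)
                    .build();
        }

        @Bean
        public Job endOfDayJob(JobBuilderFactory jobs, Step loadTransactionsStep) {
            return jobs.get("endOfDay")
                    .start(loadTransactionsStep)
                    .build();
        }
    }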

Terminology

Spring Batch uses a simple naming convention that should sound pretty familiar to anyone who has worked with batch processes in general.

A Job is the main component of Spring Batch and encompasses an entire batch process, which is typically made up of a series of Steps. As part of the Job there are also references to a JobInstance and a JobExecution. A JobInstance refers to the concept of a logical job run, for example running the “EndOfDay” job for 2017/07/01. A JobExecution refers to the technical concept of a single attempt to execute the job, for instance the first attempted execution of the “EndOfDay” job for 2017/07/01.
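
To make the distinction concrete, here is a hedged sketch (the “endOfDay” job bean and the “businessDate” parameter are hypothetical) of launching a job with identifying parameters. The Job plus its identifying JobParameters defines the JobInstance, while each launch attempt produces a separate JobExecution:

    import org.springframework.batch.core.Job;
    import org.springframework.batch.core.JobExecution;
    import org.springframework.batch.core.JobParameters;
    import org.springframework.batch.core.JobParametersBuilder;
    import org.springframework.batch.core.launch.JobLauncher;

    public class EndOfDayLauncher {

        private final JobLauncher jobLauncher; // provided by Spring Batch
        private final Job endOfDayJob;         // the configured "endOfDay" Job bean

        public EndOfDayLauncher(JobLauncher jobLauncher, Job endOfDayJob) {
            this.jobLauncher = jobLauncher;
            this.endOfDayJob = endOfDayJob;
        }

        public void runFor(String businessDate) throws Exception {
            // "endOfDay" plus these identifying parameters (e.g. 2017/07/01)
            // identifies the JobInstance ...
            JobParameters parameters = new JobParametersBuilder()
                    .addString("businessDate", businessDate)
                    .toJobParameters();

            // ... while every call to run() is a new JobExecution for that instance.
            JobExecution execution = jobLauncher.run(endOfDayJob, parameters);
            System.out.println("endOfDay " + businessDate + " ended with status "
                    + execution.getStatus());
        }
    }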

A Step is an independent part of a batch Job that contains all of the information necessary to define and control a particular phase of the job execution. It’s also at the Step level that transaction handling is configured, so keep that in mind when you are designing how your batch process will execute. A Step may contain a single Tasklet, which is used for simple processing such as validating job parameters when launching a job, setting up various resources, cleaning up resources, etc. The Tasklet interface has one “execute” method that is called repeatedly until it either returns RepeatStatus.FINISHED or throws an exception to signal a failure. A more common Step that requires the processing of business rules would use a “chunk-oriented” implementation that wraps an ItemReader, an optional ItemProcessor and an ItemWriter for the Step execution. The chunk-oriented approach reads and processes data in chunks, for example reading and processing 100 items at a time from a file in order to load them into a database. The chunk size is also used as the basis for transaction commits.
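
As a small, hedged sketch (the class and file names are invented), a Tasklet for a simple housekeeping task might look like the following; building it into a step is then a one-liner such as stepBuilderFactory.get("cleanupStep").tasklet(new CleanupTasklet()).build(), while a chunk-oriented step was sketched in the Architecture section above:

    import java.io.File;

    import org.springframework.batch.core.StepContribution;
    import org.springframework.batch.core.scope.context.ChunkContext;
    import org.springframework.batch.core.step.tasklet.Tasklet;
    import org.springframework.batch.repeat.RepeatStatus;

    // A Tasklet for simple processing, here removing a leftover work file.
    public class CleanupTasklet implements Tasklet {

        @Override
        public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) {
            File workFile = new File("/tmp/endOfDay.tmp"); // hypothetical resource
            if (workFile.exists() && !workFile.delete()) {
                throw new IllegalStateException("Could not delete " + workFile);
            }
            // Returning FINISHED tells Spring Batch not to call execute() again.
            return RepeatStatus.FINISHED;
        }
    }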

The ItemReader interface has one “read” method that is called multiple times, with each call returning one item read from the source, and null once all input data has been exhausted. The resulting output of the ItemReader is collected into a list that is used to apply the business rules. There are many default implementations of ItemReader provided with Spring Batch, such as FlatFileItemReader, JdbcCursorItemReader, JdbcPagingItemReader, JpaPagingItemReader and StoredProcedureItemReader. Due to the extensibility of Spring Batch, you also have the ability to implement your own custom ItemReader if your requirements fall outside the scope of the existing pre-built implementations.
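
For illustration, here is a toy custom ItemReader (it reads from an in-memory list, which you would replace with a real data source) that demonstrates the return-null-at-the-end contract:

    import java.util.Iterator;
    import java.util.List;

    import org.springframework.batch.item.ItemReader;

    // Toy custom reader: hands out one item per read() call and signals that the
    // input is exhausted by returning null, as the ItemReader contract requires.
    public class InMemoryRecordReader implements ItemReader<String> {

        private final Iterator<String> records;

        public InMemoryRecordReader(List<String> records) {
            this.records = records.iterator();
        }

        @Override
        public String read() {
            return records.hasNext() ? records.next() : null;
        }
    }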

The ItemProcessor interface has one “process” method that is used to transform items by applying business rules. Given an input item (one item from the output of the ItemReader), the processor applies the business rules and returns either the modified item or a new item for continued processing. If the item should not continue through the chunk, the ItemProcessor returns null, effectively filtering the item out. You also have the ability to chain processors together to apply very complex business rules, with the output of one processor becoming the input of the next processor in the chain, and so on. The ItemProcessor implementation is where the bulk of a developer’s work takes place, as this is where most of your business logic will be applied. The resulting output of the ItemProcessor is collected into a list that is then fed to the ItemWriter for output processing.
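
A minimal sketch of a custom ItemProcessor (the business rule here is invented purely for illustration) that both transforms items and filters some of them out by returning null:

    import org.springframework.batch.item.ItemProcessor;

    // Hypothetical business rule: drop blank records and normalise the rest to
    // upper case. Returning null filters the item out of the chunk entirely.
    public class NormaliseRecordProcessor implements ItemProcessor<String, String> {

        @Override
        public String process(String item) {
            String trimmed = item.trim();
            if (trimmed.isEmpty()) {
                return null; // filtered: this item never reaches the ItemWriter
            }
            return trimmed.toUpperCase();
        }
    }

Processors like this can be chained with Spring Batch’s CompositeItemProcessor when several rules need to be applied in sequence.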

The ItemWriter interface has one “write” method that is called once for the chunk being processed and is supplied the list of items for output. There are many default implementations of ItemWriter provided by Spring Batch, such as FlatFileItemWriter, JdbcBatchItemWriter and JpaItemWriter, to name a few. Once again, you also have the ability to implement your own ItemWriter if you find that the provided implementations don’t fit your needs; one such example would be a PDF writer for generating reports.
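
A minimal custom ItemWriter sketch, using the write(List) signature from the Spring Batch versions current at the time of writing (more recent versions pass a Chunk instead); a real implementation would write to a file, database, queue, PDF and so on rather than the console:

    import java.util.List;

    import org.springframework.batch.item.ItemWriter;

    // Toy writer: called once per chunk with all of the items that survived processing.
    public class ConsoleItemWriter implements ItemWriter<String> {

        @Override
        public void write(List<? extends String> items) {
            items.forEach(item -> System.out.println("Writing item: " + item));
        }
    }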

In addition to the Steps configured within a Job, there are also many points during the execution of a Job where you can intercept runtime execution and perform additional processing through several listener interfaces provided with Spring Batch. Some of the listeners, their associated methods and the corresponding annotations are listed below, followed by a short annotation-based sketch:

  • JobExecutionListener (beforeJob, afterJob) @BeforeJob, @AfterJob
  • StepExecutionListener (beforeStep, afterStep) @BeforeStep, @AfterStep
  • ChunkListener (beforeChunk, afterChunk) @BeforeChunk, @AfterChunk
  • ItemReadListener (beforeRead, afterRead, onReadError) @BeforeRead, @AfterRead, @OnReadError
  • ItemProcessListener (beforeProcess, afterProcess, onProcessError) @BeforeProcess, @AfterProcess, @OnProcessError
  • ItemWriteListener (beforeWrite, afterWrite, onWriteError) @BeforeWrite, @AfterWrite, @OnWriteError
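
As a hedged, annotation-based sketch (the class name and logging are invented), a listener that hooks into step execution might look like this; it would be registered when the step is built, for example via the builder’s .listener(...) method:

    import org.springframework.batch.core.ExitStatus;
    import org.springframework.batch.core.StepExecution;
    import org.springframework.batch.core.annotation.AfterStep;
    import org.springframework.batch.core.annotation.BeforeStep;

    // Annotation-based step listener: Spring Batch detects the annotated methods
    // when this object is registered on a step.
    public class LoggingStepListener {

        @BeforeStep
        public void beforeStep(StepExecution stepExecution) {
            System.out.println("Starting step " + stepExecution.getStepName());
        }

        @AfterStep
        public ExitStatus afterStep(StepExecution stepExecution) {
            System.out.println("Finished step " + stepExecution.getStepName()
                    + " after reading " + stepExecution.getReadCount() + " items");
            return stepExecution.getExitStatus();
        }
    }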

Spring Batch Tutorial Series: