Spring Batch Technology Snapshot

A cloud-ready, scalable batch-processing solution for the Java platform. Spring Batch is a comprehensive, lightweight batch framework designed to enable the development of robust batch applications vital to the daily operations of enterprise systems.

Spring Batch’s layered architecture supports ease of use for end-user developers, making it simple to leverage advanced enterprise services when necessary. As an open source solution, Spring Batch allows advanced customization of your systems, including batch scheduling and Admin Console security features.

Spring Batch removes much of the hassle of solving the technical issues that surround enterprise batch processing.

Benefits

  • Provides functionality for processing large volumes of data with prebuilt solutions to read and write from a multitude of data sources.
  • Because Spring Batch is built on the Spring Framework, developers receive all of the benefits of Spring, such as dependency injection and bean management based on simple POJOs.
  • The majority of the technical aspects surrounding the creation of batch applications have already been solved, freeing developers to spend more time on business needs.
  • Leverages existing Java / Java EE tools and frameworks. For example, by adding the functionality of Spring Integration, you can further increase the scalability of distributed processes.
  • As an open source solution, Spring Batch allows advanced customization to your systems.

Features Snapshot:

  • Repeat Operations: an abstraction for grouping repeated operations together and moving the iteration logic into the framework
  • Retry Operations: an abstraction for automatic retries
  • Execution contexts at both the Job and Step level for sharing information between components
  • Late binding of environment properties, job parameters and execution context values into a Step when it starts
  • Persistence of Job meta data for management and reporting purposes that record stats for every component of the Job
  • Remote chunking of steps
  • Configurable exception handling strategies allowing fault tolerance and record skipping
  • Concurrent execution of chunks
  • Partitioning: steps execute concurrently and optionally in separate processes
  • OSGi support for deploying the Spring Batch framework as a set of OSGi services. Deploy individual jobs or groups of jobs as additional bundles that depend on the core
  • Non-sequential models for Job control configuration (branching and decision support of Step flow)

For a high-level overview, please see the tutorial Introducing Spring Batch.

Architecture

Spring Batch was designed as a three-layered architecture consisting of the Batch Application, Batch Core, and Batch Infrastructure layers.

The Batch Application layer contains all of the batch jobs and custom code written by developers implementing job processes with Spring Batch.

The Batch Core layer implements all of the necessary runtime classes needed to launch, control and record statistics about batch jobs.

The Batch Infrastructure contains the common readers, writers and services used by both application developers creating jobs and the core batch framework itself.

General Batch Principles and Guidelines

The following guidelines are reproduced from the official Spring Batch documentation:

  • A batch architecture typically affects on-line architecture and vice versa. Design with both architectures and environments in mind using common building blocks when possible.
  • Simplify as much as possible and avoid building complex logical structures in single batch applications.
  • Process data as close to where the data physically resides as possible or vice versa (i.e., keep your data where your processing occurs).
  • Minimize system resource use, especially I/O. Perform as many operations as possible in internal memory.
  • Review application I/O (analyze SQL statements) to ensure that unnecessary physical I/O is avoided. In particular, the following four common flaws need to be looked for:
    • Reading data for every transaction when the data could be read once and kept cached or in the working storage;
    • Rereading data for a transaction where the data was read earlier in the same transaction;
    • Causing unnecessary table or index scans;
    • Not specifying key values in the WHERE clause of an SQL statement.
  • Do not do things twice in a batch run. For instance, if you need data summarization for reporting purposes, increment stored totals if possible when data is being initially processed, so your reporting application does not have to reprocess the same data.
  • Allocate enough memory at the beginning of a batch application to avoid time-consuming reallocation during the process.
  • Always assume the worst with regard to data integrity. Insert adequate checks and record validation to maintain data integrity.
  • Implement checksums for internal validation where possible. For example, flat files should have a trailer record telling the total of records in the file and an aggregate of the key fields.
  • Plan and execute stress tests as early as possible in a production-like environment with realistic data volumes.
  • In large batch systems backups can be challenging, especially if the system is running concurrent with on-line on a 24-7 basis. Database backups are typically well taken care of in the on-line design, but file backups should be considered to be just as important. If the system depends on flat files, file backup procedures should not only be in place and documented, but regularly tested as well.
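
The trailer-record guideline above can be sketched in plain Java. This is a minimal, illustrative validator; the `TRAILER,<count>,<total>` layout and the choice of the first field as the summed amount are assumptions for this sketch, not a Spring Batch convention:

```java
import java.util.List;

// Validates a flat file whose last line is a trailer record of the form
//   TRAILER,<recordCount>,<amountTotal>
// where each data line's first comma-separated field is a numeric amount.
class TrailerValidator {

    static boolean validate(List<String> lines) {
        if (lines.isEmpty()) {
            return false;
        }
        String[] trailer = lines.get(lines.size() - 1).split(",");
        if (trailer.length != 3 || !"TRAILER".equals(trailer[0])) {
            return false;
        }
        long expectedCount = Long.parseLong(trailer[1]);
        long expectedTotal = Long.parseLong(trailer[2]);

        long count = 0;
        long total = 0;
        for (String line : lines.subList(0, lines.size() - 1)) {
            count++;
            total += Long.parseLong(line.split(",")[0]); // first field is the amount
        }
        // Both the record count and the key-field aggregate must match the trailer.
        return count == expectedCount && total == expectedTotal;
    }
}
```

A check like this is cheap to run when a file is first read and catches truncated or partially transferred files before a job processes them.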

Terminology

Spring Batch uses a simple naming convention that should sound pretty familiar to anyone who has worked with batch processes in general.

A Job is the main component of Spring Batch and encompasses an entire batch process, which is typically made up of a series of Steps. A Job also has references to a JobInstance and a JobExecution. A JobInstance refers to the concept of a logical job run, for example running the “EndOfDay” job for 2017/07/01. A JobExecution refers to the technical concept of a single attempt to execute the job, for instance the first attempted execution of the “EndOfDay” job for 2017/07/01.
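
To make the JobInstance / JobExecution distinction concrete, here is a hedged sketch of launching such a job (it assumes Spring Batch is on the classpath and that a Job bean for the end-of-day process exists; the class, bean, and parameter names are illustrative):

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;

// Sketch: the identifying "runDate" parameter determines the JobInstance;
// each call to run() produces a new JobExecution for that instance.
public class EndOfDayLauncher {

    private final JobLauncher jobLauncher;  // assumed injected Spring bean
    private final Job endOfDayJob;          // assumed injected Spring bean

    public EndOfDayLauncher(JobLauncher jobLauncher, Job endOfDayJob) {
        this.jobLauncher = jobLauncher;
        this.endOfDayJob = endOfDayJob;
    }

    public JobExecution runFor(String runDate) throws Exception {
        JobParameters params = new JobParametersBuilder()
                .addString("runDate", runDate)  // same runDate => same JobInstance
                .toJobParameters();
        return jobLauncher.run(endOfDayJob, params);
    }
}
```

Launching twice with `"2017/07/01"` targets the same JobInstance (for example, a restart after a failure), while `"2017/07/02"` starts a new one.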

A Step is an independent process of a batch Job that contains all of the information necessary to define and control a particular phase in the job execution. It’s also at the Step level where you will find the transaction isolation, so keep that in mind when you are designing how your batch process will execute. The Step may contain a single Tasklet that is used for simple processing such as validating job parameters when launching a job, setting up various resources, cleaning up resources, etc. The Tasklet interface has one “execute” method that will be called repeatedly until it either returns RepeatStatus.FINISHED or throws an exception to signal a failure. A more common Step that requires the processing of business rules would use a “Chunk Oriented” implementation that wraps an ItemReader, optional ItemProcessor and ItemWriter for the Step execution. The chunk-oriented approach to batch processing reads and processes data in chunks, for example reading and processing 100 items at a time from a file to load them into a database.  The chunk size is also used as the basis for any transaction commits.
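
The two Step styles described above can be sketched as follows (Spring Batch 4-era Java configuration; the step names, the chunk size of 100, and the reader/processor/writer beans are illustrative assumptions):

```java
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.repeat.RepeatStatus;

public class StepConfigSketch {

    // Tasklet for simple one-shot work, e.g. setting up or cleaning resources.
    Tasklet cleanupTasklet = (contribution, chunkContext) -> {
        // ... delete working files, validate parameters, etc. ...
        return RepeatStatus.FINISHED;  // signal that the tasklet is done
    };

    Step taskletStep(StepBuilderFactory steps) {
        return steps.get("cleanupStep")
                .tasklet(cleanupTasklet)
                .build();
    }

    // Chunk-oriented step: items are read and processed one at a time, and each
    // chunk of 100 is written and committed in a single transaction.
    Step chunkStep(StepBuilderFactory steps,
                   ItemReader<String> reader,
                   ItemProcessor<String, String> processor,
                   ItemWriter<String> writer) {
        return steps.get("loadStep")
                .<String, String>chunk(100)
                .reader(reader)
                .processor(processor)
                .writer(writer)
                .build();
    }
}
```

Note how the chunk size appears directly in the step definition: it is both the batching unit and the transaction commit interval.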

The ItemReader interface has one “read” method that is called multiple times; each call returns one item read from the source, returning null when all input data has been exhausted. The resulting output of the ItemReader is collected into a list that is used to apply the business rules. Many default implementations of ItemReader are provided with Spring Batch, such as FlatFileItemReader, JdbcCursorItemReader, JdbcPagingItemReader, JpaPagingItemReader and StoredProcedureItemReader. Due to the extensibility of Spring Batch, you also have the ability to implement your own custom ItemReader if your requirements fall outside the scope of the existing pre-built implementations.
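
A deliberately tiny custom reader illustrates the read contract (Spring Batch actually ships a similar ListItemReader in org.springframework.batch.item.support; this sketch exists only to show that each read() returns one item and null signals the end of input):

```java
import java.util.Iterator;
import java.util.List;
import org.springframework.batch.item.ItemReader;

// Minimal in-memory ItemReader: one item per read() call, null when exhausted.
public class InMemoryItemReader implements ItemReader<String> {

    private final Iterator<String> items;

    public InMemoryItemReader(List<String> items) {
        this.items = items.iterator();
    }

    @Override
    public String read() {
        return items.hasNext() ? items.next() : null;  // null => input exhausted
    }
}
```

Real jobs would normally reach for one of the provided readers (FlatFileItemReader, JdbcPagingItemReader, and so on) before writing a custom one.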

The ItemProcessor interface has one “process” method that is used to transform items by applying business rules. Given an input item, which is one item from the output of the ItemReader, the processor applies the business rules and returns either the modified item or a new item for continued processing. If continued processing of the item should not take place, the ItemProcessor returns null, effectively filtering out the item. You can also chain processors together to apply very complex business rules, with the output of one processor becoming the input of the next processor in the chain, and so on. The ItemProcessor implementation is where the bulk of the developer’s work will be, as this is where most of your business logic is applied. The resulting output of the ItemProcessor is collected into a list that is then fed to the ItemWriter for output processing.
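
A small illustrative processor shows both behaviors described above: transformation and null-based filtering (the string-trimming rule is an assumption for this sketch, standing in for real business logic):

```java
import org.springframework.batch.item.ItemProcessor;

// Trims each input string; returns null for blank lines, which filters the
// item out of the chunk so it never reaches the ItemWriter.
public class TrimmingProcessor implements ItemProcessor<String, String> {

    @Override
    public String process(String item) {
        String trimmed = item.trim();
        return trimmed.isEmpty() ? null : trimmed;  // null => filter the item
    }
}
```

In a processor chain, the output type of one processor must match the input type of the next, which the generic parameters on ItemProcessor make explicit.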

The ItemWriter interface has one “write” method that is called once for the chunk being processed and is supplied the list of items for generic output. Many default implementations of ItemWriter are provided by Spring Batch, such as FlatFileItemWriter, JdbcBatchItemWriter, and JpaItemWriter, to name a few. Once again, you also have the ability to implement your own ItemWriter if you find that the provided implementations don’t fit your needs; one such example would be a PDF writer for generating reports.
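
A minimal custom writer makes the once-per-chunk contract visible (console output here is purely illustrative; the write signature shown matches the Spring Batch 4-era interface, which receives the whole chunk as a list):

```java
import java.util.List;
import org.springframework.batch.item.ItemWriter;

// Minimal custom ItemWriter: write() is called once per chunk with the list
// of items that survived the ItemProcessor.
public class ConsoleItemWriter implements ItemWriter<String> {

    @Override
    public void write(List<? extends String> items) {
        items.forEach(System.out::println);  // one chunk per write() call
    }
}
```

Because write() receives the whole chunk, writers can batch their output (for example, a single JDBC batch insert per chunk) rather than writing item by item.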

In addition to the Steps configured within a Job, there are also many points within the execution of a Job at which you can intercept runtime execution and perform additional processing through several listener interfaces provided by Spring Batch. Some of the listeners, their associated methods, and the corresponding annotations:

  • JobExecutionListener (beforeJob, afterJob) @BeforeJob, @AfterJob
  • StepExecutionListener (beforeStep, afterStep) @BeforeStep, @AfterStep
  • ChunkListener (beforeChunk, afterChunk) @BeforeChunk, @AfterChunk
  • ItemReadListener (beforeRead, afterRead, onReadError) @BeforeRead, @AfterRead, @OnReadError
  • ItemProcessListener (beforeProcess, afterProcess, onProcessError) @BeforeProcess, @AfterProcess, @OnProcessError
  • ItemWriteListener (beforeWrite, afterWrite, onWriteError) @BeforeWrite, @AfterWrite, @OnWriteError
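
As one example, a step listener can be written as a plain annotated class and registered on a step via the builder's listener(...) method (the class name and log messages are illustrative):

```java
import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.annotation.AfterStep;
import org.springframework.batch.core.annotation.BeforeStep;

// Annotation-based step listener: logs before the step starts and reports
// the read count once it completes.
public class LoggingStepListener {

    @BeforeStep
    public void beforeStep(StepExecution stepExecution) {
        System.out.println("Starting step: " + stepExecution.getStepName());
    }

    @AfterStep
    public ExitStatus afterStep(StepExecution stepExecution) {
        System.out.println("Items read: " + stepExecution.getReadCount());
        return stepExecution.getExitStatus();  // may be replaced to alter flow
    }
}
```

The @AfterStep method may return a different ExitStatus, which is one way to influence the flow decisions described under non-sequential job models.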

Spring Batch Tutorial Series: