One of my most recent projects involved helping a client move many decades of code from a mainframe environment to a distributed Java web environment. The client had engaged another company to transform the mainframe code to Java, and our team was tasked with making it all actually work.
One of the major areas we had to deal with was the transition of all of the batch processes. Of course, Spring Batch came to our rescue for most of the work, and was an easy choice as we were already using Spring Boot to wrap the converted applications.
The most challenging part of the entire project was that the client did not want to move everything at once in a Big Bang, but rather a few programs at a time. This meant that some programs would be running in the Java environment while others remained on the mainframe.
In this blog, I discuss three data challenges we encountered in the transition of an enterprise mainframe to a Java web application with Spring Batch, how we overcame them, and tips to keep in mind if you find yourself in a similar migration.
The Data Challenges
Sharing the data was a challenge, but one that we had expected. One that we had not foreseen (but probably should have) was how to move the files that all the batch processes use back and forth between environments.
I must admit that we all thought that we would just FTP the files down from the mainframe, read them using the converted code, then FTP the resultant files back up. And in limited testing, that seemed to work just fine.
Ah, the perils of “limited testing.”
The first thing we noticed was that the files we were creating and sending up to the mainframe were being truncated. Not the file length, but the length of each line.
It turned out that the mainframe defined a record set for each file it created, and part of that definition was the length of each record. If we didn’t tell it what the record length was when we sent it a file, it defaulted to 128 bytes.
I have done quite a bit of work involving FTPing files around, but it had always been straightforward. Create an FTPClient, set the connection parameters, open the connection, and move files back and forth.
In the case of talking to the mainframe, there was more information I had to provide. Luckily for me, the client had a helpful mainframe person who was able to inform me that in their FTP code, they had to set the LRECL to the proper length. Google to the rescue, and I was introduced to the sendSiteCommand method on Apache’s FTPClient. This allowed me to specify the record length that should be used when storeFile creates a file on the mainframe. Yay, success! The entirety of our file was now showing up on the mainframe.
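As a rough sketch of what that looks like, the helper below only builds the argument string for the SITE command. The class name SiteCommands and the 250-byte length are illustrative assumptions, not the client's actual code, and the commented usage assumes Apache Commons Net is on the classpath with placeholder connection values.

```java
// Sketch only: builds the argument for an FTP SITE command that sets the
// logical record length (LRECL). SiteCommands is a hypothetical helper name.
final class SiteCommands {
    private SiteCommands() {}

    /** Returns the SITE argument for the given record length, e.g. "LRECL=250". */
    static String lrecl(int recordLength) {
        if (recordLength <= 0) {
            throw new IllegalArgumentException("record length must be positive: " + recordLength);
        }
        return "LRECL=" + recordLength;
    }
}

// Usage with Apache Commons Net (not compiled here; host, user, password,
// remoteName and inputStream are placeholders):
//   FTPClient ftp = new FTPClient();
//   ftp.connect(host);
//   ftp.login(user, password);
//   if (!ftp.sendSiteCommand(SiteCommands.lrecl(250))) {
//       // the server rejected the SITE command; stop before uploading
//   }
//   ftp.storeFile(remoteName, inputStream);
```

Checking the boolean returned by sendSiteCommand before calling storeFile is a cheap way to avoid silently falling back to the server's default record length.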
As we ported more and more batch programs into the Java environment, it became apparent that things were processing differently there than they had on the mainframe.
With a bit of digging, we realized that the code reading the flat files (which had been ported from the mainframe code) was reading a fixed number of bytes for each record to process. That code was not accounting for a line terminator because in the mainframe environment, there hadn’t been one. However, when we FTP’d the file down as an ASCII type (which seemed reasonable, as the files only contain ASCII data), each record contained a line terminator (which varied depending on the target environment, e.g. Windows or Unix).
It turned out that the mainframe inserted a line terminator at the end of each record as part of its conversion from EBCDIC to ASCII. Since the converted code was reading the strict number of bytes in each record, this caused the subsequent records to be offset by a byte or two, depending on the environment the Java was running in. This wasn’t noticed in the “limited testing” because none of the testers had enough business knowledge to adequately determine that the process had completed correctly. We all had to rely on the visible output of the process (which had its own limitations). It was not until some ported processes created files used by other ported processes that the problem was discovered.
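To make the offset concrete, here is a minimal sketch of fixed-length slicing applied to data that contains line terminators. FixedLengthSlicer and the 4-character records are hypothetical and only stand in for the ported read logic.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: mimics code that assumes records are exactly recordLength
// characters long and have no terminator between them.
final class FixedLengthSlicer {
    private FixedLengthSlicer() {}

    /** Cuts data into consecutive chunks of recordLength characters. */
    static List<String> slice(String data, int recordLength) {
        List<String> records = new ArrayList<>();
        for (int start = 0; start < data.length(); start += recordLength) {
            int end = Math.min(start + recordLength, data.length());
            records.add(data.substring(start, end));
        }
        return records;
    }
}
```

With the Unix-style input "AAAA\nBBBB\nCCCC\n" and a record length of 4, only the first chunk is a clean record. The second chunk is "\nBBB", and every later one drifts one character further. With Windows-style terminators the drift would be two characters per record.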
The first attempt to fix the problem was to change to using a binary transfer rather than an ASCII one. That way the FTP process would not insert the undesirable line terminator. That quickly exposed the problem that binary transfers do not automatically convert from the EBCDIC format of the mainframe world into the ASCII one of Java. We quickly abandoned that approach, as there are several different EBCDIC conversion tables, and since the mainframe already knew how to convert to and from the one it used, duplicating that code would be counterproductive.
Our second attempt involved changing to using a BufferedReader (instead of a basic InputStream), so we could read the expected number of bytes, and then read the next ones for a line terminator and handle it appropriately. That worked fine as long as we were in a single target environment. But trying to support either Windows or Unix on the fly was, again, counterproductive.
Our final attempt at this problem was to change from reading a specific number of bytes to reading a line at a time. Since we had already changed the code to use a buffered reader, this was a simple and straightforward change. We were concerned that this would fail if a file was received that didn’t have a line delimiter, but consultation with the client showed that this was an unlikely scenario. Yay, success! We are now reading records of the proper length and starting point.
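A minimal sketch of the line-at-a-time approach, assuming the records arrive through any Reader. LineRecordReader is a hypothetical name, not the ported code itself.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Sketch only: reads one record per line instead of a fixed number of bytes.
final class LineRecordReader {
    private LineRecordReader() {}

    /** Returns every line from source with its terminator stripped. */
    static List<String> readRecords(Reader source) {
        List<String> records = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            // readLine() treats "\n", "\r" and "\r\n" as terminators, so the
            // same code handles files transferred to Windows or Unix.
            while ((line = reader.readLine()) != null) {
                records.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }
}
```

Because readLine strips the terminator, each returned string can be handed to the existing fixed-position field parsing unchanged. A file with no terminators at all would come back as a single long record, which is the scenario the client considered unlikely.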
Our final challenge came when we first encountered ported code that wrote a file directly, rather than using a writer provided by Spring Batch.
The ported code used a file with a specified record length of 250 characters, and wrote each line out as a complete record. The problem was that it didn’t include a line terminator by default, as on the mainframe it had not needed one. We had assumed that, since we were specifying the record length when we put the file onto the mainframe, it would break the file up into the appropriate records. Alas, that was not the case. When we examined the file we had sent, it turned out to have a single record, truncated at 250 characters, rather than the several thousand records we expected.
Our solution was to change the ported code to use a PrintWriter rather than a basic OutputStream, and use the provided println method to output the data. That way, the proper line terminator for the running environment is used, and the EBCDIC translation works properly. Yay, success! We now have a file with lots of 250-character records instead of just one.
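The change amounts to something like the following sketch. RecordFileWriter is a hypothetical name, and the target is any Writer so the example stays self-contained.

```java
import java.io.PrintWriter;
import java.io.Writer;
import java.util.List;

// Sketch only: writes one record per line using the platform line separator.
final class RecordFileWriter {
    private RecordFileWriter() {}

    /** Writes each record followed by the running environment's line terminator. */
    static void writeRecords(Writer target, List<String> records) {
        PrintWriter out = new PrintWriter(target);
        for (String record : records) {
            out.println(record); // println appends System.lineSeparator()
        }
        // PrintWriter swallows IOExceptions; checkError() flushes and reports them.
        if (out.checkError()) {
            throw new IllegalStateException("failed to write records");
        }
    }
}
```

In real use the Writer would wrap the output file. The checkError call matters because a PrintWriter never throws on a failed write, so a full disk would otherwise go unnoticed until the mainframe received a short file.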
None of these challenges were particularly difficult to overcome, or even all that unexpected with a bit more knowledge about the mainframe environment we were dealing with. The lessons to be learned are:
- There is no such thing as too much business knowledge for a developer.
- Talk to your compatriots who are familiar with the other environments your development will be interacting with.
- Just because something has always “just worked” before does not mean it will this time.
As the years pass, fewer and fewer of us will have to deal with the mainframe environments that were so prevalent in earlier decades. But there is still a lot of business being conducted in those environments (we at Keyhole help companies modernize every day), and we ignore them at our professional peril.