This post contains an introduction to and demo of the mythical language that every programmer seems to know about but doesn’t really know: COBOL.
By learning about COBOL, you learn about basic concepts in computing in general, and how all of these languages are really doing a lot of the same activities underneath.
“Everything old is new again” – Stephen King, The Colorado Kid
Before May of this year, I had seen it exactly one time. It was in 1997 (the early days of the public internet) while I was interning at a major retailer that has only gotten major-er since. My coworkers were showing me around fields of fraying, gray cubicles that were adequate, but mostly served as a reminder that nobody makes 12-digit revenues by spending money on furniture.
At one stop, we spoke to a middle-aged gentleman wearing a button-down plaid shirt and a warm smile who seemed more than happy to turn away from his screen. I caught a glimpse over his shoulder, though, and what I saw was burned into my brain.
From context clues, I knew what it had to be. He was a programmer after all, and it was right there above his commodity plastic keyboard on the 19-inch CRT in all its terminal-emulator glory. It was arranged in neat little lines with lots of space, and it was in all caps for God’s sake.
It was COBOL.
1000125 MOVE REGSALESYTD TO ACC-YEAR
A First Encounter
I had heard of COBOL, of course, and I’d been advised of what to do if I ever got close to it: run. If you stared at it too long, you would surely go mad and never be able to write an email without shouting again.
Fortunately, I was able to retreat safely back to my desk to the completely sane and un-COBOL-like language my group was using, Visual Basic.
But since then, COBOL has never really been that far away. I’ve worked with financial systems, school districts, and government services, all of whom use mainframes that, more than likely, have at least a little bit of COBOL in them. But to me, the back-ends of these systems and the people who operate them existed in a lost world, where all the screens were full-color as long as you only needed green.
Now, with a lot of those aforementioned plaid-shirted guys retiring at the same time, their systems are getting hammered more than ever, and COBOL is peeking out into the limelight once again.
This time, I decided to risk my shift key and walk right in.
Note: Just a reminder that I have no commercial experience writing COBOL. The following is not so much a tutorial as it is some notes about where my curiosity about a vintage language has led me.
Working with COBOL requires a suitable development environment, and figuring that out could be a bigger project than the language itself. If you have a mainframe handy, use that, or you could try an emulator like Hercules. I used GnuCOBOL, which installed nicely on my MacBook.
I’ll include some links to other useful resources I found on my journey at the end of the post.
Looking for a Familiar Face
000100 IDENTIFICATION DIVISION.
000200 PROGRAM-ID. DEMO-LOOP.
000300
000400 DATA DIVISION.
000500 WORKING-STORAGE SECTION.
000600 01 X PIC 9(4) VALUE IS 0.
000800 PROCEDURE DIVISION.
000900 SHOW-NAME.
000900     DISPLAY 'my name is Kevin' X.
001000     ADD 1 TO X GIVING X.
001000     GO TO SHOW-NAME.
001100     STOP RUN.
Notice that it doesn’t look like any of the languages we use today. A strangely avant-garde haiku perhaps, but not C#. However, is it really all that foreign?
Let’s reach back further.
Between about 1981 and 1984, I couldn’t walk into a discount store without making a beeline to the electronics section, finding a Vic-20 or whatever there was that day, and typing in the following BASIC gem:
10 LET X=0
20 PRINT "My name is Kevin";X
30 LET X=X+1
40 GOTO 20
Hm. Just like the COBOL, there’s a definition for X, some output (PRINT instead of DISPLAY), some addition, and finally, computer science’s favorite control statement: GOTO.
Ok, I can do this.
If you’re unlike me and didn’t grow up writing Applesoft/Tandy/Commodore/TI/Sinclair BASIC, then jump on eBay, grab a vintage machine in working order, and get yourself up to speed. Go ahead, we’ll wait!
A Quick History
I want to be careful NOT to imply that there is any family relation between BASIC and COBOL. From my moderately shallow dive into COBOL history, I’d say they are less like siblings and more like two kids, both born in the ’60s, who grew up through the development of structured programming and the growth of an industry.
While BASIC started in academia, COBOL was always all business. Designed by a committee before the 1960s even started, and drawing heavily on the venerable Grace Hopper’s earlier FLOW-MATIC language, its killer feature was that it concentrated on the things businesses needed: simple number or text inputs, data records, and nearly-English syntax that a secretary (or even an executive) could write.
Some of those qualities worked out better than others, but at first, it was a bit of an unstructured mess. Over the years, however, it absorbed what the industry was learning about structured programming and improved with major releases every several years that kept COBOL from becoming obsolete.
It also served as an important lesson: making a computer language look like English doesn’t necessarily do you any good.
Is There Going to Be Any More Code in This Thing?
I’ll admit it, I wrote that first chunk of COBOL to deliberately look as unfamiliar as possible. It was similar to what early COBOL looked like because early COBOL was entered on punch cards.
Besides the line numbers, certain parts of code had to be written in certain columns to be understood correctly. Later versions removed a lot of those restrictions as punch cards faded from use. From here on out, the COBOL code we discuss will be of the newer variety.
(Interesting Note: The GnuCOBOL compiler uses the original column-restricted syntax by default, but you can use the -free switch to dispense with it.)
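To make the difference concrete, here is a rough free-format take on the earlier loop. It is only a sketch: the program name DEMO-LOOP-FREE and the five-line limit are my own additions so the loop actually ends, and the build command in the comment assumes GnuCOBOL's cobc with the -free switch.

```cobol
*> Free-format sketch of the earlier counting loop: no sequence
*> numbers and no column rules. Assumed build: cobc -x -free demo.cob
IDENTIFICATION DIVISION.
PROGRAM-ID. DEMO-LOOP-FREE.

DATA DIVISION.
WORKING-STORAGE SECTION.
01 X PIC 9(4) VALUE 0.

PROCEDURE DIVISION.
    *> Stop after five lines instead of looping forever.
    PERFORM UNTIL X = 5
        DISPLAY "my name is Kevin " X
        ADD 1 TO X
    END-PERFORM
    STOP RUN.
```

Since X is declared as PIC 9(4), each line should end with a zero-padded counter such as 0000 or 0004.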
A Demo Project
It always helps me to have some sort of project to practice and demonstrate on, so let’s find one by jumping over to a parallel universe. In this case, let’s say we hop over to a universe where mainframes developed just like they did here, but the simple 4-function calculator never did.
We just founded the newest Silicon Valley startup that promises to let your trusty home mainframe take care of all your arithmetical needs, er, mostly. Some nice-to-haves, like decimal points and negative numbers, will come out in version 2, but we’ll make up for it with one amazing feature — you can save all your math operations to a data tape (or file) to keep for later!
It’s so high-tech, I just can’t even.
I know what you’re thinking, “Code already!” Don’t worry, I’m about to start.
IDENTIFICATION DIVISION.
PROGRAM-ID. UBER-CALCULATOR.
AUTHOR. Kevin Roper.
INSTALLATION. KEYHOLE HQ.
DATE-WRITTEN. 05/25/20.
COBOL is formal. There are a number of sections that a program has to be divided into. Depending on what your program is doing, you might not need all of them, but in any case, the first thing to do is to identify our program.
In any language, it’s always handy to do this. COBOL lets you call programs from other programs, so the PROGRAM-ID can actually be useful.
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    SELECT CALCULATOR-TAPE ASSIGN TO "CALCULATOR-TAPE.DAT"
        ORGANIZATION IS LINE SEQUENTIAL.

DATA DIVISION.
FILE SECTION.
FD CALCULATOR-TAPE.
01 CALCULATION-RECORD.
    88 END-OF-CALC-FILE VALUE HIGH-VALUES.
    02 F-OPERAND-A PIC 9(8).
    02 F-OPERAND-B PIC 9(8).
    02 F-OPERATOR PIC A.
    02 F-ANSWER PIC 9(16).
    02 F-ERROR-MESSAGE PIC X(30).
    02 F-META-DATA.
        05 DATE-OF-CALCULATION.
            10 CALCULATION-YEAR PIC 9(2).
            10 CALCULATION-MONTH PIC 9(2).
            10 CALCULATION-DAY PIC 9(2).

WORKING-STORAGE SECTION.
01 ERROR-MESSAGE PIC X(30).
01 OPERAND-A PIC 9(8).
01 OPERAND-B PIC 9(8).
01 OPERATOR PIC A.
01 ANSWER PIC 9(16).
01 ERROR-MESSAGES.
    05 CLEARED-MESSAGE PIC A(30) VALUE IS " ".
    05 OVERFLOW-MESSAGE PIC A(30) VALUE IS "ERROR: OVERFLOW".
    05 SIGN-MESSAGE PIC A(30) VALUE IS "ERROR: LESS THAN ZERO".
    05 MISC-MESSAGE PIC A(30) VALUE IS "ERROR: OTHER ERROR".
After that little formality, COBOL requires you to plan ahead(!) and declare information about the resources you’re going to use.
There are a number of things that might belong here, but for our project, we need to tell the compiler about a few things:
We’re saving data, so we have to declare where our datafile lives. Exactly how you do this varies depending on the type of system you’re on. With my MacBook doing its best terminal impression, we have it easy and can just declare a data file, CALCULATOR-TAPE.DAT. The data will take the form of sequential records, which is about as simple as it gets: just one record after another.
(Ok, there’s a slight cheat here using LINE SEQUENTIAL, which actually puts the individual records on different lines in the file. It’s going to be easier to look at later on, but if you were saving to one of those giant reel-to-reel data tapes, you probably wouldn’t use it.)
While it has many benefits, the sequential simplicity does have a price – you have to read all the records in order instead of just grabbing the one you want. However, COBOL is able to have indexed records that you can randomly access by key and even databases, so maybe that’ll be in version 3. The possibilities are endless.
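For the curious, here is a rough idea of what keyed access might look like. This is a sketch and not part of the calculator: the file name CALC-INDEX.DAT, the record layout, and every IX- name are invented for illustration, and it assumes a GnuCOBOL build that includes indexed-file support.

```cobol
*> Sketch only: an indexed file keyed by a made-up calculation number.
IDENTIFICATION DIVISION.
PROGRAM-ID. INDEXED-SKETCH.

ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    SELECT CALC-INDEX ASSIGN TO "CALC-INDEX.DAT"
        ORGANIZATION IS INDEXED
        ACCESS MODE IS RANDOM
        RECORD KEY IS IX-CALC-ID
        FILE STATUS IS IX-STATUS.

DATA DIVISION.
FILE SECTION.
FD CALC-INDEX.
01 IX-RECORD.
    02 IX-CALC-ID PIC 9(4).
    02 IX-ANSWER  PIC 9(16).

WORKING-STORAGE SECTION.
01 IX-STATUS PIC XX.

PROCEDURE DIVISION.
    *> Write two records, deliberately out of key order.
    OPEN OUTPUT CALC-INDEX
    MOVE 2 TO IX-CALC-ID
    MOVE 5543 TO IX-ANSWER
    WRITE IX-RECORD
    MOVE 1 TO IX-CALC-ID
    MOVE 42 TO IX-ANSWER
    WRITE IX-RECORD
    CLOSE CALC-INDEX

    *> Fetch record 2 directly by key, with no scanning.
    OPEN INPUT CALC-INDEX
    MOVE 2 TO IX-CALC-ID
    READ CALC-INDEX
        INVALID KEY DISPLAY "NOT FOUND, STATUS " IX-STATUS
        NOT INVALID KEY DISPLAY "ANSWER FOR 2: " IX-ANSWER
    END-READ
    CLOSE CALC-INDEX
    STOP RUN.
```

The trade-off is the usual one: the sequential tape needs nothing but a flat file, while the indexed version depends on whatever file handler the compiler was built with.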
COBOL must be informed of all the data structures that it’s going to need in advance. In our case, this means first defining the structure of our fixed-length data record.
The sequential records we’re eventually going to write are not going to have any fancy control characters or XML formatting to delineate fields (fun fact: if our parallel universe is past 2014, our COBOL could handle XML). Instead, we simply tell COBOL how many characters are in each field, and it counts.
It’s here that COBOL starts looking alien again. This is well before you could write int myNumber and be done with it. COBOL defines variables using a “picture clause” along with some special coding that helps define deeper structures and special types of data.
I’m going out of order a little bit here, but bear with me and take a look at this line:
02 F-OPERAND-A PIC 9(8).
This is a very simple variable declaration. The variable name is F-OPERAND-A, and the type is defined by PIC 9(8).
This bit that begins with PIC is the picture clause. It specifies the data tersely, in a way that makes you think that the one guy who didn’t completely buy into the “conversational English” idea designed this part. The “9” indicates a purely numerical — integer — value, with a length of 8 digits.
As to why this is called a picture clause, I’ll leave that as an exercise to the reader. Because I don’t know.
(Yes, of course, I tried to find out, but with little success. Most references just seemed to accept the term, and some other writers admitted they couldn’t dig up the etymology either. I did find this book, which may or may not be the beginning of a relevant rabbit hole.)
The 02 at the beginning indicates the level of this item in the overall data hierarchy of the program, which works kind of like a struct. To further explain this concept, let’s zoom out a bit.
01 CALCULATION-RECORD.
    02 F-OPERAND-A PIC 9(8).
    02 F-OPERAND-B PIC 9(8).
    02 F-OPERATOR PIC A.
    02 F-ANSWER PIC 9(16).
    02 F-ERROR-MESSAGE PIC X(30).
We could say that CALCULATION-RECORD is the name of the struct at level 01. The variables underneath belong to that struct because they have a higher level number, which puts them one level deeper. If you really want to, you can go all the way up to level 49.
You can see some other data types there in that example too.
A means alphabetical data only, and X is alphanumeric. In any case, you have to specify how long the data will be so that the right amount of space gets reserved.
In our file, it also lets COBOL count the right number of characters so that it knows where one field ends and the next begins.
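Here is a small sketch of the three picture types in action. The variable names are made up for the example, and the square brackets are only there to make the reserved width visible.

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. PICTURE-SKETCH.

DATA DIVISION.
WORKING-STORAGE SECTION.
01 SOME-NUMBER  PIC 9(8).
01 SOME-LETTERS PIC A(5).
01 SOME-TEXT    PIC X(10).

PROCEDURE DIVISION.
    MOVE 42 TO SOME-NUMBER
    MOVE "ABC" TO SOME-LETTERS
    MOVE "ROOM 101" TO SOME-TEXT
    *> Brackets make the reserved width visible.
    DISPLAY "[" SOME-NUMBER "]"
    DISPLAY "[" SOME-LETTERS "]"
    DISPLAY "[" SOME-TEXT "]"
    STOP RUN.
```

The number should come out zero-padded on the left as [00000042], while the two text fields are padded with spaces on the right to fill their 5 and 10 characters.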
02 F-META-DATA.
    05 DATE-OF-CALCULATION.
        10 CALCULATION-YEAR PIC 9(2).
        10 CALCULATION-MONTH PIC 9(2).
        10 CALCULATION-DAY PIC 9(2).
Side note: Lest you think COBOL is completely rigid, you can also just repeat the type specifier the desired number of times. So the day field we’ve written as PIC 9(2) could also be PIC 99.
Anyways, at the end of that, we define a slightly deeper structure to hold a date that we want to write to the record. We’ve got two numerical digits for each part of the date all nested under a variable called DATE-OF-CALCULATION.
We could, if we wanted to, add another bit of meta-data to this section, say the name of the user running the program. To keep the overall struct going, it would end up looking like this:
05 USER-NAME PIC A(10).
The cool part about this is that you can choose at which level you want to access this data later on. If I just want the year, I would reference CALCULATION-YEAR. If I want the whole date, I would say DATE-OF-CALCULATION, and it would give me all 6 digits, no manipulation required.
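A quick sketch of that idea, using the same date structure but placed in working storage so it runs on its own. The META-DATA group name and the sample date are assumptions for the example.

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. LEVELS-SKETCH.

DATA DIVISION.
WORKING-STORAGE SECTION.
01 META-DATA.
    05 DATE-OF-CALCULATION.
        10 CALCULATION-YEAR  PIC 9(2).
        10 CALCULATION-MONTH PIC 9(2).
        10 CALCULATION-DAY   PIC 9(2).

PROCEDURE DIVISION.
    *> Fill the whole group in one go...
    MOVE "200525" TO DATE-OF-CALCULATION
    *> ...then read it back at whichever level is convenient.
    DISPLAY "WHOLE DATE: " DATE-OF-CALCULATION
    DISPLAY "YEAR ONLY:  " CALCULATION-YEAR
    DISPLAY "DAY ONLY:   " CALCULATION-DAY
    STOP RUN.
```

The group name and the field names all refer to the same six characters of storage. The group is just a wider window onto them.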
After the file record declaration, we still need to tell COBOL what variables we’ll need for the execution of our program’s logic. That happens in the working storage section, which might be starting to look a little familiar now.
WORKING-STORAGE SECTION.
01 ERROR-MESSAGE PIC X(30).
01 OPERAND-A PIC 9(8).
01 OPERAND-B PIC 9(8).
01 OPERATOR PIC A.
01 ANSWER PIC 9(16).
Final note: These numbers are all integers! Can COBOL do decimal fractions? Or what about negative numbers? Of course! There are a couple of varieties of decimal points, but one example would be PIC S9(2)V9(3).
This specifies a signed value, S, made of digits, 9, which is (2) digits long, followed by an implied decimal point, V, with 3 decimal places after it, 9(3).
Whew. For that matter, you could initialize the whole thing like this:
PIC S9(2)V9(3) VALUE IS -10.125.
Just like you’d write it in English!
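To see a signed decimal at work, here is a minimal sketch. The names BALANCE and BALANCE-SHOWN are hypothetical, and the second field uses an edited picture, one common way (though not the only one) to print a sign and a real decimal point.

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. DECIMAL-SKETCH.

DATA DIVISION.
WORKING-STORAGE SECTION.
*> S = signed, V = implied decimal point (not stored as a character).
01 BALANCE       PIC S9(2)V9(3) VALUE -10.125.
*> Edited picture for output: a real "." and a leading minus sign.
01 BALANCE-SHOWN PIC -99.999.

PROCEDURE DIVISION.
    ADD 0.5 TO BALANCE
    MOVE BALANCE TO BALANCE-SHOWN
    DISPLAY BALANCE-SHOWN
    STOP RUN.
```

Adding 0.5 to -10.125 gives -9.625, which the edited field should show as -09.625. Note that this is fixed-point arithmetic: the V only records where the decimal point sits, so no precision is lost the way it can be with binary floating-point.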
SCREEN SECTION.
01 MATH-SCREEN.
    05 BACKGROUND-COLOR 1 FOREGROUND-COLOR 7.
    05 LINE 1 COLUMN 30 VALUE "INTEGER CALCULATOR v0.1".
    05 LINE 10 COLUMN 5 VALUE "A=ADD S=SUBTRACT M=MULTIPLY D=DIVIDE Q=QUIT".
    05 LINE 3 COLUMN 5 VALUE "OPERAND 1".
    05 LINE 4 COLUMN 5 VALUE "OPERAND 2".
    05 LINE 5 COLUMN 5 VALUE "OPERATOR".
    05 LINE 7 COLUMN 5 VALUE "ANSWER".
    05 SCR-OPERAND-A LINE 3 COLUMN 20 PIC 9(8) TO OPERAND-A.
    05 SCR-OPERAND-B LINE 4 COLUMN 20 PIC 9(8) TO OPERAND-B.
    05 SCR-OPERATOR LINE 5 COLUMN 20 PIC A TO OPERATOR.
    05 SCR-ANSWER LINE 7 COLUMN 20 PIC 9(16) FROM ANSWER.
    05 SCR-ERROR LINE 8 COLUMN 20 PIC X(30) FROM ERROR-MESSAGE.
We’re almost to the actual guts of our calculator; there’s just one thing left to declare. Some COBOL programs might just read data off a tape, do something with it, and write it back out again. But our amazing calculator is going to need to take live user input and then provide immediate output.
Making this a reality is somewhat hardware-dependent, and the default behavior of GnuCOBOL running in a terminal looks pretty much like any shell script would. Thanks to COBOL 2002, though, we can take advantage of the SCREEN SECTION. The SCREEN SECTION helps lay out a textual input screen that doesn’t just look like a command line, and it uses a lot of the same concepts as the data definitions we did earlier.
Go ahead, let that sink in. You’re never going to be able to write int x without thinking of this again.
Time for Some Action
PROCEDURE DIVISION.
    PERFORM MATH-SCREEN-LOOP UNTIL OPERATOR = "Q".
    STOP RUN.

MATH-SCREEN-LOOP.
    DISPLAY MATH-SCREEN.
    ACCEPT MATH-SCREEN.
    PERFORM CLEAR-ERROR-MESSAGE.
    PERFORM DO-THE-MATH.

DO-THE-MATH.
    IF OPERATOR = "A" THEN PERFORM DO-ADD.
    IF OPERATOR = "S" THEN PERFORM DO-SUBTRACT.
    IF OPERATOR = "M" THEN PERFORM DO-MULTIPLY.
    IF OPERATOR = "D" THEN PERFORM DO-DIVIDE.
    PERFORM WRITE-TAPE.

DO-ADD.
    ADD OPERAND-A TO OPERAND-B GIVING ANSWER.
The PROCEDURE DIVISION contains the actual logic, similar to the main() function in C. In fact, at this point, I’m definitely feeling a C vibe, with the previous declaration sections playing the part of the header file.
I don’t know if there is any direct family lineage here, but you can’t help but think about it. It’s interesting to consider how ideas must have moved throughout the math/computer science community in those days as languages were designed and matured. The basic structures that are taken for granted now were only just getting started then.
For example, this program illustrates some of the basic patterns that we still use today. It starts with a loop: PERFORM a block of code UNTIL a condition is true.
In this case, a block of code is a labeled set of lines of COBOL code. Each set is called a paragraph, and each line is referred to as a sentence. (Sentences end with a period, did you notice that? I guess somewhere along the line the semicolon industry must have thrown its weight around and taken over.)
I found it useful to think of these blocks as functions. They don’t take any parameters or return any values, though, and they don’t have to.
So, essentially, it’s not too far off to say that calling a function is a PERFORM, at least on a practical level.
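Stripped of the calculator, the two uses of PERFORM look something like this sketch. The paragraph names and the counter are invented for the example.

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. PERFORM-SKETCH.

DATA DIVISION.
WORKING-STORAGE SECTION.
01 COUNTER PIC 9 VALUE 0.

PROCEDURE DIVISION.
MAIN-LOGIC.
    *> "Call" a paragraph once; control returns here afterwards.
    PERFORM SAY-HELLO
    *> Repeat a paragraph until a condition becomes true.
    PERFORM COUNT-UP UNTIL COUNTER = 3
    STOP RUN.

SAY-HELLO.
    DISPLAY "HELLO FROM A PARAGRAPH".

COUNT-UP.
    ADD 1 TO COUNTER
    DISPLAY "COUNTER IS " COUNTER.
```

COUNT-UP reads and changes COUNTER directly. Everything in working storage is effectively global, which is how paragraphs get by without parameters or return values. The STOP RUN matters too: without it, execution would simply fall through into SAY-HELLO and COUNT-UP a second time.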
Now, going back to the sets of code above, we have some IF THEN (still the same after all these years) statements that use PERFORM to hand execution to our labeled blocks of code. In turn, these jump to other labeled blocks.
Alright, now let’s fill in the remaining three math functions.
DO-SUBTRACT.
    IF OPERAND-B IS GREATER THAN OPERAND-A THEN
        PERFORM SIGN-ERROR
    ELSE
        SUBTRACT OPERAND-B FROM OPERAND-A GIVING ANSWER
    END-IF.

DO-MULTIPLY.
    MULTIPLY OPERAND-A BY OPERAND-B GIVING ANSWER.

DO-DIVIDE.
    DIVIDE OPERAND-A BY OPERAND-B GIVING ANSWER.
The language is really pretty plain at this point, and you could choose to use a more symbolic math syntax (+, -, etc.) instead of the spelled-out words if you want.
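The symbolic form goes through the COMPUTE verb. As a small sketch, here are both styles side by side, with operand values borrowed from an addition you might run through the calculator.

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. COMPUTE-SKETCH.

DATA DIVISION.
WORKING-STORAGE SECTION.
01 OPERAND-A PIC 9(8) VALUE 123.
01 OPERAND-B PIC 9(8) VALUE 5420.
01 ANSWER    PIC 9(16).

PROCEDURE DIVISION.
    *> Spelled-out form, as used in the calculator.
    ADD OPERAND-A TO OPERAND-B GIVING ANSWER
    DISPLAY ANSWER
    *> Symbolic form of the same sum.
    COMPUTE ANSWER = OPERAND-A + OPERAND-B
    DISPLAY ANSWER
    STOP RUN.
```

Both DISPLAY lines should show the same zero-padded 5543. COMPUTE starts to earn its keep once an expression has more than one operator in it.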
The assumption seems to be that business math is relatively uncomplicated — if you were trying to do Fourier transforms or something you’d probably be team FORTRAN anyway.
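One thing the arithmetic verbs do offer is an ON SIZE ERROR clause, which fires when a result will not fit its target field or when you divide by zero. The sketch below is not how the calculator handles errors. It is an alternative I am assuming would suit it, with a deliberately tiny SMALL-ANSWER field (a made-up name) so the overflow is easy to trigger.

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. SIZE-ERROR-SKETCH.

DATA DIVISION.
WORKING-STORAGE SECTION.
01 OPERAND-A    PIC 9(8) VALUE 100.
01 OPERAND-B    PIC 9(8) VALUE 0.
01 SMALL-ANSWER PIC 9(3).

PROCEDURE DIVISION.
    *> Division by zero raises the size error condition.
    DIVIDE OPERAND-A BY OPERAND-B GIVING SMALL-ANSWER
        ON SIZE ERROR DISPLAY "ERROR: CANNOT DIVIDE BY ZERO"
        NOT ON SIZE ERROR DISPLAY SMALL-ANSWER
    END-DIVIDE
    *> 100 * 100 needs five digits, more than PIC 9(3) can hold.
    MOVE 100 TO OPERAND-B
    MULTIPLY OPERAND-A BY OPERAND-B GIVING SMALL-ANSWER
        ON SIZE ERROR DISPLAY "ERROR: OVERFLOW"
        NOT ON SIZE ERROR DISPLAY SMALL-ANSWER
    END-MULTIPLY
    STOP RUN.
```

Both statements should take the error branch here. In the calculator itself, two 8-digit operands can never overflow a 16-digit answer, so division by zero is the case where a guard like this would actually matter.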
Here are a few more blocks to round out some boring error checking stuff.
CHECK-OVERFLOW.
    IF ANSWER IS GREATER THAN 9999999999999999 THEN
        MOVE 1111111111111111 TO ANSWER.

SIGN-ERROR.
    MOVE SIGN-MESSAGE IN ERROR-MESSAGES TO ERROR-MESSAGE.
    MOVE 8888888888888888 TO ANSWER.
    DISPLAY SCR-ANSWER.

MISC-ERROR.
    MOVE MISC-MESSAGE IN ERROR-MESSAGES TO ERROR-MESSAGE.
    MOVE 8888888888888888 TO ANSWER.

OVERFLOW-ERROR.
    MOVE OVERFLOW-MESSAGE IN ERROR-MESSAGES TO ERROR-MESSAGE.
    MOVE 8888888888888888 TO ANSWER.

CLEAR-ERROR-MESSAGE.
    MOVE CLEARED-MESSAGE IN ERROR-MESSAGES TO ERROR-MESSAGE.
There’s a lot of moving going on! That’s how you set a variable: you MOVE the value into it, and in doing so, you perpetuate COBOL’s persistent reputation for wordiness.
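MOVE is a little more than plain assignment, because the receiving field’s picture decides what survives the trip. A small sketch, with field names invented for the example:

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. MOVE-SKETCH.

DATA DIVISION.
WORKING-STORAGE SECTION.
01 LONG-TEXT    PIC X(10) VALUE "CALCULATOR".
01 SHORT-TEXT   PIC X(4).
01 BIG-NUMBER   PIC 9(6) VALUE 123456.
01 SMALL-NUMBER PIC 9(4).

PROCEDURE DIVISION.
    *> Text is copied from the left; extra characters are dropped.
    MOVE LONG-TEXT TO SHORT-TEXT
    DISPLAY SHORT-TEXT
    *> Numbers line up on the right; extra leading digits are dropped.
    MOVE BIG-NUMBER TO SMALL-NUMBER
    DISPLAY SMALL-NUMBER
    STOP RUN.
```

This should print CALC and then 3456. Neither truncation raises an error at run time, which is worth remembering whenever a working-storage field and a file field are declared with different lengths.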
WRITE-TAPE.
    OPEN EXTEND CALCULATOR-TAPE.
    MOVE OPERAND-A TO F-OPERAND-A.
    MOVE OPERAND-B TO F-OPERAND-B.
    MOVE ANSWER TO F-ANSWER.
    MOVE OPERATOR TO F-OPERATOR.
    MOVE ERROR-MESSAGE TO F-ERROR-MESSAGE.
    MOVE "200101" TO F-META-DATA.
    WRITE CALCULATION-RECORD.
    CLOSE CALCULATOR-TAPE.
One last thing! We need to write our operation out to tape. Ha, of course, for me there’s no tape involved — it’s saving to the data file, but I just love saying it that way.
This ought to look fairly familiar, both from a COBOL point of view and a modern language point of view. All we’re doing is MOVEing values to variables. It’s just that COBOL remembers that we defined these particular variables as part of a file, so it takes care of that for us.
OPEN, WRITE, CLOSE. We all still do it that way.
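One wrinkle with this pattern is what happens when the file is not there yet or cannot be opened. The sketch below shows one way to check, using a FILE STATUS field and the OPTIONAL keyword. The file name SKETCH-TAPE.DAT and all the data names are made up, and the exact status codes returned can vary between compilers, so treat this as an assumption-laden outline and not as the calculator’s actual behavior.

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. TAPE-STATUS-SKETCH.

ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    *> OPTIONAL lets OPEN EXTEND start a new file if none exists yet.
    SELECT OPTIONAL SKETCH-TAPE ASSIGN TO "SKETCH-TAPE.DAT"
        ORGANIZATION IS LINE SEQUENTIAL
        FILE STATUS IS TAPE-STATUS.

DATA DIVISION.
FILE SECTION.
FD SKETCH-TAPE.
01 SKETCH-RECORD PIC X(20).

WORKING-STORAGE SECTION.
01 TAPE-STATUS PIC XX.

PROCEDURE DIVISION.
    OPEN EXTEND SKETCH-TAPE
    *> Status codes starting with "0" mean the operation succeeded.
    IF TAPE-STATUS(1:1) NOT = "0"
        DISPLAY "OPEN FAILED, STATUS " TAPE-STATUS
        STOP RUN
    END-IF
    MOVE "ONE MORE LINE" TO SKETCH-RECORD
    WRITE SKETCH-RECORD
    CLOSE SKETCH-TAPE
    STOP RUN.
```

Each run should append one more line to the file, creating it on the first run.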
Learning to Read
I suppose, since we went to all that trouble writing out our calculator tape, we might as well be able to look at it again later! We simply open the data file.
0000012300005420A0000000000005543                              200101
7897888800023230S0000000078955658                              200101
9999999999999999M9999999800000001                              200101
0000343499999999S8888888888888888ERROR: LESS THAN ZERO         200101
3535353400000234D0000000000151083                              200101
3535353400000234Q0000000000151083                              200101
As ugly as it might be, this is a file you could still import into Excel to this day. The fields are fixed-width, so as long as you know how they were defined in the first place, you can get everything back out neatly again.
Let’s make a new program to do just that. At this point it’s easy. It’s really just the same as before but backward.
IDENTIFICATION DIVISION.
PROGRAM-ID. TAPE-READER.
AUTHOR. Kevin Roper.
INSTALLATION. KEYHOLE HQ.
DATE-WRITTEN. 05/25/2020.

ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    SELECT CALCULATOR-TAPE ASSIGN TO "CALCULATOR-TAPE.DAT"
        ORGANIZATION IS LINE SEQUENTIAL.

DATA DIVISION.
FILE SECTION.
FD CALCULATOR-TAPE.
01 CALCULATION-RECORD.
    88 END-OF-CALC-FILE VALUE HIGH-VALUES.
    02 F-OPERAND-A PIC 9(8).
    02 F-OPERAND-B PIC 9(8).
    02 F-OPERATOR PIC A.
    02 F-ANSWER PIC 9(16).
    02 F-ERROR-MESSAGE PIC X(30).
    02 F-META-DATA.
        05 DATE-OF-CALCULATION.
            10 CALCULATION-YEAR PIC 9(2).
            10 CALCULATION-MONTH PIC 9(2).
            10 CALCULATION-DAY PIC 9(2).

WORKING-STORAGE SECTION.
01 OPERATOR PIC X VALUE IS "@".
01 COUNTER PIC 999 VALUE IS 000.

PROCEDURE DIVISION.
    DISPLAY "CALCULATOR TAPE"
    OPEN INPUT CALCULATOR-TAPE.
    READ CALCULATOR-TAPE
        AT END SET END-OF-CALC-FILE TO TRUE
    END-READ
    PERFORM UNTIL END-OF-CALC-FILE
        PERFORM GET-OPERATOR
        DISPLAY COUNTER SPACE "-" CALCULATION-DAY "/" CALCULATION-MONTH "/" CALCULATION-YEAR
        DISPLAY SPACE F-OPERAND-A SPACE OPERATOR SPACE F-OPERAND-B SPACE "=" SPACE F-ANSWER SPACE F-ERROR-MESSAGE
        READ CALCULATOR-TAPE
            AT END SET END-OF-CALC-FILE TO TRUE
        END-READ
        SET COUNTER UP BY 1
    END-PERFORM
    CLOSE CALCULATOR-TAPE
    STOP RUN.

GET-OPERATOR.
    IF F-OPERATOR = "A" THEN MOVE "+" TO OPERATOR.
    IF F-OPERATOR = "S" THEN MOVE "-" TO OPERATOR.
    IF F-OPERATOR = "M" THEN MOVE "*" TO OPERATOR.
    IF F-OPERATOR = "D" THEN MOVE "/" TO OPERATOR.
In Conclusion …
And now, our COBOL calculator is finished. I guess all that’s left to do is go looking for a parallel-universe investor in need of a crappy calculator!
I want to reiterate that this is not (in any way, shape, or form) an extensive tutorial on COBOL, a perfect history of COBOL, or a guide to earning a job working with COBOL these days. The purpose of this was to find out more about this mythical language that every programmer seems to know about, but doesn’t really know.
It turns out, by learning about COBOL, you learn about some very basic concepts in computing, and how all these languages are really doing a lot of the same stuff underneath. I’m not sure if I’ll ever actually work in COBOL, but now, if I do, I think I’ll feel right at home.