Perhaps rather unexpectedly, on the 14th of March this year the GCC mailing list received an announcement regarding the release of the first ever COBOL front-end for the GCC compiler. For the uninitiated, COBOL saw its first release in 1959, making it, at 63 years old, one of the oldest programming languages still in regular use. Its persistence is mostly due to its focus from the beginning as a transaction-oriented, domain-specific language (DSL).
Its acronym stands for Common Business-Oriented Language, which clearly references the domain it targets. Even with the current COBOL 2014 standard, it is still essentially the same primarily transaction-oriented language, while adding support for structured, procedural and object-oriented programming styles. Deriving most of its core from Admiral Grace Hopper’s FLOW-MATIC language, it allows for efficiently describing business logic as one would encounter at financial institutions or businesses, in clear English.
Unlike the older GnuCOBOL project – which translates COBOL to C – the new GCC-COBOL front-end project does away with that intermediate step, and directly compiles COBOL source code into binary code. All of which may raise the question of why an entire man-year was invested in this effort for a language which has been declared ‘dead’ for probably at least half its 63-year existence.
Does it make sense to learn or even use COBOL today? Do we need a new COBOL compiler?
Getting The Punch Line
To fully grasp where COBOL comes from, we have to travel all the way back to the 1950s. This was a time still many years before minicomputers like the PDP-8, never mind home computers like the Apple I and kin, became a thing. In those days dinosaurs stalked the depths of universities and businesses, with increasingly transistorized mainframes and highly disparate system architectures.
Such differences existed even within a single manufacturer’s series of mainframes, for example IBM’s 700 and 7000 series. Each mainframe had to be programmed for its intended purpose, usually scientific or commercial tasks, and this often meant that software for a business’ or university’s older mainframes would not run on the newer hardware without modifications or a rewrite, adding significantly to the cost.
Even before COBOL came onto the scene, this problem was recognized by people such as John W. Backus of BNF fame, who proposed the development of a practical alternative to assembly language to his superiors at IBM in late 1953. This resulted in the development of the FORTRAN scientific programming language. The LISP mathematical programming language, created separately by John McCarthy at MIT, followed a few years later, with both languages initially targeting the IBM 704 scientific mainframe.
FORTRAN and other high-level programming languages offered two benefits over writing programs in the mainframe’s assembly language: portability and efficient development. The latter is primarily due to being able to use single statements in the high-level language that translate to an optimized set of assembly instructions for the hardware. This provided a modular system that allowed scientists and others to create their own programs as part of their research, studies or other applications, rather than having to learn a specific mainframe’s architecture.
The portability of a high-level language also allowed scientists to share FORTRAN programs with others, who could then run them on the mainframes at their own institute, regardless of the mainframe’s system architecture and other hardware details. All it required was an available FORTRAN compiler.
Whereas FORTRAN and LISP focused on easing programming in the scientific domains, businesses had very different needs. Businesses operate on strict sets of rules and procedures that must be followed to transform inputs like transactions and revenue flows into payrolls and quarterly statements, following rules set by the tax office and other official bodies. Transforming those written business rules into something that worked exactly the same way on a mainframe was an important challenge. This is where Grace Hopper’s FLOW-MATIC language, formerly Business Language 0, or B-0, provided a solution that targeted the UNIVAC I, the world’s first dedicated business computer.
Hopper’s experiences indicated that the use of plain English words was very much preferred by businesses, rather than symbols and mathematical notation. Admiral Hopper’s role as a technical advisor to the CODASYL committee that created the first COBOL standard was a recognition of both FLOW-MATIC’s success and Hopper’s expertise on the subject. As she would later say in a 1980 interview, COBOL 60 is 95% FLOW-MATIC. The other 5% came from competing languages – such as IBM’s COMTRAN language – which had similar ideas, but a very different implementation.
Interestingly, one characteristic of COBOL before the 2002 standard was its column-based coding style, which derives from the use of 80-column punch cards. This brings us to the many feature updates to the COBOL standard over the decades.
Standards Of Their Time
An interesting aspect of domain-specific languages in particular is that they reflect the state of both their domain and the technology of their time. When COBOL was put into use in the 1960s, programming wasn’t done directly on the computer system, but usually with the code provided to the mainframe in the form of punch cards, or if you were lucky, magnetic tape. During the 1960s this meant that ‘running a program’ involved handing over a stack of punched cards or a special coding form to the folk wrangling the mainframe, who would run the program for you and hand you back the results.
These intermediate steps meant additional complexity when developing new COBOL programs, and the column-based style remained the only option even after the COBOL-85 update. With the next standard update in 2002, however, a lot of changes were made, including dropping the mandatory column-based alignment in favor of free-form code. This update also added object-oriented programming and other features, including more data types to complement the previously somewhat limited string and numeric data representations.
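To make the column-based layout concrete, here is a minimal sketch in Python (not COBOL, and not part of any compiler) that splits one source line into the areas of the traditional fixed format: columns 1 to 6 for a sequence number, column 7 for an indicator such as an asterisk for a comment, columns 8 to 11 for Area A, columns 12 to 72 for Area B, and columns 73 to 80 for an identification field. The names FixedFormatLine and split_fixed_format are made up for this illustration.

```python
from typing import NamedTuple


class FixedFormatLine(NamedTuple):
    """One 80-column source line, split the way a punch card was laid out."""
    sequence: str   # columns 1-6: sequence number
    indicator: str  # column 7: '*' marks a comment, '-' a continuation
    area_a: str     # columns 8-11: division, section and paragraph headers start here
    area_b: str     # columns 12-72: statements
    ident: str      # columns 73-80: identification field, not part of the program text


def split_fixed_format(line: str) -> FixedFormatLine:
    """Split a source line into fixed-format areas (hypothetical helper)."""
    # Pad short lines and cut long ones so we always work on exactly 80 columns.
    card = line.rstrip("\n").ljust(80)[:80]
    return FixedFormatLine(
        sequence=card[0:6],
        indicator=card[6],
        area_a=card[7:11],
        area_b=card[11:72],
        ident=card[72:80],
    )


# A header begins in Area A and simply runs on into Area B.
header = split_fixed_format("000100 PROCEDURE DIVISION.")
comment = split_fixed_format("000200*A COMMENT LINE")
print(repr(header.sequence), repr(header.area_a), repr(comment.indicator))
```

With free-form source as allowed since the 2002 standard, none of this column bookkeeping is needed.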
What did remain unchanged was COBOL’s lack of code blocks. Instead COBOL source is divided into four divisions:
- Identification division
- Environment division
- Data division
- Procedure division
The identification division specifies the name and meta information about the program, in addition to class and interface specifications. The environment division specifies any program features that depend on the system running it, such as files and character sets. The data division is used to declare variables and parameters. The procedure division contains the program’s statements. Finally, each division is sub-divided into sections, each of which is made up of paragraphs.
With the latest COBOL update of 2014, the floating point type format was changed to IEEE 754, to further improve its interoperability with data formats. Yet as Charles R. Martin pointed out in The Overflow in his solid COBOL introduction, the right comparison for COBOL would be to another domain-specific language like SQL (introduced 1974). One could add the likes of PostScript, Fortran, or Lisp to that comparison as well.
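As a rough illustration of this layout, the Python sketch below holds a minimal fixed-format COBOL program as a plain string and lists its division headers in order of appearance. The COBOL text is written from memory as a typical ‘hello world’ and has not been run through GCC-COBOL; the names MINIMAL_PROGRAM and list_divisions are invented for this example.

```python
import re
from typing import List

# A minimal COBOL program with all four divisions, stored as plain text.
# The environment division is left empty because nothing here is system-specific.
MINIMAL_PROGRAM = """\
       IDENTIFICATION DIVISION.
       PROGRAM-ID. HELLO.
       ENVIRONMENT DIVISION.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01 GREETING PIC X(12) VALUE "HELLO, WORLD".
       PROCEDURE DIVISION.
       MAIN-PARAGRAPH.
           DISPLAY GREETING.
           STOP RUN.
"""

# Matches lines such as "       DATA DIVISION." and captures the division name.
DIVISION_HEADER = re.compile(r"^[ \t]*([A-Z]+)[ \t]+DIVISION[ \t]*\.", re.MULTILINE)


def list_divisions(source: str) -> List[str]:
    """Return the division names in the order they appear in the source."""
    return [name.title() for name in DIVISION_HEADER.findall(source)]


print(list_divisions(MINIMAL_PROGRAM))
```

Note how the data division contains a section (WORKING-STORAGE) and the procedure division a paragraph (MAIN-PARAGRAPH), matching the sub-division described above.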
While it’s technically possible to use SQL and PostScript for regular programming and emulate the DSL’s features in a generic (system) programming language, doing so is neither fast nor an efficient use of one’s time. All of which rather illustrates the raison d’être for these DSLs: to make programming within a specific domain as efficient and direct as possible.
This point is rather succinctly illustrated by IBM’s Programming Language One (PL/I) – introduced in 1964 – which is a generic programming language that was intended to compete with everything, from FORTRAN to COBOL, but in the end failed to outperform any of those, with neither FORTRAN nor COBOL programmers convinced enough of the merits of PL/I to switch to it.
It’s important to realize that you don’t write operating systems and word processors in any of these DSLs. This lack of genericity both reduces their complexity and is the reason why we should judge them solely on their merits as a DSL for their intended domain.
The Right Tool
An interesting aspect of COBOL was that the committee that produced it was made up not of computer scientists, but rather of people within the business community, driven strongly by the needs of manufacturers like IBM, RCA, Sylvania, General Electric, Philco, and National Cash Register, for whom a good experience by the business owners and government agencies with whom they did business was paramount.
As a result, much like how SQL is shaped by the need to efficiently define database queries and related operations, so too was COBOL shaped over decades by the need to make business transactions and management work smoothly. Even today much of the world’s banking and stock trading is handled by mainframes running code written in COBOL, largely because of decades of refinement to the language to remove ambiguities and other issues that could lead to very costly bugs.
As attempts to port business applications written in COBOL have shown, the problem with moving statements from a DSL to a generic language is that the latter has none of the assumptions, protections and features that are the very reason why DSLs were made in the first place. The more generic a language is, the more unintended consequences a statement may have. This means that a COBOL or FORTRAN (or SQL) statement cannot simply be ported verbatim: you also have to keep in mind all the checks, limitations and safeties of the original language and replicate those.
Ultimately, any attempt to port such code to a generic language will inevitably result in the DSL being replicated in the target language, albeit with a much higher likelihood of bugs for a variety of reasons. Which is to say that while a generic programming language can implement the same functionality as those DSLs, the real question is whether this is at all desirable. Particularly when the cost of downtime and mistakes tends to be measured in millions of dollars per second, as in a nation’s financial system.
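One concrete case of such a built-in safety, offered here as an illustration rather than something taken from a real porting effort: COBOL money fields are typically declared as fixed-point decimals, for example PIC 9(5)V99 for five integer digits and two decimals, and an arithmetic statement can flag a result that does not fit via ON SIZE ERROR. A port has to recreate both properties by hand. The sketch below does so loosely in Python; store_fixed is a hypothetical helper, and its behavior (cutting off extra decimals, rejecting negatives and overflow) is a simplification of COBOL’s actual rules.

```python
from decimal import Decimal, ROUND_DOWN

# Binary floating point cannot represent most decimal fractions exactly,
# so a naive port that uses floats for money silently changes results.
float_sum_is_exact = (0.1 + 0.2 == 0.3)
decimal_sum_is_exact = (Decimal("0.10") + Decimal("0.20") == Decimal("0.30"))


def store_fixed(value: Decimal, int_digits: int = 5, frac_digits: int = 2) -> Decimal:
    """Store a value in a field shaped like an unsigned PIC 9(5)V99.

    Assumption for this sketch: extra decimals are cut off, and anything
    that does not fit raises an error instead of being silently stored.
    """
    quantum = Decimal(1).scaleb(-frac_digits)               # 0.01 for two decimals
    stored = value.quantize(quantum, rounding=ROUND_DOWN)   # drop extra decimals
    if stored < 0 or stored >= Decimal(10) ** int_digits:
        # Stands in for the ON SIZE ERROR branch of a COBOL statement.
        raise OverflowError(f"{value} does not fit {int_digits}.{frac_digits} digits")
    return stored


print(float_sum_is_exact, decimal_sum_is_exact)
print(store_fixed(Decimal("123.456")))
```

In COBOL the field declaration alone carries these constraints; in a generic language every assignment to such a value has to go through a helper like this, and forgetting it even once reintroduces the bug.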
The attractiveness of a DSL here is thus that it avoids many potential corner cases and issues by simply not implementing those features that would enable those issues.
Where GCC-COBOL Fits In
There’s currently still a severe lack of COBOL developers, even though demand is strong. Although GCC-COBOL is – like GnuCOBOL – not an officially validated compiler that’d be accepted anywhere near an IBM z/OS-running mainframe at a financial institution, it does fulfill the invaluable role of enabling easy access to a COBOL toolchain. This then enables hobbyists and students to develop in COBOL, whether for fun or for a potential career.
A business could also use such an open-source toolchain for replacing legacy Java or similar payroll processing applications with COBOL, without having to invest in any proprietary toolchains and associated ecosystems. According to the developer behind GCC-COBOL in the mailing list announcement, this is one of the goals: to enable mainframe COBOL applications to run on Linux systems.
Although financial institutions are still highly likely to jump for an IBM Z system mainframe (the ‘Z’ stands for ‘Zero Downtime’) and associated bulletproof service contract, it feels good to see such an important DSL become more readily available to everyone, with no strings attached.