
Trigraphs and Digraphs

I remember thinking trigraphs were a cheap hack when they were first introduced. Now that I’m old, I get called on to help “modernize” legacy code. Keyboard images courtesy of Admiral Shark’s Keyboards.

Too much of the information posted online about Trigraphs and Digraphs is incomplete or just plain wrong. Those who have only ever worked on x86 hobby computers blame the 7-bit, ASCII-like ISO 646 character set for lacking some of the characters critical to C programming. While that is technically true, it is only part of the truth. During the sub-300 baud modem days, most non-IBM communications used 7-bit ASCII so the 8th bit of each byte could be used as a parity bit. At 150 baud, every bit you didn’t have to transfer was a speed improvement.
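To make the parity point concrete, here is a minimal C sketch of how a 7-bit character picks up a parity bit in the 8th position. It assumes even parity (links were also configured for odd, mark, or space parity). The function names add_even_parity and even_parity_ok are hypothetical, chosen only for illustration.

```c
#include <stdbool.h>

/* Sender side: keep the low 7 data bits and set bit 7 (the 8th bit)
   when needed so the byte ends up with an even number of 1 bits. */
static unsigned char add_even_parity(unsigned char c)
{
    unsigned char data = (unsigned char)(c & 0x7Fu);
    unsigned ones = 0;

    for (unsigned bits = data; bits != 0; bits >>= 1)
        ones += bits & 1u;              /* count the 1 bits in the data */

    return (unsigned char)(ones % 2u ? (data | 0x80u) : data);
}

/* Receiver side: accept the byte only if all 8 bits hold an even number
   of 1 bits; a single flipped bit makes the count odd. */
static bool even_parity_ok(unsigned char byte)
{
    unsigned ones = 0;

    for (unsigned bits = byte; bits != 0; bits >>= 1)
        ones += bits & 1u;

    return ones % 2u == 0;
}
```

For example, 'C' (0x43) has three 1 bits, so add_even_parity sets the high bit and 0xC3 goes down the wire. 'A' (0x41) already has an even count and goes out unchanged.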

Not every computer used or supported 8-bit bytes. There were 4-bit computers, and some may still exist in remote controls and other embedded systems. The PDP-11/70 was a word-oriented machine built around a 16-bit word, as were many minicomputers of the day. When you stepped through memory with word instructions in its assembly language, each increment moved 16 bits.

Big Dog

The main reason Trigraphs and Digraphs exist is that IBM was the big dog. The EBCDIC character set had no representations for { } or [ ]. When Pascal was introduced we had to use (* for { and *) for }, and (. and .) for [ and ].

EBCDIC deliberately had holes in it. This was for security. It allowed companies to use the assembly-language TR (Translate) instruction, driven by a 256-byte translation table that acted as a mask, to map readable characters into those holes however they saw fit. Only an application with the exact inverse table could decrypt the data. Social Security numbers, bank accounts, and dates of birth could all be obfuscated this way in what would otherwise be a raw text file.


This version of the EBCDIC table is based on the presentation in Appendix C of System 360 Programming by Alex Thomas, 1977, Reinhart Press, San Francisco.

The security of the character set holes is rather obvious when you see the table laid out in a grid like this. You could move those printable characters all over. Only someone with the inverse of your mask could decrypt your text. If someone stole your data tape during shipping, they couldn’t read it. Some places used a different mask each day of the week.
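The table-driven substitution described above is easy to sketch in C. This is a minimal illustration that assumes the “mask” is a 256-entry table forming a permutation of all byte values, similar in spirit to the table operand of the System/360 TR (Translate) instruction. The helper names and the rotate-by-a-constant table are hypothetical stand-ins for whatever mapping a shop actually chose. Decoding simply runs the data back through the inverse table.

```c
#include <stddef.h>

/* Hypothetical "mask": shift every byte value by `shift`, wrapping at 256.
   Any permutation of 0..255 would work; each shop picked its own. */
static void make_rotation_table(unsigned char table[256], unsigned shift)
{
    for (unsigned i = 0; i < 256; ++i)
        table[i] = (unsigned char)((i + shift) & 0xFFu);
}

/* Build the inverse table. Only meaningful when `table` is a permutation,
   i.e. every byte value appears exactly once. */
static void invert_table(const unsigned char table[256], unsigned char inverse[256])
{
    for (unsigned i = 0; i < 256; ++i)
        inverse[table[i]] = (unsigned char)i;
}

/* TR-style translation: replace each byte with the table entry it indexes. */
static void translate(unsigned char *buf, size_t len, const unsigned char table[256])
{
    for (size_t i = 0; i < len; ++i)
        buf[i] = table[buf[i]];
}
```

With a shift of 37, the EBCDIC digit '1' (0xF1) is encoded as 0x16. Translating the field through the inverse table restores 0xF1, and without that table the stolen tape shows nothing but noise.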

A more recent version of EBCDIC

While the tables aren’t laid out exactly the same, it is obvious EBCDIC has changed in ways that are not backward compatible. This matters for off-site backups, where tapes hold data encrypted roughly 30 years ago (like the origination files for a 30-year mortgage).
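One well-known example of that drift is the location of the square brackets. IBM code page 037 puts [ and ] at 0xBA and 0xBB, while code page 1047 puts them at 0xAD and 0xBD. A byte written under one table therefore decodes as a different character under the other. The sketch below only encodes those four values. The enum and function names are hypothetical, and you should check the bytes against IBM’s code page charts before relying on them.

```c
/* Hypothetical labels for two IBM EBCDIC code pages:
   037 (US/Canada) and 1047 (Latin-1/Open Systems). */
enum ebcdic_codepage { CP037, CP1047 };

/* Return the EBCDIC byte for '[' or ']' in the given code page,
   or 0 for any character this sketch does not cover. */
static unsigned char ebcdic_bracket(char c, enum ebcdic_codepage cp)
{
    switch (c) {
    case '[':
        return cp == CP037 ? 0xBA : 0xAD;
    case ']':
        return cp == CP037 ? 0xBB : 0xBD;
    default:
        return 0;
    }
}
```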

Porting

C/C++ programs written on non-IBM platforms could not be ported to IBM because EBCDIC didn’t have { } or [ ]. You may not think this a problem, but the major commercial relational databases developed from the 1980s through the mid-1990s were written in C and later C++. Think Oracle, Sybase, Ingres, etc., as well as the supporting 4GL languages used to access relational and legacy data. Every one of those companies wanted its code running on IBM because that’s where the money was. DEC was where the rest of the money was, and its character set was fine. HP and the also-ran brands basically just got tip money from IT departments. Your commercial product had to work on both DEC and IBM platforms to be a financial success.

Nobody wanted to maintain two sets of source. Even fewer people wanted to look at the EBCDIC code. A bigger insult was that white space now mattered in C.

(*ptr != 0*)

got translated to

{ptr != 0}

if it successfully got translated. It was a compilation error either way.
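Here is a rough C sketch of why spacing suddenly mattered. It assumes a purely textual translator (the name naive_translate is hypothetical) that rewrites every (* as { and every *) as }, the way a Pascal-style brace substitute would be undone. Because it knows nothing about C, an ordinary parenthesized dereference gets mangled unless someone puts a space after the opening parenthesis.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical textual translator: rewrites every "(*" as "{" and every
   "*)" as "}". It knows nothing about C syntax, which is the problem. */
static void naive_translate(const char *in, char *out, size_t out_size)
{
    size_t o = 0;

    if (out_size == 0)
        return;

    for (size_t i = 0; in[i] != '\0' && o + 1 < out_size; ) {
        if (strncmp(in + i, "(*", 2) == 0) {
            out[o++] = '{';             /* substitute for an open brace */
            i += 2;
        } else if (strncmp(in + i, "*)", 2) == 0) {
            out[o++] = '}';             /* substitute for a close brace */
            i += 2;
        } else {
            out[o++] = in[i++];         /* copy everything else as-is */
        }
    }
    out[o] = '\0';                      /* always NUL-terminate */
}
```

Run on (*ptr != 0*), it produces {ptr != 0}. Run on if (*ptr != 0), it produces if {ptr != 0), which no C compiler accepts. Meanwhile, if ( *ptr != 0) passes through untouched.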

Enter Trigraphs

Trigraphs were most definitely a hack. The biggest checkbook in the room spent enough money to get enough seats on the committee to get them in. At the time they were defensible.

The NCR PC-4 with dual floppy and 640K of RAM with its ghosting green monitor was a kick-ass home computer.

AST Premium 286

The AST Premium 286 was just coming out but still had the same 640K DOS limitations. In short, the x86 home hobby computers didn’t have the hardware to accommodate larger character sets. Unicode wasn’t even a gleam in anyone’s eye.

The Compromise

C/C++ and even Pascal programs written on other platforms could not be ported to IBM. The Pascal programming language was published in the early 1970s, around the time of ALGOL’s complete failure. The C programming language was created in the early 1970s but didn’t really escape into the wild until 1978, when the book The C Programming Language was published.

Trigraphs were introduced in the 1989 ANSI C standard (which essentially became the 1990 ISO C standard). Digraphs were introduced by the 1995 amendment to the 1990 ISO C standard. IBM fought the removal of trigraphs from the C++17 standard.
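For reference, ISO C defined nine trigraphs. Each is two question marks followed by one of = ( / ) ' < ! > -, standing for # [ \ ] ^ { | } ~ respectively. The compiler replaced them in translation phase 1, before anything else happened, even inside string literals and comments. Below is a minimal C sketch of that replacement. replace_trigraphs is a hypothetical helper, not a compiler API.

```c
#include <stddef.h>
#include <string.h>

/* The nine ISO C trigraphs are two question marks followed by one of the
   characters in trigraph_key; trigraph_value holds the replacement at the
   same index. Only the third character is stored, so this file contains
   no literal trigraphs of its own. */
static const char trigraph_key[]   = "=(/)'<!>-";
static const char trigraph_value[] = "#[\\]^{|}~";

/* Sketch of translation phase 1: copy src to dst, replacing every trigraph
   with the single character it stands for. dst must hold strlen(src) + 1
   bytes; the output is never longer than the input. */
static void replace_trigraphs(const char *src, char *dst)
{
    while (*src != '\0') {
        if (src[0] == '?' && src[1] == '?' && src[2] != '\0') {
            const char *hit = strchr(trigraph_key, src[2]);
            if (hit != NULL) {
                *dst++ = trigraph_value[hit - trigraph_key];
                src += 3;               /* consume all three characters */
                continue;
            }
        }
        *dst++ = *src++;                /* not a trigraph: copy one char */
    }
    *dst = '\0';
}
```

The test strings for this sketch use the \? escape sequence, which C provides precisely so a literal can contain two question marks without forming a trigraph. For example, replace_trigraphs turns ??=define ARR??(3??) into #define ARR[3].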

Of course, Digraphs such as <: and :> now look more like emoticons than code. I’ve never seen them used in actual production source code.
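Digraphs, unlike trigraphs, are ordinary tokens. <: :> <% %> %: and %:%: spell [ ] { } # and ##, and they are recognized only where those tokens could appear, never inside strings or comments. Here is a sketch of what code written entirely with them looks like; the function sum is a hypothetical example.

```c
%:include <stddef.h>

/* Written with digraphs: %: is #, <: and :> are [ and ],
   <% and %> are { and }. The compiler sees ordinary C. */
static int sum(const int values<::>, size_t count)
<%
    int total = 0;

    for (size_t i = 0; i < count; ++i)
        total += values<:i:>;   /* same as values[i] */
    return total;
%>
```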

Never Use The Edges

As a traveling consultant, I tell every green programmer to avoid the edges and dark corners of a language specification. Most of them can’t wait to prove their geek-i-ness with obscure tricks or with plain wrong lessons taught by “leet” coding sites.

During the 1980s and early 1990s, geeks tried to show how cute they were by writing code to swap two numbers without using a temporary variable. Most of those tricks relied on undefined or compiler-specific behavior. Even the “best” of them only worked for a limited range of values. You can bet their companies went out of business.
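The two classic “no temporary” swaps show the problem. The add-and-subtract version (a = a + b; b = a - b; a = a - b;) overflows when the sum doesn’t fit, which is undefined behavior for signed integers in C. The XOR version below avoids overflow but silently zeroes the value when both pointers refer to the same object. A plain temporary has neither problem. The function names are hypothetical, and this is a sketch of the pitfall, not a recommendation of the trick.

```c
/* XOR swap: no temporary and no overflow, but if a and b point to the same
   object, the first step sets it to 0 and it stays 0. */
static void xor_swap(unsigned *a, unsigned *b)
{
    *a ^= *b;
    *b ^= *a;
    *a ^= *b;
}

/* The boring version: correct for every value, including when a == b. */
static void temp_swap(unsigned *a, unsigned *b)
{
    unsigned t = *a;
    *a = *b;
    *b = t;
}
```

Calling xor_swap(&z, &z) on a variable holding 7 leaves it at 0, while temp_swap(&z, &z) leaves it at 7.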

The C/C++ language specifications are mostly heritage now. They aren’t sexy. You have to actually know something to program in C/C++. If you are self-taught, there is a 98+% chance you are doing things wrong. I’m not saying that to insult you, just pointing out that those classes everyone hated to take in college are really important.

As C/C++ becomes a legacy/heritage language, many “great ideas” of the past are being removed from it. Trigraphs are definitely one of them. I don’t even know if compilers support the K&R parameter style anymore; C23 finally dropped it from the C standard.

There was a time when it ruled the land. If you want a list of what died with C++17, read this. Yes, so-called “leet” coders used to apply ++ to a bool to set it to true. Thankfully that is dead now.

Summary

Trigraphs and Digraphs were a bandage to get us over a rough patch. Major equipment manufacturers went out of their way to be incompatible with each other so they could sell more overpriced stuff. If you didn’t live through it, you can’t understand it. Most likely you never had to pay $1200 for a branded network card that would only work on that vendor’s network.

Unlike many of the other things which died with C++17, Trigraphs will most likely still exist in IBM compilers for many years to come. If there is enough legacy code still in production which cannot be ported forward, the big checkbook of IBM will get enough people on the committee to bring them back.

In the meantime, my email will back up with porting projects.

Roland Hughes started his IT career in the early 1980s. He quickly became a consultant and president of Logikal Solutions, a software consulting firm specializing in OpenVMS application and C++/Qt touchscreen/embedded Linux development. Early in his career he became involved in what is now called cross-platform development. Given the dearth of useful books on the subject, he became a professional author in 1995, writing the first of the "Zinc It!" book series for John Gordon Burke Publisher, Inc.

A decade later he released a massive (nearly 800 pages) tome, "The Minimum You Need to Know to Be an OpenVMS Application Developer," which tried to encapsulate the essential skills gained over what was nearly a 20-year career at that point. From there "The Minimum You Need to Know" book series was born.

Three years later he wrote his first novel, "Infinite Exposure," which got much notice from people involved in the banking and financial security worlds. Some of the attacks predicted in that book have since come to pass. While it was not originally intended to be a trilogy, it became the first book of "The Earth That Was" trilogy:
Infinite Exposure
Lesedi - The Greatest Lie Ever Told
John Smith - Last Known Survivor of the Microsoft Wars

When he is not consulting, Roland Hughes posts about technology and sometimes politics on his blog. He also has regularly scheduled Sunday posts appearing on the Interesting Authors blog.