Checking return values

Yes, I had to post one more, hopefully shorter, rant on logger.c. This has to do with “checking return values” in OpenSource code.

#ifndef MSG_NOSIGNAL
# define MSG_NOSIGNAL 0
#endif

 if (sendmsg(ctl->fd, &message, MSG_NOSIGNAL) < 0) {
     logger_reopen(ctl);
     if (sendmsg(ctl->fd, &message, MSG_NOSIGNAL) < 0)
         warn(_("send message failed"));
 }

Code like this is rampant in OpenSource. It passes the quick and dirty teenage “code review” but it is not production quality. If Linux and the OpenSource community want to gain respect, they need to start writing production quality code. This means real QA, not automated tests run via Jenkins which test nothing.

The above code is NOT checking the return value in a production quality manner. It isn’t catching the error and reporting it via some human readable log entry. Ideally it should catch the error and report it along with the human readable text associated with the error.

Error: 12345 in blah - Severe indigestion

The hapless schmoe who has to debug this has to modify the code before they can even gain access to the error in the debugger. This code adheres to the letter of “checking the return value” without adhering to the spirit of it in a production quality manner. Snippets like this make it nigh on impossible to port OpenSource code to a regular production platform.
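Here is a rough sketch of the kind of reporting I mean. The helper names format_errno() and send_and_report() are made up for illustration and are not in logger.c. The idea is simply to grab errno right after the failing call, before a reopen or retry can overwrite it, and to log the number together with the strerror() text.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper (not part of logger.c): format an errno value as
 * "Error: <number> in <where> - <text>" into a caller-supplied buffer. */
static int format_errno(char *out, size_t outlen, const char *where, int err)
{
    assert(out != NULL && outlen > 0 && where != NULL);
    return snprintf(out, outlen, "Error: %d in %s - %s",
                    err, where, strerror(err));
}

/* Hypothetical wrapper: send once and, on failure, save errno at once,
 * log number plus text, and hand the saved value back to the caller. */
static int send_and_report(int fd, const struct msghdr *msg, int flags)
{
    if (sendmsg(fd, msg, flags) < 0) {
        int saved = errno;      /* capture before anything else runs */
        char line[256];

        format_errno(line, sizeof(line), "sendmsg", saved);
        fprintf(stderr, "%s\n", line);
        return saved;
    }
    return 0;
}
```

With something like this the person debugging gets the error number and its text in the log, and the caller still has the saved value to decide whether a reopen and retry makes sense.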

Scarier still are the publicly traded corporations running production systems on operating systems containing this type of code. Then they wonder how they got hacked without knowing it. Well, quite simply, because error return values weren’t checked in a production quality manner.

 

It’s okay to hate the 12 year old boys who write the bulk of OpenSource code

Even if their biological clock states they are north of 40, they never got past 12 when it comes to coding. In case you can’t see the featured image, the source file came from here. And since they will _hopefully_ sweep that up, here is the first snippet.

/* this creates a timestamp based on current time according to the
 * fine rules of RFC3164, most importantly it ensures in a portable
 * way that the month day is correctly written (with a SP instead
 * of a leading 0). The function uses a static buffer which is
 * overwritten on the next call (just like ctime() does).
 */
static char const *rfc3164_current_time(void)
{
	static char time[32];
	struct timeval tv;
	struct tm *tm;
	static char const * const monthnames[] = {
		"Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug",
		"Sep", "Oct", "Nov", "Dec"
	};

	logger_gettimeofday(&tv, NULL);
	tm = localtime(&tv.tv_sec);
	snprintf(time, sizeof(time),"%s %2d %2.2d:%2.2d:%2.2d",
		monthnames[tm->tm_mon], tm->tm_mday,
		tm->tm_hour, tm->tm_min, tm->tm_sec);
	return time;
}

The second problem is they are returning the address of a variable which is inside of the function. Those bounding {} of the function delineate the scope. The static qualifier means the function is only visible from within this source file. So, to try and defeat this, our little 12 year old stuck a static qualifier on the character array (string). ___Maybe___ in the latest and greatest C standard (which I haven’t read) the compiler is now required to BOTH save the value AND keep it externally accessible while the rest of the routine is being garbage collected away.

You see, the definition which was/is in most of the compilers out there, especially those which stopped updating at C99 or a bit sooner, that’s not a requirement. The “requirement” was that the variable only be initialized once and that it retain its value across multiple function calls. There was/is nothing about it still being externally accessible.

When you are writing OpenSource, you aren’t writing for the latest and greatest, you are writing for whatever happens to be out there.

While we are at it, choosing “time” for a variable name, especially for a string, is Uber stupid. I couldn’t believe it compiled. My gut tells me that if this code really is working on PC platforms it is working due to a bad architecture and compiler with shit garbage collection.
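One way to sidestep the whole argument is to let the caller own the buffer. What follows is only a minimal sketch: rfc3164_format() is a hypothetical name, not something in logger.c, and it takes a struct tm so it can be exercised without a clock. Nothing in it is named “time”.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical variant of rfc3164_current_time(): the caller supplies
 * the buffer. Returns buf on success, NULL on bad input or truncation. */
static char *rfc3164_format(char *buf, size_t buflen, const struct tm *tm)
{
    static const char *const monthnames[] = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
    };
    int n;

    assert(buf != NULL && tm != NULL);
    if (tm->tm_mon < 0 || tm->tm_mon > 11)
        return NULL;            /* refuse to index past the table */

    /* Same format string as the original: day of month padded with a
     * space, hours/minutes/seconds padded with zeros. */
    n = snprintf(buf, buflen, "%s %2d %2.2d:%2.2d:%2.2d",
                 monthnames[tm->tm_mon], tm->tm_mday,
                 tm->tm_hour, tm->tm_min, tm->tm_sec);
    if (n < 0 || (size_t)n >= buflen)
        return NULL;            /* output did not fit */
    return buf;
}
```

A caller such as syslog_rfc3164_header() would declare its own char time_str[32], fill a struct tm from localtime(), and pass both in, so the string lives exactly as long as the caller needs it.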

Don’t worry, I’m not going to cover it all, this would be a 2+ million word blog post if I did that.

#define NILVALUE "-"
static void syslog_rfc3164_header(struct logger_ctl *const ctl)
{
	char pid[30], *hostname;

	*pid = '\0';
	if (ctl->pid)
		snprintf(pid, sizeof(pid), "[%d]", ctl->pid);

	if ((hostname = logger_xgethostname())) {
		char *dot = strchr(hostname, '.');
		if (dot)
			*dot = '\0';
	} else
		hostname = xstrdup(NILVALUE);

	xasprintf(&ctl->hdr, "<%d>%.15s %s %.200s%s: ",
		 ctl->pri, rfc3164_current_time(), hostname, ctl->tag, pid);

	free(hostname);
}

You see, I started cross compiling this because adding all of the advertised features to the existing logger for OpenVMS was going to take several days to a week. Cross compiling this _should_ have been quicker. Yes, the current C compiler

$ cc/ver
HP C V7.1-015 on OpenVMS Alpha V8.3

stopped around the time of the C99 standard. I won’t go so far as to say C99 is “all” there. I’ve banged into some partially implemented stuff with the networks, but haven’t dug in to see if they are part of the standard or not.

So the above snippet nests a call to our original snippet inside of a print routine. Once this got ported to OpenVMS I could step through the first snippet and see a perfectly formatted date string. When the xasprintf() got done I had a big block-o-nothin where the date should be.

Old Timers, we know these things. Global data isn’t a bad thing, it exists because we need it. All of these little tricks trying to bob and weave around the Grim Reaper of garbage collection are just time bombs waiting to go off. So, a quick cut and paste moving that static char definition of time to global scope brought about the expected compilation error.

static char time[32]; /* place to store a time return value */
............^
%CC-W-MIXLINKAGE, In this declaration, "time" is declared with both internal and external linkage. The previous declaration is at line number 378 in file SYS$COMMON:[SYSLIB]DECC$RTLDEF.TLB;1 (text module TIME).
at line number 112 in file RADIAN_ROOT:[2018-VMS-SYSLOG]logger.c;148

So, changing time[32] to time_str[32] and changing the few references to it brought about the desired outcome. What “could” have been the problem?

  • Grim Reaper garbage collection is faster on old DS-10 than modern Intel PC
  • DECC puts the value in a safe place when function goes out of scope and reassigns value when function is re-entered. The original storage location, however, is garbage collected.
  • Choosing “time” as a variable name, especially for a string, was an incredibly stupid thing to do and it collided with the OS level definition in a way which whacked the string value once outside of the function.
  • Compiler developers, having been told for years by unemployable academics without a single line of QA’ed code in production at any for-profit company that global data is bad, came up with a tweaking of the “static” definition to avoid making global what obviously needed to be global. This was done without concern for existing compilers or code in production.

You can probably come up with more.

The point is, the 12 year old boys always want to play with the new toys while the production coders will keep it simple. Production coders who go to independent QA departments know better than to push the envelope. Code they write today could be ported to a Z-80 chip with a 1980s compiler. You think I’m kidding? There are production systems today still running OS/2 Warp.

OS/2 Warp job posting

Oh, and for the record. I knew it would take 3 days to a week to add the bulk of the features to the existing ported code. This __should__ have taken one day to port. I’m now on day 3 tracking down stuff like this. Too close to finished to quit.

Thanks!

 

 

Question From a Reader

The following question came in from a reader who happened to catch one of my posts on a programming email list.

====

As usual, I quite enjoy your detailed analysis coupled with historical contexts, since I learned my trade through those days (Sun SPARC workstations, VAX minicomputers running VMS, etc.).  They are always entertaining, usually edifying, and sometimes nostalgic for me.  :)

I’m curious, though, about one particular point you made:

Windows isn’t even going to be Windows 2 years from now. It is going to be a Microsoft front end on top of what used to be Ubuntu Linux. They’ve already started the process with Windows 10.

Is this just prescience on your part, or is this based on some published road map from Microsoft?  I abhor Windows 10 along with the direction the operating system has taken, but if they are planning to truly run on a UN*X-based foundation, as Apple decided to do with OS X, then there might be some interesting times ahead that would keep me from jumping fully to Linux and sandboxing Windows into a VM.

====

Well dear reader,

Lots of little things published by Microsoft and Windows 10 itself __AND__ the fact OpenSource projects don’t get sued over data breaches.

Microsoft has publicly stated it is creating DOT-NOT Anywhere (don’t remember the exact name) as well as C# anywhere in an attempt to make their obsolete sh*t usable on the current desktop. Windows 10 is the first step (integration) at putting a Windows looking desktop on top of Linux.

You may recall Microsoft paid Novell lots of money to create the first draft of this many years ago. It was called Mono and it yielded one OpenSource product, Evolution. While Evolution was pretty good early on, it was soooooo tied to the Gnome desktop it never got a real following.

Keep in mind Windows started out as a task switching GUI on top of DOS which was criminally marketed and sold as “Windows Operating System.” You typed “win” at the C: prompt and when you exited Windows you were right back to the C: prompt.

While Windows NT, when it was on the Alpha, really was an actual OS because Cutler based it on an improved VMS, the tiny minds at Microsoft could not understand logicals, RMS, file versioning, passing via descriptor and the host of other improvements (I was actually doing a project at DEC when the Alpha was being built) so the Microsoft weenies stripped it back to DOS. While they keep adamantly telling those too lazy to look that Windows 7, 8, 10 (what happened to 9?) “are completely different from DOS,” few who ever worked at that level believed them because the bootstrap was pretty much the same. Different file names, but not really that different of code.

Somewhere on-line you can pull down a free copy of MSDOS if you feel like doing some hardware debugging. At some point many years ago MS released a version of DOS for download to kill the OpenDOS, FreeDOS and half a dozen other DOS projects which were gaining steam. Each project was having trouble with one of the Microsoft memory managers needed to run Windows 3.x cleanly. I forget their names. One got loaded in CONFIG.SYS and the other in AUTOEXEC.BAT. I just did a quick search and stumbled onto this discussion where everyone claims until they are blue in the face that such a release never happened, but I distinctly remember reading that very thing on more than one DOS project Web site. Pretty much when all forward effort stopped on most projects. FreeDOS suddenly seems to be quite active again though.

Ah yes, and there is this “MSDOS 7.1” from some group in China, but I digress.

For more than two decades now when you dropped to the command line you dropped to DOS. They kept telling everyone it was running in a VM and not letting you out to the base OS, but, dropping to a VM does not prove it is not still the base OS. Even when 98 and 2000 were released MS was telling the public they weren’t DOS booted, but now the “official” word seems to be:

MS-DOS 6.22 (1994, last standalone version)
MS-DOS 7.0   (1995, Windows 95A)
MS-DOS 7.10 (1996, Windows 95 OSR 2, Windows 95 OSR 2.5, Windows 98, and Windows 98 SE)
MS-DOS 8.0   (2000, Windows Me)

Ah yes, PowerShell has had some pretty massive security breaches as well.

Why? Because Microsoft never could create business class software they just did a __lot__ of marketing to dupe people into buying it and, more importantly, duped people into reporting far more sales than actually occurred.

Officials from Equifax are going to spend the next month, if not longer, being hauled in front of Congress. Beating up executives from corporations which cause massive consumer identity theft is about the only Bipartisan thing to occur in Washington. You may remember Bill Gates got this same pleasure when Janet Reno was screwing the human race not putting him in prison at the behest of the Clintons? Well, guess who is going to be behind yet another breach at some point? Some company will be using Microsoft Windows on a server which gets breached without a pre-existing patch from Microsoft and they too will get a return trip before Congress. If it is a wide enough breach they will also get to appear before the EU and get yet another round of sanctions along with the peepee whacking.

The ONLY way to sidestep this is to make the kernel and terminal all OpenSource Linux projects. “The Community” won’t be prosecuted because they are volunteer. This means all of the networking and other security are completely out of Microsoft’s hands. This also means that the beyond wretched “Windows Registry,” a source of countless attacks as well as system stability issues, goes away. Whatever Linux uses to keep track of things is what Windows will use now.

In order to make Windows more stable and secure, Microsoft has to abandon Windows.

So yes, to answer your question, within two years Windows will be just like OSX. A task switching GUI layered on top of an actual operating system. Windows 3.1 is back in vogue.

The getkey Trap

This topic came up on my Jed mailing list the other day and, once again, I got in touch with my inner Bill. Your inner Bill? Yes, that cranky old man who isn’t putting up with the world around him and wants it to be his way. The older you get the easier it is to find your “inner Bill.” In this case it fits because I have lived this death at least a hundred times over my career.

OpenSource allows pretty much any 8 year old with a keyboard to hurl code into a software repository without having read any books on programming logic or software design. Inevitably these 8 year olds tend to get involved with the fledgling projects “to leave their mark” but the truth is fledgling projects with only one or two contributors tend to never turn anyone away if the primary contributor doesn’t have an ego the size of Texas.

We went through this with the days of DOS. Back when we used to have magazines like “Programmer’s Journal,” “The C Gazette” and “The C User’s Journal” not to mention Bulletin Board Systems (BBS) where people shared code via both file uploads and echo mail it was the same story over and over. It seems like we had about as many languages being introduced back then as well. The difference is most were compiled languages, not scripting languages. Yes, real geeks wrote compilers for their languages _and_ made those compilers fit on a floppy.

You see, every beginner who wants to contribute something fundamental to a language or library ends up writing something called getkey(). The name will vary just a touch from language to language and library to library, but this will be a horribly simplistic routine. It will only return one octet or 8-bit byte. (Not all processors have 8-bit bytes; some have 4-bit words.) Much of the industry quit using the term byte, instead using the term octet, which was defined to be exactly 8 bits.

The student then “tests” their routine with the standard letters and numbers. They are overjoyed the thing works. Yeah, this is why every project needs a QA group and why TDD is a complete fraud. They don’t ever hit those other keys. You know, the function keys and those cursor navigation keys, let alone those special buttons for multimedia and such. Nope, their wonderful routine doesn’t get a key, it gets an octet and bad things happen when someone hits one of those other keys.

Oh, you can’t fix it. They have been returning a byte, or at best an int. Some of those keys have 3 and 4 octet sequences. There is no way to jam them into the return value. Adding insult to injury, by the time the flaw is “discovered and admitted” (OpenSource projects tend to just let bug reports rot rather than admitting there is a problem), too much existing code is out there expecting the limited return value. Some modules worked around the problem and others didn’t.
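For contrast, here is a sketch of what the routine could have returned from day one: a key event that carries the whole sequence. Everything here is hypothetical (struct key_event, next_key(), KEY_SEQ_MAX), and it assumes ANSI/VT-style input where special keys arrive as ESC followed by '[' or 'O', optional parameter bytes, and a final byte in the range 0x40 to 0x7E. Up arrow on a VT100-style terminal, for instance, is ESC [ A.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define KEY_SEQ_MAX 8   /* assumed upper bound on one key's sequence */

/* Hypothetical key event: the full byte sequence for one key press
 * rather than a single octet. */
struct key_event {
    unsigned char seq[KEY_SEQ_MAX];
    size_t len;
};

/* Split one key press off the front of an input buffer.
 * Returns the number of bytes consumed, or 0 if the buffer is empty
 * or holds an incomplete (or over-long) escape sequence. */
static size_t next_key(const unsigned char *in, size_t inlen,
                       struct key_event *ev)
{
    size_t i;

    assert(in != NULL && ev != NULL);
    if (inlen == 0)
        return 0;

    if (in[0] != 0x1B || inlen == 1 || (in[1] != '[' && in[1] != 'O')) {
        /* Ordinary single-octet key, or an ESC that starts no sequence. */
        ev->seq[0] = in[0];
        ev->len = 1;
        return 1;
    }

    /* Scan for the final byte that ends the escape sequence. */
    for (i = 2; i < inlen && i < KEY_SEQ_MAX; i++) {
        if (in[i] >= 0x40 && in[i] <= 0x7E) {
            memcpy(ev->seq, in, i + 1);
            ev->len = i + 1;
            return i + 1;
        }
    }
    return 0;
}
```

Callers compare ev.seq and ev.len against the sequences they care about, so a plain letter and a function key travel through the same return type and nobody has to work around a one-octet bottleneck later.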

Usually the person involved in this debacle has already gone on to victimize other projects with the same debacle, continuing to spread their own special kind of joy in the Universe.

How OpenSource Bugs are -fixed-

Open Source Fix

Open Source Fix 2

Open Source Fix 3

 

These 3 “solutions” are what typically “fix” all OpenSource bugs.

  1. Close without testing because “code has changed too much.”
  2. Declare it an “upstream” bug so you get credit for the close without actually doing anything.
  3. Let it rot until the version it was logged against is “no longer supported” and tell the user to retest against a currently supported code base.