Automatic Concordance File Creation

Wannabe technical writers search in vain for this pink unicorn. You see, the ugliest part of writing a technical book is creating the index. Normally this involves a lengthy read-through where you select each word with the mouse and do a bunch of manual clicking and keying to create hidden codes in the document file. These hidden codes are then processed when the user chooses “create index” and, in theory, a perfectly formatted index appears at that point in the document.

Let me be the first to tell you that your first three or four cuts at the index will be trash. If you are not a professional and do not keep a pile of scrap paper showing at what level and under what word you want each item to appear, your index will have things luggied up pretty badly.

A concordance file is a magic thing: a list of the words you want indexed, which the word processor uses to mark every occurrence for you. Each word processor has its own format, but they all use it the same way, and most provide an editor to create the entries. All you have to do is get your list of words together and key them in. Sounds simple, right? Well, if the open source and lower-end word processors would bother to include a “unique word list” function, it would be. If you happen to be on a platform where TEA is available, then you have it made. This is one of the many tasks TEA was designed to help with.

http://tea-editor.sourceforge.net/

For an editor developed by a journalist without any formal C++ training, TEA rocks. It has a bit of a quirky interface but, if you remember the old IBM DOS editors, it’s rather nostalgic. Simply save your document as a TXT file, open it with TEA, and choose:

Functions->Analyze->Extract words

You should end up in a new buffer which has one word per line and probably a lot of blank lines. Use <CTRL><A> or the menu to select all.

Functions->Sort->case insensitively

Use <CTRL><A> or the menu to select all.

Functions->Filter->Remove duplicates

Now, manually delete the words you do not want in your index and save your new TXT file.
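
If TEA is not available on your platform, the extract, sort, and remove-duplicates steps above can be approximated with a short script. This is only a rough sketch and not how TEA does it: the function names are made up for illustration, the definition of a "word" (letters, with optional inner apostrophes or hyphens) is an assumption, and words that differ only in case are collapsed into the first spelling seen, which may be stricter than TEA's duplicate filter.

```python
import re

# Assumption: a "word" is a run of letters, optionally joined by inner
# apostrophes or hyphens (e.g. "don't", "cross-platform").
WORD_PATTERN = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")


def unique_words(text):
    """Return the distinct words in text, sorted case-insensitively."""
    seen = {}
    for word in WORD_PATTERN.findall(text):
        # Keep the first spelling seen for each case-folded form.
        seen.setdefault(word.casefold(), word)
    return sorted(seen.values(), key=str.casefold)


def write_word_list(source_path, target_path):
    """Read a TXT export of the document and write one unique word per line."""
    with open(source_path, encoding="utf-8") as source:
        words = unique_words(source.read())
    with open(target_path, "w", encoding="utf-8") as target:
        target.write("\n".join(words) + "\n")
```

Calling write_word_list("book.txt", "words.txt") (both file names are placeholders) would leave you with the same kind of list, ready for manual pruning.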

You can now use the concordance editor provided with your word processor to create one entry for each of these words. You will have to fudge a bit to create multiword index entries, but the bulk of it is done for you. A little work with the mouse and you will be all set. When you generate your index this time it should really be the way you want.
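
If your word processor's concordance file is plain text, the keying-in can be scripted too. The sketch below assumes the six-field, semicolon-separated layout that LibreOffice Writer documents for its concordance files (search term; alternative entry; 1st key; 2nd key; match case; word only). Other word processors use different formats, so treat the field order, the function name, and the default flags as assumptions to check against your own tool.

```python
def concordance_lines(words, match_case=False, word_only=True):
    """Build one semicolon-separated concordance line per word.

    Assumed field order: search term; alternative entry; 1st key;
    2nd key; match case; word only. The two flags are written as 0 or 1.
    """
    flags = f"{int(match_case)};{int(word_only)}"
    lines = []
    for word in words:
        # The entry text repeats the search term; both keys are left empty,
        # so every word lands at the top level of the index.
        lines.append(f"{word};{word};;;{flags}")
    return lines
```

Multiword entries and sub-levels still need the hand edits described above: you would fill in the key fields yourself for those lines.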

Roland Hughes started his IT career in the early 1980s. He quickly became a consultant and president of Logikal Solutions, a software consulting firm specializing in OpenVMS application and C++/Qt touchscreen/embedded Linux development. Early in his career he became involved in what is now called cross-platform development. Given the dearth of useful books on the subject, he ventured into the world of professional authorship in 1995, writing the first of the "Zinc It!" book series for John Gordon Burke Publisher, Inc.

A decade later he released a massive (nearly 800 pages) tome, "The Minimum You Need to Know to Be an OpenVMS Application Developer", which tried to encapsulate the essential skills gained over what was nearly a 20-year career at that point. From there "The Minimum You Need to Know" book series was born.

Three years later he wrote his first novel "Infinite Exposure" which got much notice from people involved in the banking and financial security worlds. Some of the attacks predicted in that book have since come to pass. While it was not originally intended to be a trilogy, it became the first book of "The Earth That Was" trilogy:
Infinite Exposure
Lesedi - The Greatest Lie Ever Told
John Smith - Last Known Survivor of the Microsoft Wars

When he is not consulting, Roland Hughes posts about technology and sometimes politics on his blog. He also has regularly scheduled Sunday posts appearing on the Interesting Authors blog.