Well It Was Twenty Years Ago Today...

It was early June in 1987 when Richard Stallman announced the release of the GNU C compiler version 1.0. As I wrote in Open Sources, it was the most thrilling and most terrifying day of my life (up to that point). Having first read and lightly hacked Emacs code in 1985, and having read and lightly hacked GDB code in 1986, I eagerly attended a week-long lecture series on Emacs that Stallman gave in February 1987 at MCC in Austin, Texas. By day, my appreciation for Stallman and the GNU project grew as I began to understand the depth of his technical designs, the beauty of his implementation approaches, and the method by which both were protected for me instead of from me by the GPL. By night, my curiosity grew as Stallman would steal moments between lectures, and especially at the end of the day, to hack on a project he was not yet ready to disclose.

By the end of the week in February the secret had been revealed to me: he was working on a C compiler and was in the final stages of polishing the optimizer. I was 22 years old at the time, and I still harbored a dream that I might someday write "The Great American Compiler". I was thrilled to have a chance to see how a master like Stallman would write such a thing, and I was terrified that I would never create something worthy of recognition if he had been there first. When Stallman announced version 1.0, 20 years ago to this day (more or less), I had a decision to make: I could join him, I could compete with him, or I could pick a new dream. I downloaded GCC version 1.0 and began a collaboration that would last for ten years (when, due to RSI, I gave up programming as a mainstream activity.)

GCC version 1.0 was a watershed event in my life. The first day I was intent on just learning how it worked--the parser, the lexical analyzer, the intermediate code generator, the optimizers, and the machine-dependent assembly output routines. I printed all the files (consuming three reams of paper) and proceeded to lay them out in piles across my extra-large dining table and any other horizontal surfaces I could see from my central position. I brought out the colored pencils I'd used when learning the art and science of integrated circuit design (red for polysilicon, green for n-diffusion, blue for metal, yellow for n-well, and black for vias and contacts) and gave the colors new meanings to track data, code, and other relationships relevant to compilers. After five days and nights, something "clicked" and I no longer needed the printouts--the data structures and basic organization suddenly became "obvious". I put away the pencils and began to itch with the desire to hack GCC, and not lightly.

I decided to take the next step--to write a completely new port of the compiler to a microprocessor that had always fascinated me: the National Semiconductor 32032. I already had the manual from my days at the Moore School of Engineering at the University of Pennsylvania, and so I began the process of replacing templates used for the VAX and Motorola 68020 processors with templates appropriate for the National chip. I immediately ran into problems--but they were small problems and easily fixed. I had GCC generating code for the National chip within a few days, and I had it generating correct code a few days after that.

When I did this work, the typical compiler company charged millions of dollars for the service of delivering a compiler in 12-18 months, and sometimes longer. Two weeks to the day after I downloaded the compiler from the Free Software Foundation, I had it generating code that was 20% faster than the code coming from the National compiler. If I compared the $3000/month salary I earned at the time with what these companies charged, I delivered a product that was 20% better (6-9 months in Moore's Law time) for 1/1000th the cost in 1/30th the time. It was a heck of a 23rd birthday present to myself!

Within hours of posting an announcement of this new port, it became obvious to me and to others that there were many optimizations I had not yet implemented. After another two weeks of work, I was generating code that was 40% faster than the National compiler. 40% was an important number because the 32032 was marketed as a "1 MIPS" chip (meaning able to execute one million instructions per second), but it only benchmarked at 0.75 MIPS and was headed for commercial irrelevancy. With my 32032 port, the chip benchmarked above 1 MIPS, proving to me that the hardware guys had delivered, but the software guys had not. I also realized that even in 1987, there were already sufficiently many people to make a collaborative process very successful.
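
As a back-of-the-envelope check on why 40% was the number that mattered, the arithmetic can be sketched in a few lines of Python. This is only an illustration: it assumes that "N% faster" means the benchmark score scales by a factor of (1 + N/100), and the function and variable names are made up for the example.

```python
# Assumption: "N% faster" scales the benchmark score by (1 + N/100).
BASELINE_MIPS = 0.75  # figure quoted above for code from the National compiler


def scaled_mips(baseline: float, speedup_percent: float) -> float:
    """Return the benchmark score after a given percentage speedup."""
    return baseline * (1 + speedup_percent / 100)


first_port = scaled_mips(BASELINE_MIPS, 20)  # the initial two-week port
tuned_port = scaled_mips(BASELINE_MIPS, 40)  # after two more weeks of work

print(f"20% faster: {first_port:.2f} MIPS")
print(f"40% faster: {tuned_port:.2f} MIPS")
```

Under that assumption, a 20% gain still leaves the chip short of its advertised 1 MIPS, while a 40% gain puts it just over the line.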

I went on to write many other ports that summer, demonstrating with remarkable consistency that by hacking on GNU, I could routinely generate better product faster than whole teams of compiler-writers employed by much larger companies. By the end of that year I went on to achieve a new goal, which was to write the GNU C++ compiler. To me, GNU C++ was "The Great American C++ Compiler", a new, improved version of my original goal. And because I benefited so much, both by what others shared with me before I started hacking, and by the helpful suggestions and bugfixes I received after I started hacking, I, too, shared freely the knowledge I had gained. I wrote and delivered the first comprehensive training on the GNU compiler internals, giving my 100+ LaTeX slides to Stallman so he, too, could earn money by teaching others how to hack. Two years later, I formally commercialized the commercial advantages of GNU software, and the rest, as they say, is history.

I wonder what the next 20 years will bring...

Comments

Yep, it's been nearly 20 years for some of us. My first successful hack on the emacs core was to add unexec support for the Convex machines. (Convex (in Dallas!) was my employer at the time.) Then I got busy on GCC, first on the SPU and IOP for the Convex machines, then on to the actual CPU itself. Early benchmarks showed it was as fast as, and occasionally faster than, the "big science" vector compiler when used on mostly scalar code. A situation that didn't make the manager of the compiler group happy. His compiler was "for sale" and employed a group of 12 or so people. GCC was faster on code that didn't vectorize well, and, frankly, didn't have much of an instruction scheduler at the time. Somewhere in there I gave back the diffs for gdb on the Convex machines as well. This made the compiler group manager even less happy, since he also had a two-person group working on an 'advanced' debugger for the Convex machines that was far worse than GDB (and the X Window System interface to gdb just blew the doors off that project). Subsequent to my leaving Convex, politics prevailed, and it became a fireable offense for the guys in "Technical Marketing" to use GCC in a customer benchmark. More here, if you care.