Monday, July 11, 2016

Building software that builds software enforces quality

There's an interesting discussion over on the AMIA Implementation list server about software quality. As is often the case, it intersects with something I'm currently working on: a code generator to automate FHIR mapping. Basically, I'm building software that builds software.

You find out a lot of things when you do that.  First, it's difficult to do well.  Second, it's very hard to create subtle bugs.  When you break something, you break it big time, and it's usually quite obvious; the most common result is that the generated software doesn't even compile.  The reason has to do with the volume of code produced: a generator often creates hundreds or thousands of times more code than went into it.  And even though it is difficult to do well, once you have it working you can produce 10, 20, or even 100 times the code that you would otherwise, with extremely high quality.
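To make the "volume" point concrete, here is a minimal sketch of what a generator like this does: one template stamped out once per input. Everything here (the class name, the template, the resource names) is invented for illustration, not taken from my actual generator. Notice that a flaw in the template would show up in every generated class at once, usually as a compile error.

```java
import java.util.List;

// Minimal code-generator sketch: one template, many outputs.
class ResourceClassGenerator {
    // A single template shared by every generated class.  Break this,
    // and you break every output, not just one.
    private static final String TEMPLATE =
        "public class %sProvider {\n" +
        "    public String getResourceName() { return \"%s\"; }\n" +
        "}\n";

    static String generate(String resourceName) {
        return String.format(TEMPLATE, resourceName, resourceName);
    }

    public static void main(String[] args) {
        // Three inputs yield three classes; 48 inputs would yield 48.
        for (String name : List.of("Patient", "Observation", "Encounter")) {
            System.out.println(generate(name));
        }
    }
}
```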

Software generators are like factories in this way, only better.  If a component on a physical assembly line is bad (a nut, a bolt, et cetera), the result is a minor defect in one unit.  But all of the materials produced by a software generator, with very few exceptions, are created by automation, and a single failure anywhere in the production line halts assembly.  You get not one failure but thousands.  The rare one-offs that you do get can almost always be attributed to bad inputs or the odd cosmic radiation event.  Most of the time we are dealing with electrons and copies of electrons: we never have just one bad bolt; we have thousands of copies of that bad bolt.

Another interesting thing happens when you use code generators, especially if you are as anal retentive as I am about eliminating unnecessary code.  You will often find that the code you are generating is exactly the same except for a few parameters that can be determined at generation time.  When this happens, what you should do is derive the generated class from a common base class and move the shared code there, parameterized as needed.  This is a great way to reduce code bloat.  Rather than having fifteen methods that all do the same thing, the superclass method does the common work, and the specialized stuff moves to specialization methods in the derived class.  Template parameters (generics) can help here as well.
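The pattern above can be sketched with generics and a template method. This is a generic illustration, not code from my generator; the class and method names are hypothetical. The superclass owns the shared flow once, and the generated subclass supplies only the parts that differ.

```java
// Common flow hoisted into a base class; subclasses fill in the specialized steps.
abstract class AbstractResourceHandler<T> {
    // Written and tested once, instead of once per generated class.
    final String handle(String id) {
        T resource = fetch(id);       // specialized step
        return describe(resource);    // specialized step
    }

    protected abstract T fetch(String id);
    protected abstract String describe(T resource);
}

// What the generator would emit per resource: a thin subclass.
class PatientHandler extends AbstractResourceHandler<String> {
    @Override protected String fetch(String id) { return "Patient/" + id; }
    @Override protected String describe(String resource) { return "Found " + resource; }
}
```

The `final` on the template method keeps generated subclasses from accidentally diverging from the shared flow, which is exactly the duplication you were trying to eliminate.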

In the example I'm working on, my individual resource providers all implement the basic getResourceById() method of a HAPI IResourceProvider in the same way, with the specialized bits delegated to the derived class.  That's 35 lines of code that I don't have to duplicate across some 48 different FHIR resources, and I really only have to test it once to verify that it does what it is supposed to do in 48 different places.  If I were writing that same code 48 times, I guarantee that I'd do it differently at least once (if I didn't go crazy first).  No sane engineer would ever write the same code 48 times, so nobody using code generation should make the software do it that way either.
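Here's roughly what that shared read path looks like. To be clear, this is a simplified sketch: HAPI's real IResourceProvider interface and read-operation signatures differ, and the names below (BaseResourceProvider, lookup, getResourceType) are invented for illustration. The common validation and not-found handling live in the base class; each of the 48 generated providers supplies only its lookup.

```java
import java.util.Map;
import java.util.NoSuchElementException;

// Shared read logic, written once for all generated providers.
abstract class BaseResourceProvider {
    final Object getResourceById(String id) {
        if (id == null || id.isEmpty()) {
            throw new IllegalArgumentException("id is required");
        }
        Object resource = lookup(id);   // the only specialized step
        if (resource == null) {
            throw new NoSuchElementException(getResourceType() + "/" + id);
        }
        return resource;
    }

    protected abstract Object lookup(String id);
    protected abstract String getResourceType();
}

// One generated provider (an in-memory stand-in for a real resource store).
class PatientProvider extends BaseResourceProvider {
    private final Map<String, Object> store = Map.of("42", "patient-42");
    @Override protected Object lookup(String id) { return store.get(id); }
    @Override protected String getResourceType() { return "Patient"; }
}
```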

I once worked with a developer who generated a half million lines of code in this fashion.  In over two years of testing, implementation, and deployment, his generated code produced a grand total of 5 defects.  For him, we could translate the traditional defects/KLOC metric to defects/MLOC, and he still outperformed the next best developer by an order of magnitude.

That, my friends, is software engineering.


