Len Holgate asks "What's your worst bug?" The following story didn't involve one of my own bugs, but was related to me by John Steel, a colleague of my step-father. I liked the story so much that I included it in my recent .NET debugging book in the "light relief" section at the end of one of the chapters.
"Many years ago I was a development engineer in the aircraft industry working for a company specializing in aircraft electrical components and systems. We had the contract for the entire electrical system for one of the UK's V-bombers. The prototype of one of these aircraft crashed following some high speed high altitude ( 20,000 metre - yes metres!) trials.
"One of the devices I helped develop was an actuator-reversing switch which among other things controlled the pitch of the horizontal tail surfaces. The entire surface moved, not just the elevators, and when the aircraft remains were recovered it was found that the tail plane was locked in the high-speed high-altitude position. Furthermore, the reversing switch had locked out which, it was designed to do in the event of the contacts welding together. Indeed one set of contacts was welded and the other contact pair had interrupted the current. This prevented any fire, but left the aircraft impossible to handle at low altitude.
"The switch unit had passed all of its development tests including extensive tests at simulated high altitude (low pressure and low temperature), and the type had its Air Registration Certificate. The first question was why had it locked out, an event that could, from the tests, only happen if the contacts were opened slowly leading to severe arcing. So the next question was why had the contacts opened slowly?
"This is when the unpredictable occurred. In carrying out the development trials, I had insisted that the bearings in the mechanism were lubricated with a special low-temperature constant (with temperature) viscosity oil called "UNIVIS P38". To ensure that the viscous friction was reduced to the absolute minimum, I had the bearing, which consisted of a stainless steel shaft running in a sintered bronze tube, modified so that the bearing clearance was increased from 10 microns to 60 microns. The shaft was also modified to look like a bobbin with bearing surfaces at each end rather than continuous.
"The net effect of these modifications was to reduce the friction torque by a factor of between 500 and 1000, enough to ensure snappy action even at the highest altitudes. The downside was that the mechanism was slightly 'sloppy', but again extensive tests demonstrated that there was no deterioration in wear rate due to vibration or repeated operation. The trial units lasted in excess of 10,000 operations, equivalent to substantially more than the aircraft lifetime.
"So what went wrong? In transferring the design, tooling and so on to the production team, one of the production engineers, unbeknownst to the development team, had decided that the tolerances and clearances needed to be tightened up! He didn't want to be responsible for sloppy mechanisms leaving the factory. So, ignoring the warning note on the part and the assembly drawings that the clearance of 60 microns was correct, he modified the shafts to reduce the clearance back to 10 microns, without consulting the development team and ignoring the large letters on all the company's drawings that insisted, in large letters:
IF IN DOUBT ASK
"His actions caused the total loss of one prototype machine costing in today's money £10n where N is around 9. But luckily the crew were saved by their Martin-Baker ejection seats.
"The lesson to be learnt from this case and thousands of others like it is that individual engineers have a duty to pay great attention to detail and to communicate continuously about what they are doing and explain their decisions and actions. Even then the only certainty is that the unforeseen will happen."