Research Challenges in High Confidence Systems
HCS Solution Exploration - Discussion
Establish the Compelling HCS Needs
Building a Better Base
Clearer Articulation of Needs and Opportunity Potential
Endnotes


Establish the Compelling HCS Needs

To obtain new funding for HCS research, we must establish that the HCS needs are compelling and that clear, achievable goals exist. We must convey the urgency created by the trend toward ubiquitous computing, by the increasing level and accelerating rate of integration, and by the resulting increase in every citizen's personal exposure to risk. Even more important, we must show how such risks will be addressed in transportation, health care, and critical national infrastructures.
 
The High Performance Computing and Communications (HPCC) Program argued that research investments lead to economic advantage, and a similar argument should be made for HCS research. The goal is to be able to build good systems well. Building good systems means building in properties such as reliability; building them well means achieving that result with the most effective and efficient tools. Thus, in addition to high quality, we should articulate the added benefit of improved productivity. The result would be better, more affordable systems. Establishing this HCS research agenda today will prepare us for the day when industry turns to computer scientists for help with large-scale, highly complex projects, especially life-critical ones.
 
Without such a research program, we could be blindsided by another country that invests in this area to build a capability for higher quality software. The U.S. must develop and maintain a reputation for high quality systems to keep our computer industry healthy. Such a research program can also enable the "rocket science" (the equivalent of putting people on the moon) that our Government and industry need. Finally, we want to ensure that life-critical and other safety-critical systems are built with the requisite degree of confidence.



Building a Better Base

Using scarce research dollars efficiently means integrating our tools for high confidence, or at least making them interoperate; libraries of mathematical software components are one example. This will not be easy. In the experience of one researcher, two years of research in logic were needed to work out the common theory required to combine two systems. Different theorem provers are based on different logics, and each prover has its own procedures, separately coded and separately verified. Integrating these systems will require the provers to communicate with each other in a sound way; it is not simply a syntax problem. We need common sets of tactics and tactic-style theorem provers: essentially, a theorem prover's workbench. We should be able to share these components, and research is required on how to do this well. To start with, we need to share decision procedures, mathematical software libraries, models, and provers, and it should be possible to connect them. Such sharing will provide the leverage to achieve savings, but in the near term it will also raise costs, because the infrastructure that enables sharing must first be built. We could follow the model of the National HPCC Software Exchange, which NASA and other CIC agencies have funded at more than $1 million per year, although that exchange does not currently support research.
 
Sound results require a solid science base. Many people take for granted the tools that were built up over years of hard work (e.g., higher-order logic, constructive theorem provers), but there has not been nearly enough investment to meet today's needs. Meeting our goals also demands significant improvements in systems engineering. Efficient tools and processes are needed to make complex systems easier to understand. Formal methods, as well as lighter-weight methods, must be better integrated into the systems engineering process, and the tools that support them must become more efficient. We need tools that can be used widely on the World Wide Web; for example, they should integrate with Web technologies such as Java. We should exploit Web mechanisms both to make provers available and to provide lighter-weight tools.
 
Achieving these goals will require stronger partnerships between researchers and the engineers who build systems. We must find out what engineers want and need. At least one academic research organization found that, by partnering more closely with industry, it discovered things developers really want that its researchers had not previously recognized.



Clearer Articulation of Needs and Opportunity Potential

Why should we begin a new research initiative now? Systems are becoming larger, more complex, more interactive, and more integrated, and they fail from the sheer complexity of their own interactions. "Normal accidents" resulting from interactive complexity will become more prevalent in the future. We need to instill a sense of urgency in industry, government, and the general public if we are to head off increasingly catastrophic consequences of such accidents.
 
We are reaching the limit of what we know how to do safely. System design considerations used to be separable, but we can no longer build systems one piece at a time and rely on local analysis. It is increasingly difficult to predict overall behavior, and performance requirements mandate more integrated systems. One example of such integration and its resulting problems is Boeing's DARKSTAR stealth aircraft, which was designed without a tail (rudder) surface. Its flight controls could not be designed one axis at a time, because everything interacted with everything else. Engineers concentrated hard on the yaw axis but much less on the pitch axis; as a result, the DARKSTAR crashed on takeoff on its second flight.
 
Another reason to act now is that today's more powerful computing infrastructure, together with significant successes in applying formal techniques (most notably model checking), allows us to do much more with our tools. For example, model checking found hundreds of bugs in the Intel P7 processor. Engineering organizations are beginning to see the value of formal methods: Intel has committed to verifying all of its P8 processors and is already absorbing much of the world's supply of formal methods graduates (about 60 per year).
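
To make "model checking" concrete, the sketch below is a minimal explicit-state checker: it enumerates every reachable state of a small model breadth-first and checks an invariant in each, returning the shortest counterexample trace if one exists. The two-process lock protocols, the function names, and the state encoding are hypothetical illustrations chosen for this example; the tools applied to processor designs are symbolic and vastly more scalable, but they answer the same question of whether any reachable state violates the property.

```python
from collections import deque

# A state is (pc of process 0, pc of process 1, value of the shared lock flag).
INITIAL = ("idle", "idle", False)


def check_invariant(initial, successors, invariant):
    """Breadth-first explicit-state search over every reachable state.

    Returns None if `invariant` holds in all reachable states; otherwise
    returns the shortest trace (list of states) from `initial` to a violation.
    """
    parent = {initial: None}          # visited set plus back-pointers for traces
    frontier = deque([initial])
    while frontier:
        state = frontier.popleft()
        if not invariant(state):
            trace = []
            while state is not None:  # walk back-pointers to the initial state
                trace.append(state)
                state = parent[state]
            return trace[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                frontier.append(nxt)
    return None


def _with_pc(state, i, pc, lock):
    """Copy of `state` with process i moved to `pc` and the lock set to `lock`."""
    pcs = list(state[:2])
    pcs[i] = pc
    return (pcs[0], pcs[1], lock)


def flawed_lock_successors(state):
    """Hypothetical 'check, then set' lock: reading and setting the flag are separate steps."""
    lock = state[2]
    for i in range(2):
        pc = state[i]
        if pc == "idle" and not lock:
            yield _with_pc(state, i, "checked", lock)   # saw the lock free...
        elif pc == "checked":
            yield _with_pc(state, i, "critical", True)  # ...then grabs it, possibly too late
        elif pc == "critical":
            yield _with_pc(state, i, "idle", False)     # release


def atomic_lock_successors(state):
    """Same protocol with an atomic test-and-set: check and set happen in one step."""
    lock = state[2]
    for i in range(2):
        pc = state[i]
        if pc == "idle" and not lock:
            yield _with_pc(state, i, "critical", True)
        elif pc == "critical":
            yield _with_pc(state, i, "idle", False)


def mutual_exclusion(state):
    """The property to check: the two processes are never both in the critical section."""
    return not (state[0] == "critical" and state[1] == "critical")


if __name__ == "__main__":
    for step in check_invariant(INITIAL, flawed_lock_successors, mutual_exclusion):
        print(step)
    print(check_invariant(INITIAL, atomic_lock_successors, mutual_exclusion))
```

Run as a script, the flawed lock yields a five-state trace in which both processes see the flag free before either sets it and then both enter the critical section; the atomic version prints None because no reachable state violates mutual exclusion. The search is exhaustive, so it finds this interleaving bug without anyone having to guess the failing schedule, which is what distinguishes model checking from testing.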
 
The U.S. is falling behind in the area of high confidence systems, both in research and in industrial products. We need building blocks for high confidence systems that are inexpensive enough to be commercially viable. The SAFEBUS of the Boeing 777 costs about $10 million, whereas the Europeans intend to develop products that cost between $10 and as little as 50 cents per item. Time-Triggered Architectures (TTAs), for example, are a new HCS opportunity that has been largely ignored in the U.S. TTAs provide a much simpler and cheaper means of achieving control than today's expensive control system products. They are well designed, very simple versions of more expensive products, and they have been thoroughly analyzed. Because of their low cost and higher reliability, they can out-compete most U.S. control system suppliers. Driving down costs will allow us to move into new application areas.
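
Part of what makes time-triggered designs simple to analyze is that bus access is fixed in advance by a static time-division schedule driven by a global time base, rather than negotiated at run time. The sketch below illustrates only that idea. It assumes a single shared bus, equal-length slots, and perfectly synchronized clocks; the class name, node names, and slot length are hypothetical. Real time-triggered protocols must also provide fault-tolerant clock synchronization and membership services, which are not modeled here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TdmaSchedule:
    """A static time-division schedule fixed at design time (hypothetical sketch).

    node_order[k] owns the k-th slot of every round; every slot is slot_us
    microseconds long. Assumes all nodes share one perfectly synchronized
    global clock, which a real time-triggered protocol must establish.
    """
    node_order: tuple[str, ...]
    slot_us: int

    @property
    def round_us(self) -> int:
        """Length of one full round: every node transmits exactly once."""
        return self.slot_us * len(self.node_order)

    def owner_at(self, t_us: int) -> str:
        """The only node allowed to transmit at global time t_us."""
        slot = (t_us % self.round_us) // self.slot_us
        return self.node_order[slot]

    def send_times(self, node: str, rounds: int) -> list[int]:
        """Start times of `node`'s slots during the first `rounds` rounds."""
        k = self.node_order.index(node)
        return [r * self.round_us + k * self.slot_us for r in range(rounds)]

    def permitted(self, node: str, t_us: int) -> bool:
        """True only if a transmission by `node` at t_us falls inside its own slot."""
        return self.owner_at(t_us) == node


if __name__ == "__main__":
    # Hypothetical four-node control cluster with 250-microsecond slots.
    sched = TdmaSchedule(("brake", "steer", "engine", "monitor"), slot_us=250)
    print(sched.round_us)                 # 1000
    print(sched.owner_at(1760))           # monitor
    print(sched.send_times("engine", 3))  # [500, 1500, 2500]
    print(sched.permitted("steer", 600))  # False: 600 falls in engine's slot
```

Because every node can compute `owner_at` for any instant, collisions are impossible by construction, and a transmission outside a node's own slot is immediately detectable as a fault. This a priori knowledge of who sends when is the kind of property that keeps such designs small and thoroughly analyzable.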
 
In summary, a new national research initiative in High Confidence Systems should be started, driven by several high-value societal challenges. It is urgent that the initiative begin now, because our safety and national well-being increasingly depend on highly complex systems. Moreover, the U.S. is falling behind other countries in technology investments in this area, which could contribute to an eventual decline in U.S. leadership in the computing industry.



Endnotes
  1. Charles Perrow, Normal Accidents: Living with High-Risk Technologies, Basic Books, 1984. A "normal accident" is one in which the interactive complexity and tight coupling of a system will inevitably produce an accident. "It is meant to signal that, given the system characteristics, multiple and unexpected interactions of failures are inevitable. This is an expression of an integral characteristic of the system, not a statement of frequency."

  2. Hermann Kopetz, Real-Time Systems: Design Principles for Distributed Embedded Applications, Chapter 8, "The Time-Triggered Protocols," Kluwer Academic Publishers, 1997.
