Research Challenges in High Confidence Systems
HCS Problem Exploration - General Discussion
Overview
National Challenges for HCS
Outcome-Based Quantifiable Challenges
Issues
Endnotes


Overview

The high confidence systems (HCS) problem should be viewed with a renewed sense of urgency and deserves increased government funding attention. The most significant change since the 1995 CIC report is the degree and rate of change in scale and integration in critical information-based systems. This leads to many concerns, the most important of which are:
  • The inability to define and predict interactively complex behaviors of highly integrated, large-scale systems that are life critical, safety critical, enterprise critical, or national security critical.
  • The growing exposure of critical systems, previously operated in a closed environment, to an open, widely interconnected environment that is much less predictable and increasingly hostile.
  • Increasing societal risks resulting from unanticipated and often unpredictable behaviors in large-scale, highly integrated, critical information-based systems. We can no longer accurately predict the potential consequences of misbehavior in such systems.
Pressures to achieve cost-saving or profit-enhancing system efficiencies through larger scales of integration are outrunning our engineering capabilities to design and assure predictable behaviors in complex systems. The tools and techniques in use today are primarily products of 1960s fundamental research that was focused on smaller-scale information technology components and federated approaches to system design, and are no longer adequate. The research results we purchased back then have long been used up; new fundamental results are needed.
 
This exploration of the HCS problem attempts to balance the enabling-capability and risk-oriented views. The danger of over-dramatizing the risk perspective is that such characterizations become largely ignored in the absence of catastrophic events that demonstrate such risks to society at large. The danger of overselling the enabling opportunities resulting from new system-level engineering approaches is that such approaches rarely achieve significant changes in the short term and often take much longer than expected. Regardless of the motivation, today's technical shortcomings lie in the engineering of increasingly complex systems.
 
High confidence systems are those for which the consequences of undesired operational behavior are great. Use of the term "high" implies that degrees of confidence can be established. We want to accurately predict with a high degree of confidence how the systems will behave, rather than calculate from historical accident records how the systems did behave. What yardsticks do we have? These yardsticks may include degree of predictable behavior, freedom from surprise (the dual of predictability), tolerance of anomalous situations (bad inputs, bad states), and accumulation of evidence (both from the system development process and from operational behavior over time). Today, we largely depend on "good" engineering practices and sometimes collect data on those practices that work. However, process attributes do not directly inform one as to system operational properties. We need measures related to system requirements and design -- factors that requirements and designs must address (e.g., enforceable behaviors, measures of development defects).
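One way to make "accumulation of evidence" from operational behavior concrete is to ask how much failure-free operation is needed before a failure-rate claim can be held at a stated confidence. The sketch below is illustrative only: it assumes failures arrive at a constant rate (an exponential model), which is our assumption and not something the workshop proposed, and the function name is hypothetical.

```python
import math


def failure_free_hours_needed(max_failure_rate: float, confidence: float) -> float:
    """Hours of failure-free operation needed to support the claim
    'failure rate <= max_failure_rate per hour' at the given confidence.

    Assumption (not from the report): failures occur at a constant rate,
    so the chance of zero failures in t hours is exp(-rate * t).
    """
    if max_failure_rate <= 0:
        raise ValueError("max_failure_rate must be positive")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    # If the true rate were as high as max_failure_rate, observing zero
    # failures in t hours would have probability exp(-max_failure_rate * t).
    # Requiring that probability to be at most (1 - confidence) gives t.
    return -math.log(1.0 - confidence) / max_failure_rate


if __name__ == "__main__":
    for rate in (1e-3, 1e-5, 1e-7):
        hours = failure_free_hours_needed(rate, 0.99)
        print(f"rate <= {rate:g}/h at 99% confidence: {hours:,.0f} failure-free hours")
```

Under this model the required observation time grows in inverse proportion to the claimed failure rate, which suggests why operational evidence alone is unlikely to serve as a yardstick for the most critical systems.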



National Challenges for HCS

Catastrophic events have not been sufficient to gain the national attention that is needed to foster high confidence research and development. For example, Intel's Pentium problem was costly and gained some attention, but addressing the problem remained relatively local to Intel and largely limited to better debugging that occurs late in the product life cycle. The catastrophic failure of the Ariane 5's first flight test was attributed to software, but received relatively little attention in the U.S.
 
We must articulate a set of challenges that will establish a national imperative for achieving high confidence in systems. The workshop participants discussed an approach that would:
  • State the desired outcomes in meeting the challenge.
  • Develop a way to measure the outcomes with associated milestones.
  • Identify the technical breakthroughs necessary to meet the challenge.
  • Identify the strategies and barriers to meeting the challenge.
  • Identify what industry will do on its own and what government must do for the public good.
In establishing this national imperative, we must keep in mind that we are seeing a growing dependence on computing throughout industries that have life-critical and safety-critical responsibilities. A new national sense of urgency must be brought to the issues of high confidence systems because of the increased use and rate of integration of computing and networking components in medicine, and in the transportation, financial, and energy industries. We want higher confidence for existing applications while also being able to generate new capabilities such as pilotless aircraft and telesurgery.
 
Just as critical is the need to prevent the collapse of essential but saturated infrastructure systems such as air traffic control and an aging power grid designed for the early 1970s. A number of major systems have been canceled, for example the FAA's Advanced Automation System, the military's World-Wide Military Command & Control System (WWMCCS) Automation Modernization Program (WAM), and the IRS Tax Modernization System. Such cancellations are an indication of our inability to deal with the complexity of next-generation replacements for aging infrastructure systems (see, for example, the book Fatal Defect [14]).



Outcome-Based Quantifiable Challenges

The workshop participants thought that HCS challenges should be stated in terms of desired outcomes and associated milestones. Examples of desired outcomes include higher transportation throughput with lower accident rates, or the Internet with higher numbers of transactions and lower risks.
 
A set of domain-specific national challenges could serve a purpose similar to that of the High Performance Computing national challenge problems and the goals established by NASA (see Example 1, below). One such challenge for NIH could be to reduce the 28 percent medical error figure to 10 percent within five years. Another challenge for the Automated Highways Project could be to increase safety (by some factor) by getting the driver out of the loop while simultaneously increasing the traffic throughput (by another factor) on the nation's highways. Other challenges could include reducing fraud in financial systems by some factor, or increasing the ability to do electronic commerce over public networks by some factor. There is a spectrum of domains to address, from medicine to transportation, and from safety-critical to general quality-of-life, with ubiquitous computerized appliances. It is not necessary that we know how to achieve solutions to these challenges now. Rather, the challenges should help identify the technical breakthroughs that are necessary to achieve them. We will need a way to measure the outcomes.
 
The outcome-based articulation gives a picture of the world that is significantly different from that of today (e.g., pilotless commercial aviation). We need challenges that let us envision how to change the world. Like the HPC grand challenges, this articulation should appeal to the general public who will pay for this research. We must be wary of articulating challenges that generate fear (e.g., a pilotless airplane is a scary idea, as are self-driving cars).
 
We should also stress affordability challenges. Methods for achieving high confidence systems can potentially make the technology less costly while improving operational efficiency. Goals of this kind already exist: the 1996 Information Technology Management Reform Act (ITMRA) includes a Sense of Congress provision that "Executive Agencies should achieve a 5% per year decrease in the cost incurred by the agency for operating and maintaining information technology, and a 5% per year increase in the efficiency of the agency operations."
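To see what a per-year percentage goal of this kind adds up to over time, the following minimal sketch compounds a fixed annual change over a number of years. The baseline index of 100, the ten-year horizon, and the function name are illustrative assumptions; the ITMRA provision itself specifies only the 5 percent annual figures.

```python
from typing import List


def project(baseline: float, annual_change: float, years: int) -> List[float]:
    """Return the value at the end of each year, from year 0 through `years`,
    when it changes by the same fraction every year.

    annual_change is a signed fraction: -0.05 means a 5% decrease per year.
    """
    if years < 0:
        raise ValueError("years must be non-negative")
    if annual_change <= -1.0:
        raise ValueError("annual_change must be greater than -1")
    return [baseline * (1.0 + annual_change) ** year for year in range(years + 1)]


if __name__ == "__main__":
    # Hypothetical starting index of 100 for both quantities.
    cost = project(100.0, -0.05, 10)
    efficiency = project(100.0, +0.05, 10)
    for year, (c, e) in enumerate(zip(cost, efficiency)):
        print(f"year {year:2d}: cost index {c:6.1f}, efficiency index {e:6.1f}")
```

With these assumed inputs, ten years of compounding leaves the cost index at roughly 60 percent of its starting value and raises the efficiency index by roughly 63 percent.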
 
We need to establish strategies for each articulated challenge. What are the barriers to achieving that challenge? What research is required? We will probably want a combination of focused research with relatively near-term results and longer-term, higher risk research with potential for high payoff.
 
Identifying nearer-term technical milestones makes progress on these challenges measurable. One example that participants discussed was to reduce software defects by a factor of n in x years -- in particular, requirements defects by a factor of two in five years; specifications defects by a factor of five in ten years; and similarly for design, coding, and testing. Of course, this is not a simple set of objectives. Each one leads to many additional issues. For example, in reducing requirements defects by 50 percent:
  • How should one count errors of omission?
  • How are such errors detected?
  • Can such errors be masked?
  • How is completeness assessed?
  • Are requirements relevant to high confidence uniquely identifiable and separable from other system requirements?
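A "factor of n in x years" milestone can be translated into an implied yearly improvement and a set of interim checkpoints. The sketch below assumes, purely for illustration, that defects fall by the same fraction every year; the baseline counts and function names are hypothetical, and the counting questions listed above (for example, how to count errors of omission) are left open.

```python
def required_annual_reduction(factor: float, years: int) -> float:
    """Constant yearly fractional reduction that compounds to a reduction
    by `factor` after `years` (e.g. factor=2, years=5)."""
    if factor <= 1:
        raise ValueError("factor must be greater than 1")
    if years <= 0:
        raise ValueError("years must be positive")
    return 1.0 - factor ** (-1.0 / years)


def milestone(baseline: float, factor: float, years: int, year: int) -> float:
    """Defect level expected at the end of `year` on the constant-rate path."""
    if not 0 <= year <= years:
        raise ValueError("year must lie between 0 and years")
    return baseline * factor ** (-year / years)


def on_track(measured: float, baseline: float, factor: float,
             years: int, year: int) -> bool:
    """True if the measured defect level is at or below the interim milestone."""
    return measured <= milestone(baseline, factor, years, year)


if __name__ == "__main__":
    # The two milestones discussed by the participants.
    for label, factor, years in (("requirements", 2, 5), ("specification", 5, 10)):
        rate = required_annual_reduction(factor, years)
        print(f"{label} defects: factor {factor} in {years} years "
              f"implies {rate:.1%} fewer defects each year")
    # Hypothetical baseline of 100 requirements defects, measured 60 at year 3.
    print("on track at year 3:", on_track(60, 100, 2, 5, 3))
```

Under the constant-rate assumption, halving requirements defects in five years implies about 13 percent fewer defects each year, and a factor of five in ten years implies about 15 percent each year.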
Example 1: NASA's Aeronautics Enterprise.  One successful example of a challenge-setting undertaking is NASA's Aeronautics Enterprise [16]. NASA's strategy is to exploit digital technology to increase aviation safety. NASA is developing an outcome-based research agenda across multiple industries and agencies (e.g., FAA and DoD), identifying industry partners to absorb the developed technology. The research agenda is intended to contribute to ten long-term, outcome-oriented challenges that NASA has established for the next twenty years in aeronautics. These are national challenges with a relevance that can capture the attention of the public, including:
  • Reduce the cost of air travel by 25 percent in ten years and 50 percent in twenty years
  • Reduce the aircraft accident rate by 80 percent in ten years and 90 percent in twenty years
All ten NASA challenges are designed to "stretch" technology. For some of the challenges, NASA does not know how they will actually be solved. But the point is to create a challenge that will pull the research along. For example, meeting some of the challenges involving dramatic reduction in coast-to-coast air travel times may require eliminating the pilot. This means that there must be a change in the way flight crews interact with the aircraft. This leads to new behavior requirements for computing and networking associated with commercial aviation.
 
Example 2: The NGI Initiative.  The President's Next Generation Internet (NGI) initiative is intended to lead to "terabit-per-second network speeds over wide area advanced capability networks." There are three goals for the NGI: (1) Experimental Research for Advanced Network Technologies, (2) Next Generation Network Testbed, and (3) Revolutionary Applications. In the NGI implementation plan (IP), many performance metrics are defined within each of the three NGI goals. Examples of these include:
  • Demonstrate 25 percent utilization improvement in the 100-node, 100+ Mbps demonstration network (Goal 2.1) over three months and 100 percent utilization improvement in the 10-node, 1+ Gbps demonstration network (Goal 2.2) over three months.
  • Demonstrate 15-msec response capability.
The above are not directly tied to applications. To plan the development of NGI applications, the NGI IP identified key "affinity groups." The NGI IP placed "disciplinary affinity groups" (e.g., health care, crisis management, basic science) in the rows of a matrix and within them looked for enabling "applications technology affinity groups" common across those rows (e.g., collaboration technologies, distributed computing, and privacy and security) that would support those disciplinary groups. It is through these affinity groups that additional applications measures will be developed. The disciplinary affinity groups would identify applications within the scope of that group that require NGI bandwidth and services; especially those applications that, while impossible without NGI, would improve mission success using NGI technologies. This leads to the following questions for HCS:

  • What are the most appropriate ways to define and measure HCS contributions (technology specific, application specific)?
  • How can we map HCS success down to more technical dimensions that can be measured? For example, could we have the top of a matrix have security, safety, reliability, and functional correctness?
  • What are the significant HCS performance measures (challenges) we can demonstrate or target? Can we show a reduction in the cost of producing high confidence software by 50 percent while increasing the safety and/or security by some factor? Could we develop product-based criteria that would reduce the cost per line of code or per delivered function for field-certified systems by half without any increase in safety related incidents?
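The matrix idea raised in these questions can be sketched as a small data structure: application domains as rows, confidence properties as columns, and a measurable target in each cell. The class and field names below are hypothetical, the row labels are borrowed from the NGI disciplinary affinity groups only as placeholders, and the sample measure is invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple

# Columns: the properties suggested for the top of the matrix.
PROPERTIES = ("security", "safety", "reliability", "functional correctness")
# Rows: placeholder domains, borrowed from the NGI disciplinary affinity groups.
DOMAINS = ("health care", "crisis management", "basic science")


@dataclass(frozen=True)
class Measure:
    """A measurable target for one (domain, property) cell."""
    description: str
    baseline: float
    target: float


class ChallengeMatrix:
    """Domains by properties, with at most one Measure per cell."""

    def __init__(self, domains: Iterable[str], properties: Iterable[str]) -> None:
        self.domains = tuple(domains)
        self.properties = tuple(properties)
        self._cells: Dict[Tuple[str, str], Measure] = {}

    def set_measure(self, domain: str, prop: str, measure: Measure) -> None:
        if domain not in self.domains or prop not in self.properties:
            raise KeyError(f"unknown cell: ({domain!r}, {prop!r})")
        self._cells[(domain, prop)] = measure

    def get_measure(self, domain: str, prop: str) -> Optional[Measure]:
        return self._cells.get((domain, prop))

    def unmeasured(self) -> List[Tuple[str, str]]:
        """Cells that still lack a measure, in row-major order."""
        return [(d, p) for d in self.domains for p in self.properties
                if (d, p) not in self._cells]


if __name__ == "__main__":
    matrix = ChallengeMatrix(DOMAINS, PROPERTIES)
    # Invented example values, not figures from the workshop.
    matrix.set_measure("health care", "safety",
                       Measure("incidents per 10,000 procedures", 12.0, 6.0))
    print(f"{len(matrix.unmeasured())} of "
          f"{len(DOMAINS) * len(PROPERTIES)} cells still need a measure")
```

Listing the empty cells makes visible where a domain has no agreed measure for a property, which is one way of framing the measurement gap that these questions point to.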



Issues

There are many issues to address in meeting the challenge problems for high confidence systems. One set of issues deals with process, including tools, while another deals with the product. These two sets are tightly interrelated. A third set deals with measurement. Table 1 lists the issues identified by the participants.
 
Table 1.  HCS Issues
 
 
I. Process Issues
  • Getting the system requirements right

  • System design. Systems engineering viewpoint: commonsense engineering principles for safety, for dependability, for security. Learning from past mistakes and incorporating this knowledge into reusable design principles.
    • Integration of different properties -- the "ilities."
    • Understanding core principles and practices common across the different properties, and understanding the tradeoffs among properties.
    • Composition: dealing with component technologies. If we are mixing and matching from a set of reusable components, how do we get confidence in the parts, and how do we get confidence in the composition?
    • Evolution: how to maintain high confidence in extensible systems, user-modifiable systems, customizable systems, and dynamically adaptable systems.
    • Usability - treating the user as part of the system to reduce interface errors

  • User-centered design.
    • Coding standards and design practices

  • Tools:
    • Tools for considering the entire system
    • Tools to reduce/understand complexity.
    • Informal methods and lightweight tools.
    • Tools that support different problem solving strategies.
    • Tools tailorable to specific domain applications.
      • Seek a general-purpose underpinning (e.g., decision procedures) common to all.
      • Package in a way usable across multiple domains, with different domain-specific front ends.
      • Fundamental research in term rewriting and decision procedures. Need breakthrough research to address current brick walls, e.g., verification of programs that use integer arithmetic. Need decision procedures that are fast, efficient, and available in reusable form.
    • Testing tools and techniques
      • Specification-based testing. Develop test cases directly from specifications automatically. Find ways of replacing some of the current extremely expensive testing, such as structural tests. Reduce the cost of testing. Find alternatives to hardware testing (because of the high cost of unit tests).
    • Modeling tools and techniques

  • Reducing the time and cost of certification. Incremental evaluation cost for incremental change.

II. Product Issues:
  • Lack of a conceptual framework, models.
    • Establishing the boundaries of a system

  • No unintended side effects.

  • Commonalities may reduce certification cost

  • Improvements to new designs; establish well-known "safe" designs or architectures.

  • New designs for high confidence within the space of new system designs.

  • Introducing confidence into legacy systems.

III. Measurement Issues:
  • Assessing the effectiveness of processes and practices.

  • Metrics for degree of confidence in system.

  • National standards for certifying tools.
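The specification-based testing item under Tools in Table 1 can be illustrated with a small sketch in which test inputs and the pass/fail oracle both come from a specification rather than from the code under test. Everything here is an illustrative assumption: the `Spec` structure, the boundary-plus-random case generator, and the integer square root example are ours, not techniques named by the participants. The sketch relies on `math.isqrt`, which requires Python 3.8 or later.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Spec:
    """A specification for a function from int to int (hypothetical structure)."""
    lo: int                                    # smallest input of interest
    hi: int                                    # largest input of interest
    precondition: Callable[[int], bool]        # which inputs are legal
    postcondition: Callable[[int, int], bool]  # is (input, output) acceptable


def generate_cases(spec: Spec, samples: int = 50, seed: int = 0) -> List[int]:
    """Derive test inputs from the spec alone: domain boundaries plus
    seeded random samples, filtered by the precondition."""
    rng = random.Random(seed)
    candidates = {spec.lo, spec.lo + 1, spec.hi - 1, spec.hi}
    candidates.update(rng.randint(spec.lo, spec.hi) for _ in range(samples))
    return sorted(x for x in candidates
                  if spec.lo <= x <= spec.hi and spec.precondition(x))


def failing_inputs(spec: Spec, implementation: Callable[[int], int],
                   cases: List[int]) -> List[int]:
    """Inputs whose outputs violate the postcondition."""
    return [x for x in cases if not spec.postcondition(x, implementation(x))]


# Specification of integer square root: r is the largest integer with r*r <= n.
ISQRT_SPEC = Spec(
    lo=0,
    hi=1000,
    precondition=lambda n: n >= 0,
    postcondition=lambda n, r: r >= 0 and r * r <= n < (r + 1) * (r + 1),
)


def rounding_isqrt(n: int) -> int:
    """Deliberately faulty implementation: rounds to nearest instead of down."""
    return round(math.sqrt(n))


if __name__ == "__main__":
    cases = generate_cases(ISQRT_SPEC)
    print("math.isqrt failures:    ", failing_inputs(ISQRT_SPEC, math.isqrt, cases))
    print("rounding_isqrt failures:", failing_inputs(ISQRT_SPEC, rounding_isqrt, cases))
```

Because the postcondition serves as the oracle, no expected outputs have to be written by hand, which is the kind of cost reduction the table entry is after. How well simple boundary and random sampling compares with the structural tests it would replace is left open here.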



Endnotes
  14. Ivars Peterson, Fatal Defect: Chasing Killer Computer Bugs.

  16. See the NASA Aeronautics Enterprise Web site for more details: http://www.hq.nasa.gov/office/pao/NASA/aeronautics.html.
