|
|
Overview
|
The high confidence systems (HCS) problem should be viewed with a renewed sense of urgency and deserves increased government funding attention. The most significant change since the 1995 CIC report is the degree and rate of change in the scale and integration of critical information-based systems. This leads to many concerns, the most important of which are:
- The inability to define and predict interactively complex behaviors of highly
  integrated, large-scale systems that are life critical, safety critical,
  enterprise critical, or national security critical.
- The growing exposure of critical systems, previously operated in a closed
  environment, to an open, widely interconnected environment that is much less
  predictable and increasingly hostile.
- Increasing societal risks resulting from unanticipated and often unpredictable
  behaviors in large-scale, highly integrated, critical information-based systems.
  We can no longer accurately predict the potential consequences of misbehavior in
  such systems.
Pressures to achieve cost-saving or profit-enhancing system efficiencies through
larger scales of integration are outrunning our engineering capabilities to
design and assure predictable behaviors in complex systems. The tools and
techniques in use today are primarily products of 1960s fundamental research that
was focused on smaller-scale information technology components and federated
approaches to system design, and are no longer adequate. The research results we
purchased back then have long been used up; new fundamental results are needed.
This exploration of the HCS problem attempts to balance both the
enabling-capability and the risk-oriented views. The danger of over-dramatizing
the risk perspective is that such characterizations become largely ignored in the
absence of catastrophic events that demonstrate such risks to society at large.
The danger of overselling the enabling opportunities resulting from new
system-level engineering approaches is that such approaches rarely achieve
significant changes in the short term and often take much longer than expected.
Regardless of the motivation, today's technical shortcomings lie in the
engineering of increasingly complex systems.
High confidence systems are those for which the consequences of undesired
operational behavior are great. Use of the term "high" implies that degrees
of confidence can be established. We want to accurately predict with a high
degree of confidence how the systems will behave, rather than calculate from
historical accident records how the systems did behave. What yardsticks do we
have? These yardsticks may include degree of predictable behavior, freedom from
surprise (the dual of predictability), tolerance of anomalous situations (bad
inputs, bad states), and accumulation of evidence (both from the system
development process and its operational behavior over time). Today, we largely
depend on "good" engineering practices and sometimes collect data on
those practices that work. However, process attributes do not directly inform
one as to system operational properties. We need measures related to system
requirements and design -- factors that requirements and designs must address
(e.g., enforceable behaviors, measures of development defects).
|
|
National Challenges for HCS
|
Catastrophic events have not been sufficient to gain the national attention
that is needed to foster high confidence research and development. For example,
Intel's Pentium problem was costly and gained some attention, but addressing the
problem remained relatively local to Intel and largely limited to better debugging
that occurs late in the product life cycle. The catastrophic failure of the
Ariane 5's first flight test was attributed to software, but received relatively
little attention in the U.S.
We must articulate a set of challenges that will establish a national imperative
for achieving high confidence in systems. The workshop participants discussed an
approach that would:
- State the desired outcomes in meeting the challenge.
- Develop a way to measure the outcomes with associated milestones.
- Identify the technical breakthroughs necessary to meet the challenge.
- Identify the strategies and barriers to meeting the challenge.
- Identify what industry will do on its own and what government must do for the
  public good.
In establishing this national imperative, we must keep in mind that we are seeing
a growing dependence on computing throughout industries that have life-critical
and safety-critical responsibilities. A new national sense of urgency must be
brought to the issues of high confidence systems due to this increased use and
rate of integration of computing and networking components in medicine, and in the
transportation, financial, and energy industries. We want higher confidence for
existing applications while also being able to generate new capabilities such as
pilotless aircraft and telesurgery.
Just as critical is the need to prevent the collapse of essential but saturated
infrastructure systems such as air traffic control and an aging power grid
designed for the early 1970s. A number of major systems have been canceled,
including the FAA's Advanced Automation System, the military's World-Wide
Military Command & Control System (WWMCCS) Automation Modernization Program
(WAM), and the IRS Tax Modernization System. Such cancellations are an indication
of our inability to deal with the complexity of next-generation replacements for
aging infrastructure systems (see, for example, the book
Fatal Defect [14]).
|
|
Outcome-Based Quantifiable Challenges
|
The workshop participants thought that HCS challenges should be stated in
terms of desired outcomes and associated milestones. Examples of desired outcomes
include higher transportation throughput with lower accident rates, or an
Internet that carries more transactions at lower risk.
A set of domain-specific national challenges could serve a purpose similar to that
of the High Performance Computing national challenge problems and the goals
established by NASA (see Example 1, below). One such challenge for NIH could be
to reduce the 28 percent medical error figure to 10 percent within five years.
Another challenge for the Automated Highways Project could be to increase safety
(by some factor) by getting the driver out of the loop while simultaneously
increasing the traffic throughput (by another factor) on the nation's highways.
Other challenges could include reducing fraud in financial systems by some factor;
or increasing the ability to do electronic commerce over public networks by some
factor. There is a spectrum of domains to address, from medicine to
transportation, and from safety-critical to general quality-of-life, with
ubiquitous computerized appliances. It is not necessary that we know how to
achieve solutions to these challenges now. Rather, the challenges should help
identify the technical breakthroughs that are necessary to achieve them. We will
need a way to measure the outcomes.
The outcome-based articulation gives a picture of the world that is significantly
different from that of today (e.g., pilotless commercial aviation). We need
challenges that let us envision how to change the world. Like the HPC grand
challenges, this articulation should appeal to the general public who will pay for
this research. We must be wary of articulating challenges that generate fear
(e.g., a pilotless airplane is a scary idea, as are self-driving cars).
We should also stress affordability challenges. Methods for achieving high
confidence can potentially lead to less costly systems while improving their
operational efficiency. For example, the 1996 Information Technology
Management Reform Act (ITMRA) includes a Sense of Congress provision
that "Executive Agencies should achieve a 5% per year decrease in the cost
incurred by the agency for operating and maintaining information technology, and a
5% per year increase in the efficiency of the agency operations."
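To give a feel for what the quoted 5 percent per year targets would add up to, the following minimal sketch compounds them over several years. It assumes that the targets compound annually from a normalized baseline of 1.0. The function name `compound`, the baseline, and the five-year horizon are illustrative assumptions and are not taken from the ITMRA text.

```python
def compound(baseline: float, annual_change: float, years: int) -> float:
    """Return the value after applying a fixed fractional change once per year."""
    return baseline * (1.0 + annual_change) ** years


# Hypothetical normalized baselines (1.0 = today's level); horizon is an assumption.
for year in range(0, 6):
    cost = compound(1.0, -0.05, year)       # 5% per year decrease in operating cost
    efficiency = compound(1.0, 0.05, year)  # 5% per year increase in efficiency
    print(f"year {year}: cost {cost:.3f}, efficiency {efficiency:.3f}")
```

Under these assumptions, five years of sustained 5 percent changes would leave operating cost at roughly 77 percent of the baseline and efficiency at roughly 128 percent of it.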
We need to establish strategies for each articulated challenge. What are the
barriers to achieving that challenge? What research is required? We will probably
want a combination of focused research with relatively near-term results and
longer-term, higher risk research with potential for high payoff.
Identifying nearer-term technical milestones makes progress on these challenges
measurable. One example that participants discussed was to reduce software
defects by a factor of n in x years -- in particular, requirements defects by a
factor of two in five years; specifications defects by a factor of five in ten
years; and similarly for design, coding, and testing. Of course, this is not a
simple set of objectives. Each one leads to many additional issues. For example,
in reducing requirements defects by 50 percent:
- How should one count errors of omission?
- How are such errors detected?
- Can such errors be masked?
- How is completeness assessed?
- Are requirements relevant to high confidence uniquely identifiable and separable
  from other system requirements?
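One way to turn a "factor of n in x years" milestone into something that can be tracked annually is to compute the constant yearly reduction it implies. The sketch below does this for the two milestones mentioned above. It assumes a steady geometric decline in defect counts, which is a simplification rather than anything the participants proposed, and the function name `annual_reduction_rate` is hypothetical.

```python
def annual_reduction_rate(factor: float, years: float) -> float:
    """Constant yearly fractional reduction that cuts defects by `factor` after `years`."""
    if factor <= 1 or years <= 0:
        raise ValueError("factor must exceed 1 and years must be positive")
    # Solve (1 - r) ** years == 1 / factor for r.
    return 1.0 - factor ** (-1.0 / years)


# Milestones from the text: (reduction factor, years allowed).
milestones = {"requirements": (2, 5), "specifications": (5, 10)}

for phase, (factor, years) in milestones.items():
    rate = annual_reduction_rate(factor, years)
    print(f"{phase}: about {rate:.1%} fewer defects each year")
```

Under the steady-decline assumption, the two milestones work out to roughly 13 percent and 15 percent fewer defects per year, respectively. Such a rate is only meaningful once the counting questions listed above have been settled.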
Example 1: NASA's Aeronautics Enterprise. One successful example of a
challenge-setting undertaking is NASA's Aeronautics Enterprise [16].
NASA's strategy is to exploit digital technology to increase aviation safety.
NASA is developing an outcome-based research agenda across multiple industries and
agencies (e.g., FAA and DoD) and identifying industry partners to absorb the
developed technology. The research agenda is intended to contribute to ten
long-term, outcome-oriented challenges that NASA has established for the next
twenty years in aeronautics. These are national challenges with a relevance that
can capture the attention of the public, including:
- Reduce the cost of air travel by 25 percent in ten years and 50 percent in twenty years
- Reduce the aircraft accident rate by 80 percent in ten years and 90 percent in twenty years
All ten NASA challenges are designed to "stretch" technology. For some
of the challenges, NASA does not know how they will actually be solved. But the
point is to create a challenge that will pull the research along. For example,
meeting some of the challenges involving dramatic reduction in coast-to-coast air
travel times may require eliminating the pilot. This means that there must be a
change in the way flight crews interact with the aircraft. This leads to new
behavior requirements for computing and networking associated with commercial
aviation.
Example 2: The NGI Initiative. The President's Next Generation
Internet (NGI) initiative is intended to lead to "terabit-per-second network
speeds over wide area advanced capability networks." There are three goals for the
NGI: (1) Experimental Research for Advanced Network Technologies, (2) Next
Generation Network Testbed, and (3) Revolutionary Applications. In the NGI
implementation plan (IP) many performance metrics are defined within each of the
three NGI Goals. Examples of these include:
- Demonstrate 25 percent utilization improvement in the 100-node, 100+ Mbps
  demonstration network (Goal 2.1) over three months and 100 percent utilization
  improvement in the 10-node, 1+ Gbps demonstration network (Goal 2.2) over
  three months.
- Demonstrate 15-msec response capability.
The above are not directly tied to applications. To plan the development of NGI
applications, the NGI IP identified key "affinity groups." The NGI IP placed
"disciplinary affinity groups" (e.g., health care, crisis management, basic
science) in the rows of a matrix and within them looked for enabling
"applications technology affinity groups" common across those rows (e.g.,
collaboration technologies, distributed computing, and privacy and security) that
would support those disciplinary groups. It is through these affinity groups that
additional applications measures will be developed. The disciplinary affinity
groups would identify applications within the scope of that group that require
NGI bandwidth and services, especially those applications that, while impossible
without NGI, would improve mission success using NGI technologies. This leads to
the following questions for HCS:
- What are the most appropriate ways to define and measure HCS contributions
  (technology specific, application specific)?
- How can we map HCS success down to more technical dimensions that can be measured?
  For example, could the top of a matrix list security, safety, reliability,
  and functional correctness?
- What are the significant HCS performance measures (challenges) we can demonstrate
  or target? Can we show a reduction in the cost of producing high confidence
  software by 50 percent while increasing the safety and/or security by some factor?
  Could we develop product-based criteria that would reduce the cost per line of
  code or per delivered function for field-certified systems by half without any
  increase in safety-related incidents?
|
|
Issues
|
There are many issues to address in meeting the challenge problems for high
confidence systems. One set of issues deals with process, including tools, while
another deals with the product. These two sets are tightly interrelated. A third
set deals with measurement. Table 1 lists the issues identified by the
participants.
Table 1. HCS Issues
I. Process Issues
- Getting the system requirements right.
- System design. Systems engineering viewpoint: commonsense engineering principles
  for safety, for dependability, for security. Learning from past mistakes and
  incorporating this knowledge into reusable design principles.
- Integration of different properties -- the "ilities."
- Understanding core principles and practices common across the different
  properties, and understanding the tradeoffs among properties.
- Composition: dealing with component technologies. If we are mixing and matching
  from a set of reusable components, how do we get confidence in the parts, and how do
  we get confidence in the composition?
- Evolution. How to maintain high confidence in extensible systems,
  user-modifiable systems, customizable systems, and dynamically adaptable systems.
- Usability: treating the user as part of the system to reduce interface errors.
- User-centered design.
- Coding standards and design practices.
- Tools:
  - Tools for considering the entire system.
  - Tools to reduce/understand complexity.
  - Informal methods and lightweight tools.
  - Tools that support different problem solving strategies.
  - Tools tailorable to specific domain applications.
    - Seek a general purpose underpinning (e.g., decision procedures) common to all.
    - Package in a way usable across multiple domains. Different domain-specific
      front ends.
- Fundamental research in term rewriting, decision procedures. Need breakthrough
  research to address current brick walls, e.g., verification of programs that use
  integer arithmetic. Need decision procedures that are fast, efficient, and
  available in reusable form.
- Testing tools and techniques.
- Specification-based testing. Develop test cases directly from specifications
  automatically. Find ways of replacing some of the current extremely expensive
  testing, structural tests. Reducing cost of testing. Alternatives to hardware
  testing (because of the high cost of unit test).
- Modeling tools and techniques.
- Reducing the time and cost of certification. Incremental evaluation cost for
  incremental change.
II. Product Issues
- Lack of a conceptual framework, models.
- Establishing the boundaries of a system.
- No unintended side effects.
- Commonalities may reduce certification cost.
- Improvements to new designs; establish well-known "safe" designs or architectures.
- New designs for high confidence within the space of new system designs.
- Introducing confidence into legacy systems.
III. Measurement Issues
- Assessing the effectiveness of processes and practices.
- Metrics for degree of confidence in system.
- National standards for certifying tools.
|
|
Endnotes
|

14. Ivars Peterson, Fatal Defect: Chasing Killer Computer Bugs.
16. See the NASA Aeronautics Enterprise Web site for more details:
    http://www.hq.nasa.gov/office/pao/NASA/aeronautics.html.