# Section 3.1.2.2. Design of NDI Capability Demonstrations

NDI capability is typically
quantified through a capability demonstration program. The concept for such a demonstration is to
mimic the real inspection as closely as possible on representative specimens
that contain cracks of known sizes that span the range of increase of the POD(*a*) function. A comprehensive description for the execution of such a
demonstration program and the analysis of the resulting data is presented in
MIL-HDBK-1823 (see also Berens [1988] and Berens [2000]). The analysis of the data from an NDI
demonstration uses the maximum likelihood estimates of the parameters of the
POD(*a*) model and the asymptotic
properties of such estimates. This
subsection briefly reviews the design and execution of a generic capability
demonstration.
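As a concrete illustration of the analysis step, the log-odds POD(*a*) model can be fit to hit/miss data by maximum likelihood using Newton-Raphson iteration on ln *a*. The crack sizes and outcomes below are hypothetical, and this sketch stands in for the full MIL-HDBK-1823 procedure, which also supplies confidence bounds from the asymptotic properties of the estimates:

```python
import numpy as np

# Hypothetical hit/miss demonstration data: crack sizes (inches) and
# detection outcomes. Values are illustrative, not from a real program.
a = np.array([0.010, 0.012, 0.015, 0.018, 0.020, 0.025, 0.030, 0.035,
              0.040, 0.045, 0.050, 0.060, 0.070, 0.080, 0.090, 0.100])
hit = np.array([0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1])

# Log-odds POD(a) model: POD(a) = 1 / (1 + exp(-(ln a - mu) / sigma)).
# Maximum likelihood fit by Newton-Raphson (IRLS) on x = ln a.
X = np.column_stack([np.ones_like(a), np.log(a)])
beta = np.zeros(2)
for _ in range(50):
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    W = p * (1 - p)
    step = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (hit - p))
    beta += step
    if np.max(np.abs(step)) < 1e-10:
        break

mu_hat = -beta[0] / beta[1]   # ln(a) at 50% probability of detection
sigma_hat = 1.0 / beta[1]     # steepness parameter of the POD curve
a50 = np.exp(mu_hat)
a90 = np.exp(mu_hat + sigma_hat * np.log(0.9 / 0.1))  # 90% POD size
print(f"a50 = {a50:.4f} in., a90 = {a90:.4f} in.")
```

The reliably detected crack size reported from a demonstration is typically *a*<sub>90</sub> (or *a*<sub>90/95</sub> once a confidence bound is applied), which is why the estimated parameters, not the raw hits and misses, are the product of the analysis.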

An NDI reliability demonstration
comprises the execution of a test matrix of inspections on a set of specimens
with known crack locations and sizes.
The inspection results, either *â*
or hit/miss, are then analyzed to estimate the parameters of the POD(*a*) function and the reliably detected
crack size for the inspection application. The specimens are inspected under a test protocol that simulates as closely as practical the actual
application conditions. Establishing
test protocols for eddy current,
fluorescent penetrant, ultrasonic and magnetic particle inspection systems
is discussed in MIL-HDBK-1823.

The objectives and costs of an NDI
demonstration determine the matrix of inspections to be performed. From the analysis viewpoint, there are two
major categories of concerns that must be
addressed in establishing the experimental design. These are: a) the generality of inferences that can be
made from the controlled and uncontrolled inspection and material parameters; and, b) the number and sizes of cracks and the
number of uncracked inspection sites in the specimens.
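Concern (b) is often addressed by spacing target crack sizes evenly in log(*a*) so that the specimens span the rise of the POD(*a*) curve. A minimal sketch, assuming a hypothetical application size range and the commonly cited hit/miss guideline of roughly 60 cracked sites:

```python
import numpy as np

# Sketch: target crack sizes for a demonstration specimen set, spaced
# uniformly in log(a) to span the rise of the POD(a) curve.
# n_cracks = 60 follows a commonly cited hit/miss guideline; the size
# range (0.005-0.100 in.) is a hypothetical application-specific choice.
n_cracks = 60
a_min, a_max = 0.005, 0.100
targets = np.exp(np.linspace(np.log(a_min), np.log(a_max), n_cracks))
print(targets[:5])
```

Uncracked inspection sites would be added to this matrix separately, since they carry the information about false call rates rather than POD(*a*).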

## Controlled and Uncontrolled Factors

To demonstrate capability for an
application, it is assumed that: a) the complete protocol for conducting the inspection is well defined for the
application; b) the inspection process is under control; and, c) all other factors which introduce variability in an
inspection decision are reasonably
representative of the application. The
representativeness of these other factors limits the scope of the POD(*a*)
characterization and is addressed by controlling the factors during the
inspection or by randomly sampling the factors to be used in the
demonstration. The methods of
accounting for these factors are important aspects of the statistical design of
the demonstration and significantly influence the statistical properties
of the estimates of the POD(*a*)
function parameters.

The
important classes of the factors that introduce variation in crack
detectability are:

a) the inherent degree of repeatability of the magnitude of the NDI signal response when a specific crack is independently inspected many times with all controllable factors held constant;

b) the material and geometrical properties of the specimens and the differences in the physical properties of cracks of nominally identical "size";

c) the variation introduced by different hardware components in the inspection system; and,

d) the summation of all the human factors associated with the particular population of inspectors that might be used in the application.

The effects of these factors are present in every NDI
reliability demonstration and they should be explicitly considered in the
design of the demonstration and the interpretation of the results.

Little can be done about the variation of the response to the
NDI excitation at the demonstration stage when inspections are repeated under
fixed conditions. This variation might
be reduced if the system was modified or better optimized but that is a
different objective. Repeat inspections
under identical conditions will provide a measure of the inherent variability
that is a lower bound on the variability to be expected in applications of the
system.
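The lower-bound repeatability described above can be estimated by pooling the within-crack scatter of repeated *â* responses. A sketch with hypothetical decibel-scale signal values:

```python
import numpy as np

# Sketch: estimating inherent repeatability from repeated a-hat responses.
# Rows = cracks, columns = independent repeat inspections under fixed
# conditions (hypothetical decibel-scale signal responses).
a_hat = np.array([[22.1, 22.4, 21.9],
                  [28.3, 28.0, 28.6],
                  [35.2, 35.5, 35.1],
                  [40.8, 41.1, 40.6]])

# Pooled within-crack standard deviation: the scatter with every
# controllable factor held constant, a lower bound on in-service scatter.
within = a_hat - a_hat.mean(axis=1, keepdims=True)
n, k = a_hat.shape
sigma_repeat = np.sqrt((within ** 2).sum() / (n * (k - 1)))
print(f"repeatability sigma = {sigma_repeat:.2f} dB")
```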

The character of the cracks in the structure being inspected
will have a significant influence on the
inspection outcome. There are two
elements of crack character that impact the demonstration: the physical characteristics of the specimens
containing the cracks and the physical properties of the cracks in the
specimens. The inspection system will
be designed to detect cracks of a defined size range at a location in a
structural element defined at least by a combination of material type and geometrical configuration. A fixed set of specimens containing cracks
will be inspected, and these specimens either must match this combination or the
assumption must be made that the inspection response in the
specimens is identical to that which would be obtained in the real application.

The cracks in the specimens must be as close as possible to the
cracks that will be in the real structures and of sizes that span the region of
interest for the POD(*a*)
analysis. The assumption of equivalent response to the real inspection is implied
when the results of the demonstration are implemented. Experience with the inspection will dictate
the degree of acceptance of the assumption. For example, EDM notches are not good substitutes for
surface fatigue cracks in eddy current inspections but may be the only practical
choice for subsurface ultrasonic inspections.

Inspection capability is expressed in terms of crack size but
not all cracks of the same "size"
will produce the same magnitude of inspection response. In general, the specimens used in NDI
reliability demonstrations are very expensive to obtain and characterize
in terms of the sizes of the cracks in the
specimens. Each set of specimens will
be inspected multiple times if other factors are being considered in the
demonstration. From a statistical
viewpoint, this restriction on the experimental
design limits the sample size to the number of cracks in the specimen set. Multiple independent inspections of the same
crack only provide information about the detection probability of that
crack and do not provide any information about the variability of inspection
responses between different cracks.
Stated another way, *k* inspections of each of *n* cracks are not equivalent to inspections of *n • k* different cracks, even if the inspections are totally
independent. The number and sizes of
cracks will be addressed later.
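This inequivalence can be seen in a small Monte Carlo sketch: when each crack carries its own random detectability, *k* repeats of *n* cracks give a noisier estimate of the average POD than single inspections of *n • k* distinct cracks. The distribution and sample sizes below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, trials = 25, 4, 4000

def sim_mean_pod(n_cracks, repeats):
    # Each crack draws its own detection probability (crack-to-crack
    # variation); repeat inspections of a crack share that probability.
    p = rng.beta(2, 2, size=(trials, n_cracks))
    hits = rng.binomial(repeats, p) / repeats
    return hits.mean(axis=1)  # estimated average POD per simulated demo

sd_repeats = sim_mean_pod(n, k).std()       # n cracks, k repeats each
sd_distinct = sim_mean_pod(n * k, 1).std()  # n*k distinct cracks, once each
print(sd_repeats, sd_distinct)
```

The repeated design shows the larger scatter: repeat inspections average away only the within-crack noise, while the crack-to-crack component shrinks only with the number of distinct cracks.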

Accounting for the variability due to differences in inspection
hardware must first be considered in terms of the scope of the capability
evaluation. Each component of the inspection system can be expected to have some,
albeit small, effect on inspection response. The combinations of particular
components into sub-systems and complete inspection stations can also be
expected to influence the response.
Recognizing that individual hardware combinations might have
different POD(*a*) capabilities, a
general capability objective must be set.
Each combination can be characterized, each facility comprising many
combinations can be characterized, or many facilities can be
characterized. Ideally, the available
hardware combinations would be randomly sampled for the scope of the desired characterization and a weighted average of
responses would be used to estimate the POD(*a*) function. On a practical
level this is seldom done for ostensibly identical equipment. (Note that an
analogous problem exists when accounting for the human factors which will be
discussed in the following.) More
commonly, capability demonstrations are performed on one combination of hardware
and the assumption is made that the characterization would apply to all combinations. That is, the POD(*a*) differences between combinations are assumed to be negligible.

The above is directed at a complete individual inspection
system (however defined), but the variability
of interchangeable components of a system can often be directly assessed. For example, experience has shown
that different eddy current probes produce different responses when all other
factors are constant. If a single probe
is used to demonstrate the capability of an
eddy current system, the estimated POD(*a*)
function applies to the relevant inspections using that probe. However, if the POD characterization is to
be used for in-service inspections using any such probe, an assumption is
required that the probe is representative of the entire population. If a larger demonstration is affordable, the
inspections could be performed using a random sample of probes from the
available population. The analysis
method must then account for the fact that multiple inspections of each crack
were made with the different probes.
The resulting characterization would better represent an inspection for
a randomly selected probe.

Accounting for the variation from more than one source is more
complex. Care must be taken to ensure that
the multiple sources are balanced in the analysis of the data and that the
correct analysis procedures are used.
For example, in the early evaluations of an automated eddy current
system for turbine engine disks (the ECIS system for the ENSIP/RFC
applications), there was considerable interest in the inherent variability in
response from repeated, identical inspections and from different probes with
their associated re-calibration changes.
(Other factors were initially considered but were later ignored after it
was shown that they had no effect on POD(*a*)
for the system.) The specimen sets
would be inspected three times: twice with one probe and once with a second
probe. The data from the three
inspections, however, could not be combined in a single analysis since such an
analysis would skew the results toward the probe with double representation. Thus, one analysis would be performed to
estimate the inherent repeat variability and a second analysis would be
performed to estimate the probe to probe variation. The results would then be combined to arrive at the POD(*a*) function that accounted for both
sources of variation. It might be noted
in this context that the repeat variability was negligible as compared to the
variability that results from re-calibration and probe changes. The demonstration plan was later modified to better estimate the more significant
between-probe variation by performing the
third inspection with a third probe.
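When the two analyses yield independent variance components, the simplest way to combine them is to add the variances, assuming independent normal scatter on the *â* scale. The decibel-scale values below are hypothetical:

```python
import math

# Sketch: combining two independently estimated variance components
# (hypothetical decibel-scale values).
sigma_repeat = 0.4   # from repeated inspections with a fixed probe
sigma_probe = 1.1    # from the probe-to-probe / recalibration analysis
sigma_total = math.sqrt(sigma_repeat ** 2 + sigma_probe ** 2)
print(f"total sigma = {sigma_total:.3f} dB")
```

With values like these the probe term dominates, mirroring the ECIS experience that repeat variability was negligible next to recalibration and probe changes.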

Factorial-type demonstrations are an efficient approach to
simultaneously account for several significant factors. However, such demonstrations for more than a
couple of factors require many inspections of the specimen set. More sophisticated statistical experimental
designs might be employed but the actual choice of such a design and the
analysis of the data are driven by the specific objectives of a particular
experiment. Such designs are beyond the scope of this section.
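The cost of a full-factorial demonstration is easy to see by enumerating the matrix, since every combination of factor levels is one complete pass of the specimen set through the system. The factors and levels below are hypothetical:

```python
import itertools

# Sketch of a full-factorial demonstration matrix; each row is one
# inspection of the entire specimen set (factors are hypothetical).
factors = {
    "probe": ["P1", "P2"],
    "inspector": ["A", "B"],
    "gain_setting": ["nominal", "high"],
}
matrix = list(itertools.product(*factors.values()))
print(len(matrix))  # 2 x 2 x 2 = 8 passes of the specimen set
```

Adding one more two-level factor doubles the count again, which is why fractional or other sophisticated designs become attractive as the number of factors grows.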

## Human Factors

When the inspector plays a significant role in the find/no find
decision, he or she is an integral component of the NDI system. In such common inspection scenarios, human
factors can contribute significantly to the variability in inspection
results. In this context, human factors
refer to both the dynamic capabilities of individual inspectors and the user
friendliness of the inspection tools in the environment of the
application. Experiments have been
conducted to quantify some of the environmental effects of human factors and
data from some demonstration experiments have been interpreted in terms of the
level of training and experience of the inspectors (see, for example, Spencer
& Schurman [1994]). However, the
effects and interactions of human factors on inspection results have not been
characterized. Rather, to the extent
possible, NDI systems are automated to minimize the effects attributable to the inspector.

In a non-automated inspection, many human factors potentially
influence the inspection decision and they
cannot all be accounted for in a capability demonstration. At some level, the representative
inspection assumption will be required.
Given that the mechanical aspects of the NDI system and inspection
environment are held constant, differences between inspectors can cause a
biased capability characterization if ignored.
Again, the objective of the capability characterization must be stated in
advance. If each inspector is being
evaluated, a separate POD(*a*) function
for each is estimated. If a single POD(*a*) function is wanted for an entire
facility, the inspectors in the demonstration must be randomly sampled in
proportion to the percent of such inspections each performs. Alternatively,
inspectors might be categorized by, say, capability as implied by certification
level. A random sample of the
inspectors from each level could be selected to arrive at a composite POD(*a*) for the level and a weighted average
would be calculated based on the percent of inspections performed by each
level. An example of designing such a
demonstration is given in Hovey, et al. [1989]. Example results from the evaluation of a population of inspectors
can also be found in Davis [1988].
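The weighted-average calculation for a facility-level POD at a given crack size can be sketched as follows; the per-level POD estimates and inspection shares are hypothetical:

```python
# Sketch: facility-level POD at one crack size, combining hypothetical
# per-certification-level POD estimates weighted by the share of
# inspections each level performs.
pod_by_level = {"Level I": 0.78, "Level II": 0.88, "Level III": 0.94}
share = {"Level I": 0.20, "Level II": 0.55, "Level III": 0.25}

pod_facility = sum(pod_by_level[lvl] * share[lvl] for lvl in pod_by_level)
print(f"facility POD = {pod_facility:.3f}")
```

Repeating the calculation across crack sizes yields the composite facility POD(*a*) curve; the weights must reflect the actual mix of inspections, or the composite will be biased toward over- or under-represented levels.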