Sample sizes in NDI reliability experiments are driven more by
the economics of specimen fabrication and crack characterization than by the
desired degree of precision in the estimate of the POD(*a*) function. Reasonable-appearing
POD(*a*) functions can often be
obtained by applying the maximum likelihood analysis to an inspection of
relatively few specimens. However, totally unacceptable results can also be
obtained from inspecting specimens containing too few cracks or from
inspection results that are not reasonably represented by the assumptions of
the models. Therefore, it must be
recognized that the confidence bound calculation for a POD(*a*) analysis is based on asymptotic (large sample) properties of the
estimates and that there are minimal sample size requirements that must be met
to provide a degree of reasonable assurance in the characterization of the
capability of the system.

Larger sample sizes in NDI
reliability experiments will, in general, provide greater precision in
the estimate of the POD(*a*)
function. However, the sample size is determined by the number of cracks in the
experiment, and the information each crack contributes depends on its
size, a coupling that must also be considered. The
effect of this coupling manifests itself differently for the *â* versus *a* and hit/miss analyses.

Sample sizes for the binomial analysis that is used to
demonstrate a capability at a single crack size are dictated strictly by the
selected value of the target POD and the degree of confidence.

Sample Size Requirements for *â* versus *a* Analysis

When the crack decision is
made on the basis of a recorded response, *â*,
to the inspection stimulus, the data are known as *â* versus *a* inspection
results and a better POD(*a*)
analysis is available. An example of *â* versus *a* data from a capability demonstration is presented in Figure 3.1.4.
When the inspection response is greater than a pre-set detection
threshold, a crack is indicated for the site.
In a capability demonstration, the minimum signal threshold is set as
low as possible with respect to noise.
Detection thresholds are later set that will yield a desired *a*_{90} value with an acceptable
rate of extra indications. Extra
indications are crack indications at sites with no known cracks. Extra indications
can be the result of noise or large responses from insignificant cracks. They can also result from
anomalies that do not impair structural integrity.

**Figure 3.1.4.** Example Plot of *â* versus *a* Data

The recorded signal
response, *â*, provides significantly
more information for analysis than a simple crack or no crack decision of a
hit/miss inspection response. The POD(*a*) model is derived from the correlation
of the *â* versus *a* data and the assumptions concerning the POD(*a*) model can be tested using the signal response data. Further, the pattern of *â* responses can indicate an acceptable range of extrapolation. Therefore, the range of crack sizes in the
experiment is not as critical in an *â*
versus *a* analysis as in a hit/miss
analysis. For example, if the decision
threshold in Figure 3.1.4 were set at 1000
counts, only the cracks with depths between about 6 and 10 mils would provide
information that contributes to the estimate of the POD(*a*) function in a hit/miss analysis. The larger
cracks would always be found and the smaller cracks always missed, so they would provide little
information about the POD(*a*) function.
In the *â* versus *a* analysis, however, all of the recorded
*â* values provide full information
concerning the relation between signal response and crack size and the censored
values at the signal minimum and maximum limits provide partial
information. The parameters of the POD(*a*) function are derived from the
distribution of *â* values about the
median response for cracks of size *a*. Assumptions necessary for characterizing
this distribution are readily evaluated with the *â* versus *a* data.
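
As an illustration only, the link between the decision threshold and the POD(*a*) function can be sketched under an assumption that is not stated in this section: that ln(*â*) is linearly related to ln(*a*) with normally distributed scatter about the median response. The function names and the parameter values (`beta0`, `beta1`, `delta`) are hypothetical and are not fit to the data of Figure 3.1.4.

```python
import math
from statistics import NormalDist

def pod_from_ahat(a, beta0, beta1, delta, threshold):
    """POD(a) = P(ahat > threshold), ASSUMING
    ln(ahat) ~ Normal(beta0 + beta1 * ln(a), delta**2)."""
    z = (beta0 + beta1 * math.log(a) - math.log(threshold)) / delta
    return NormalDist().cdf(z)

def crack_size_at_pod(p, beta0, beta1, delta, threshold):
    """Invert the assumed model: the size a_p with POD(a_p) = p."""
    z = NormalDist().inv_cdf(p)
    return math.exp((math.log(threshold) - beta0 + z * delta) / beta1)

# Hypothetical parameters chosen only to exercise the functions.
BETA0, BETA1, DELTA = 5.0, 1.0, 0.2

for threshold in (500.0, 1000.0, 2000.0):
    a90 = crack_size_at_pod(0.90, BETA0, BETA1, DELTA, threshold)
    print(f"threshold = {threshold:6.0f} counts, a_90 = {a90:.2f}")
```

Under this assumed model, raising the decision threshold increases *a*_{90}, which is why the threshold is chosen after the demonstration to balance the desired *a*_{90} against the rate of extra indications.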

Because of the added information in the *â* data, a valid characterization of the POD(*a*) function with confidence bounds can be obtained with fewer
cracks than are required for the hit/miss analysis. It is recommended that at least 30 cracks be available for
demonstrations whose results can be recorded in *â* versus *a* form. Increasing the number of cracks increases
the precision of estimates. Perhaps
more importantly, increasing the number of cracks provides a broader population
of the different types of cracks that the inspection will address. Therefore, the demonstration specimen test
set should contain as many cracked sites as economically feasible. The analysis
will provide parameter estimates for smaller sample sizes but the adequacy of
the asymptotic distributions of the estimates is not known.

Sample Size Requirements for Hit/Miss Analysis

In a hit/miss capability demonstration, the inspection results
are expressed only in terms of whether or
not the crack of known size was detected.
There are detection probabilities associated with each inspection
outcome and the analysis assumes that the detection probability increases with
crack size. Since it is assumed that
the inspection process is in a state of control, there is a range of crack sizes over which the POD(*a*) function is rising. In this crack size range of inspection
uncertainty, the inspection system has limited discriminating power in the
sense that detecting or failing to detect
would not be unusual. Such a range
might be defined by the interval (*a*_{0.10},
*a*_{0.90}), where *a*_{p}
denotes the crack size that has probability of detection equal to *p*; that is, POD(*a*_{p}) = *p*. Cracks smaller than *a*_{0.10} would then be expected to be missed and cracks
greater than *a*_{0.90} would
be expected to be detected.

In a hit/miss capability demonstration, cracks outside the
range of uncertainty do not provide as much information concerning the POD(*a*) function as cracks within this
range. Cracks in the almost certain
detection range and almost certain miss range provide very little information
concerning probability of detection. In
the hit/miss demonstration, not all cracks convey the same amount of
information and the "effective" sample size is not necessarily the
total number of cracks in the experiment.
For example, adding a large number of
very large cracks does not increase the precision in the estimate of the
parameters of the POD(*a*)
function.
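
The unequal information content of cracks can be illustrated with a minimal sketch. It assumes a log-odds (logistic in ln *a*) POD(*a*) model, which this section does not specify, and uses hypothetical parameters `MU` and `SIGMA`. For a logistic model, each hit/miss observation enters the Fisher information with weight POD(1 − POD), so that weight serves here as a rough measure of what one crack contributes.

```python
import math

def pod_log_odds(a, mu, sigma):
    """ASSUMED log-odds POD model; exp(mu) is the size detected 50% of the time."""
    z = (math.log(a) - mu) / sigma
    return 1.0 / (1.0 + math.exp(-z))

def information_weight(a, mu, sigma):
    """Weight POD * (1 - POD) of one hit/miss result in the logistic
    Fisher information; its maximum value is 0.25, at POD = 0.5."""
    p = pod_log_odds(a, mu, sigma)
    return p * (1.0 - p)

# Hypothetical parameters: median detectable size of 8 mils.
MU, SIGMA = math.log(8.0), 0.15

for size in (2.0, 6.0, 8.0, 10.0, 40.0):
    w = information_weight(size, MU, SIGMA)
    print(f"a = {size:5.1f} mils  POD = {pod_log_odds(size, MU, SIGMA):.4f}  weight = {w:.5f}")
```

Under these assumptions the very small and very large cracks carry weights near zero, so adding more of them changes the total information very little.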

Ideally, all of the cracks in a hit/miss demonstration would
have 80 percent of their sizes in the (*a*_{0.10},
*a*_{0.90}) range of the POD(*a*)
function. However, it is not generally
possible to have a set of specimens with
such optimal sizes for all demonstrations.
The demonstrations are being conducted to determine this unknown
range of sizes for the NDI system being evaluated. Further, because of the high cost of producing specimens, the
same sets of specimens are often used in many different demonstrations. To minimize the chances of completely
missing the crack size range of maximum information and to accommodate the
multiple uses of specimens, the sizes of cracks in a specimen set should be uniformly
distributed between the minimum and maximum of the sizes of potential
interest. A minimum of 60 cracks should
be distributed in this range (MIL-HDBK-1823), but as many as are affordable
should be used. This minimum sample
size recommendation was the result of subjective considerations as to the
number needed to make the asymptotic assumptions reasonable, experience in
applying the model to data, and the results of analysis from a number of
simulated POD demonstrations [Berens & Hovey, 1981; Berens & Hovey,
1984; and Berens & Hovey, 1985].

Sample Size Requirements for Binomial Analysis

When capability is to be demonstrated by using specimens with
cracks of the same size and the binomial analysis, the number of cracks in the
specimens can be determined exactly from the POD level and the desired degree
of confidence. The best (maximum
likelihood) estimate of the POD at the crack length of interest is the
proportion of cracks in the specimen set that are detected. A lower bound on the estimate is then
calculated for the desired confidence level using binomial distribution
theory. For example, to demonstrate
that there is 95 percent confidence that at least 90 percent of all cracks of
the size under consideration will be detected requires at least 29 cracks of
that size. If all 29 cracks are detected,
the maximum likelihood estimate of POD is 1.0 and the lower 95 percent
confidence bound is slightly greater than 0.9.
If any crack is missed, the lower confidence bound on the estimate of
POD is less than 0.9. Sample sizes for
the binomial analysis will be discussed further in the subsection on analysis
methods.
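
The arithmetic behind the 29-crack example can be checked with a short sketch. It assumes independent cracks of a single size and the standard one-sided binomial bound for the case in which every crack is detected, where the lower bound *p* satisfies *p*^n = 1 − confidence. The function names are illustrative only.

```python
def min_cracks_all_detected(pod, confidence):
    """Smallest n for which n detections out of n cracks give a
    one-sided lower confidence bound of at least `pod`."""
    n = 1
    # With zero misses the bound is (1 - confidence) ** (1 / n),
    # so the requirement is pod ** n <= 1 - confidence.
    while pod ** n > 1.0 - confidence:
        n += 1
    return n

def lower_bound_all_detected(n, confidence):
    """Lower confidence bound on POD when all n cracks are detected."""
    return (1.0 - confidence) ** (1.0 / n)

n = min_cracks_all_detected(0.90, 0.95)
print(n, round(lower_bound_all_detected(n, 0.95), 4))
```

For a POD of 0.90 and 95 percent confidence the first function returns 29, and the corresponding lower bound is slightly above 0.90, in agreement with the example above.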

It must be emphasized that the sample size is determined by the
number of different cracks, not the number of inspections. Different cracks can
respond differently to inspection stimuli. Multiple inspections of the same crack are not independent and,
therefore, cannot be treated as independent samples from the population of
cracks of the given size. There is a
tendency to re-inspect specimens to increase the sample size. For example, if one of 29 cracks is not
detected, the inspection does not qualify for an *a*_{90/95} capability at that size. The specimen set cannot be re-inspected with
the expectation of passing the test for a sample size of 58. New specimens with different cracks must be
used or the analysis is not valid.
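
The effect of a single miss can be illustrated with a sketch of the exact (Clopper-Pearson) one-sided lower bound, found here by bisection on the binomial tail probability. The function names are hypothetical, and the calculation assumes that every result comes from a different crack, so it does not apply to repeated inspections of the same specimens.

```python
from math import comb

def tail_prob(n, hits, p):
    """P(X >= hits) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1.0 - p)**(n - k) for k in range(hits, n + 1))

def lower_bound(n, hits, confidence=0.95):
    """One-sided exact lower confidence bound on POD for `hits` of `n` cracks."""
    if hits == 0:
        return 0.0
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        # tail_prob increases with p; the bound is where it equals 1 - confidence.
        if tail_prob(n, hits, mid) < 1.0 - confidence:
            lo = mid
        else:
            hi = mid
    return lo

def min_cracks(pod, confidence, misses):
    """Smallest number of different cracks that still demonstrates `pod`
    at `confidence` when exactly `misses` cracks are not detected."""
    n = misses + 1
    while lower_bound(n, n - misses, confidence) < pod:
        n += 1
    return n

print(lower_bound(29, 28))        # bound after one miss in 29 cracks
print(min_cracks(0.90, 0.95, 1))  # different cracks needed with one miss
```

Under these assumptions, one miss in 29 cracks drops the lower bound below 0.90, and a demonstration with one miss requires 46 different cracks to reach 90 percent POD at 95 percent confidence.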

Uncracked Inspection Sites

In the context of the preceding discussion, sample size refers
to the number of known cracks in the specimens to be inspected during the
capability demonstration. The complete
specimen set should also contain inspection sites that do not contain any known
cracks. If the inspection results are of the hit/miss nature, at least twice
as many uncracked sites as cracked sites are recommended. The uncracked sites are necessary to ensure
that the NDI procedure is truly discriminating between cracked and uncracked
sites and to provide an estimate of the
false call rate. If the NDI system is
based on a totally automated *â* versus
*a* decision process, many fewer
uncracked sites will be required. If
any *â* values are recorded at the uncracked sites, their magnitude would provide
an indication of the minimum thresholds that might be implemented in the
application.