As noted, there are two approaches to quantifying NDI capability:
fitting a model that expresses probability of detection as a function of
crack size, and demonstrating a POD capability for a particular crack size. Data from the single crack size
demonstration approach are analyzed using a straightforward binomial
distribution analysis. Fitting a POD(a) model to the results of an NDI
demonstration depends on the nature of the data (hit/miss or â versus a), the function chosen to represent POD(a), and the method for fitting the parameters of the function and
determining the confidence bound on the reliably detected crack size. Experience with â versus a data from eddy
current inspections has shown that a cumulative normal equation provides a
reasonable model for the POD(a)
function when transformations of crack size or inspection signal response are considered.
Further, Berens and Hovey [1981] showed that the lognormal cumulative
distribution provided as good a model as, or a better model than, the eight others
that were considered. Accordingly, the
Air Force has generally adopted the cumulative normal distribution function as
the model for POD(a) analyses. Note that the cumulative lognormal model is
the cumulative normal model after crack size is transformed. The log odds equation is also often used to
fit NDI data. The log odds equation and
the cumulative lognormal equation are essentially indistinguishable.
A computer program, POD Version 3, is recommended by
MIL-HDBK-1823 for both â
versus a and hit/miss POD(a) analyses (see also Berens [2000]).
The program calculates the maximum likelihood estimates of the cumulative
normal model as well as confidence bounds on estimates of ap. The program permits transformations of the data. Since
the default analysis is based on the natural
logarithm transformation, the default analysis is for the cumulative lognormal
POD(a) function. In POD Version 3,
data are input through an Excel spreadsheet and output is provided as separate
tables and graphs in the spreadsheet.
The following paragraphs present a general description of the
analysis methods.
â Versus a Analysis
All NDE systems make find/no find decisions by interpreting the
response to an inspection excitation.
In some inspections, the response is a recordable metric, â, that is related to the flaw
size. Find/no find decisions are made
by comparing the magnitude of â to
the decision threshold value, âdec. The â
versus flaw size analysis is a method of estimating the POD(a) function based on the correlation
between â and flaws of known size, a.
The general formulation of the â
versus a model is expressed as
$$\hat{a} = f(a) + \delta \tag{3.1}$$
where f(a) represents the average (or median)
response to a crack of size a and δ represents the sum of all the random
effects that make the inspection of a particular crack of size a different from the average of all
cracks of size a. In principle, any f(a) and distribution of δ
that fit the observations can be used. However, if f(a) is linear in a,
$$f(a) = B_0 + B_1\,a \tag{3.2}$$
and δ
is normally distributed with constant standard deviation, σδ, then the
resulting POD(a) function is a
cumulative normal distribution function.
Monotonic transformations of â
or a can be analyzed in this
framework. In fact, the model has been
shown to fit a large number of cases in which a logarithmic transformation of
both a and â was applied.
As an example, consider the
formulation of the â versus a analysis that has been used
exclusively in the evaluation of the
RFC/ENSIP automated eddy current inspection system. The relation between â
and a is expressed in terms of the
natural logarithms of â and a:
$$\ln \hat{a} = B_0 + B_1 \ln a + \delta \tag{3.3}$$
where δ
is Normal (0, σδ).
For a decision threshold of âdec,
$$\mathrm{POD}(a) = \Phi\!\left[\frac{\ln a - \mu}{\sigma}\right] \tag{3.4}$$
where Φ(z) is the cumulative standard normal
distribution function and
$$\mu = \frac{\ln \hat{a}_{dec} - B_0}{B_1} \tag{3.5}$$
$$\sigma = \frac{\sigma_\delta}{B_1} \tag{3.6}$$
The calculation is illustrated in Figure
3.1.5. The parameters of the â versus a model (B0, B1,
and sd) are estimated from the data of the demonstration
specimens. The probability density
function of the ln â values
for a 13 mil crack depth is illustrated in the figure. The decision threshold in the example is set
at âdec = 165. The POD for a randomly selected 13 mil crack
would be the proportion of all 13 mil cracks
that would have an â value greater
than 165, i.e. the area under the curve above 165. In this example, the decision threshold was
selected so that POD(13) = 0.90. The estimate of the POD(a) function and its 95 percent confidence bound for the decision
threshold of 165 counts is presented in Figure 3.1.6. It might be noted that when all cracks have
a recorded response between the signal minimum and maximum, the maximum
likelihood estimates are identical with those obtained from a standard
regression (least squares) analysis.
However, when crack response is below the signal minimum or above the
maximum (saturation level of the recorder), more sophisticated calculations are
required to obtain parameter estimates and the confidence bound. For complete details of the maximum
likelihood calculations and more discussion of the â versus a analysis, see
MIL-HDBK-1823, Berens [1988], and Berens [2000].
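The calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: every response is taken to lie between the signal minimum and the saturation level (so ordinary least squares applies), the crack sizes and responses are hypothetical values invented for the example, and the function names are not those of POD Version 3. The threshold of 165 counts echoes the example in the text.

```python
import math
from statistics import NormalDist

def fit_ahat_vs_a(sizes, responses):
    """Least-squares fit of ln(ahat) = B0 + B1*ln(a) + delta.

    Returns (B0, B1, sd). sd uses divisor n (the maximum likelihood form);
    n - 2 is the usual unbiased regression alternative.
    """
    x = [math.log(a) for a in sizes]
    y = [math.log(r) for r in responses]
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return b0, b1, math.sqrt(sse / n)

def pod_parameters(b0, b1, sd, a_dec):
    """Location and scale of the lognormal POD(a) for threshold a_dec."""
    mu = (math.log(a_dec) - b0) / b1
    sigma = sd / b1
    return mu, sigma

def pod(a, mu, sigma):
    """Cumulative lognormal POD(a)."""
    return NormalDist().cdf((math.log(a) - mu) / sigma)

# Hypothetical demonstration data: crack depth (mils) and response (counts).
sizes = [5, 8, 10, 13, 16, 20, 25, 30]
responses = [40, 95, 120, 260, 330, 600, 800, 1300]

b0, b1, sd = fit_ahat_vs_a(sizes, responses)
mu, sigma = pod_parameters(b0, b1, sd, a_dec=165.0)
for a in (8, 13, 20):
    print(f"POD({a}) = {pod(a, mu, sigma):.3f}")
```

When responses are censored below the signal minimum or above saturation, this least-squares shortcut no longer applies and the full maximum likelihood calculation of the cited references is needed.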
Figure 3.1.5. Example POD(a)
Calculation from â versus a Data
Figure 3.1.6. POD(a)
Function with 95 Percent Confidence Bound for an Example â versus a Analysis
The preceding
formulation of the â versus a model is based on three assumptions:
a) the
mean of the log responses, ln â, is
linearly related to log crack size, ln a;
b) the
differences of individual ln â
values from the mean response have a normal distribution; and,
c) the
standard deviation of the residuals, σδ, is constant
for all a.
These assumptions
can be easily checked using the data from the demonstration, and statistical
tests for all three are built into the standard analysis of the POD Version 3
computer program of MIL-HDBK-1823. When the
assumptions are not acceptable, current practice is to restrict the analysis to
a range of crack sizes for which the assumptions are acceptable.
If the ln â versus ln a relation is not linear, it may be possible to use other
transformations of either the signal response or the crack size. If the three assumptions are reasonably
valid for other transformations of the data, the above analysis can be applied
using the different transformation. The
inverse transformation of the results provides the answers in the correct
units. Data sets have been observed in
which no transformation was required and the fit was made directly to â versus a data (i.e. without the logarithmic transform). Other data sets have been analyzed in which
the three assumptions were acceptable when the analysis was performed in terms
of ln â versus 1/a. It should be noted that extreme caution must be exercised when
extrapolating the results beyond the range of crack sizes in the data. The POD Version 3 computer program has been
designed to perform the POD analyses using transformations other than the
logarithmic. The logarithmic transform
of both crack size and inspection response is the default transform.
Hit/Miss Analysis
The results of an inspection system are often recorded only as
a decision as to the presence (hit, find, or pass) or absence (miss, no find or
fail) of a crack. The available data
from the capability demonstration of such
inspections comprise data pairs of crack size and the inspection result. The parameters
of a POD(a) model for such data can
be estimated using maximum likelihood as follows:
Let ai represent the size of the ith crack and Zi
represent the result of the
inspection: Zi = 1 if the
flaw was found (hit) and Zi
= 0 if the flaw was not found (miss).
Assume POD(ai) is
the equation relating probability of detection to flaw size for the
inspection. The likelihood of obtaining
a specific set of (ai, Zi)
results when inspecting the n cracks in the specimens is
$$L(\theta) = \prod_{i=1}^{n} \mathrm{POD}(a_i)^{Z_i}\,\bigl[1 - \mathrm{POD}(a_i)\bigr]^{1 - Z_i} \tag{3.7}$$
where θ = (θ1, θ2, …, θk)
is a vector of the parameters of the POD(a)
function. Values of θ1, θ2, …, θk
are determined to maximize L(θ). For typical
POD(a) models, it is more convenient
to perform the analyses in terms of logarithms.
$$\ln L(\theta) = \sum_{i=1}^{n} Z_i \ln \mathrm{POD}(a_i) + \sum_{i=1}^{n} (1 - Z_i) \ln\bigl[1 - \mathrm{POD}(a_i)\bigr] \tag{3.8}$$
The maximum likelihood estimates are given by the solution of
the k simultaneous equations:
$$\frac{\partial \ln L(\theta)}{\partial \theta_j} = 0, \qquad j = 1, \ldots, k \tag{3.9}$$
In general, an iterative solution will be required to solve
Equations 3.9.
Any monotone increasing function between zero and one can be
used for POD(a). However, an early study of data with
multiple inspections per crack [Berens & Hovey, 1981] indicated that the
log odds or, equivalently, the cumulative lognormal models were more generally
applicable than the others investigated.
Further, the assumptions leading to a cumulative log normal model for
the POD(a) function for â versus a data have often been verified for eddy current data. The log odds and cumulative lognormal models
are equivalent in a practical sense in that the maximum difference in POD(a)
between the two for fixed location and scale parameters is about 0.02 which is
well within the scatter from repeated determinations of a POD(a) capability.
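The size of this difference can be checked numerically. The sketch below assumes the two models are written in terms of the same standardized variable z = (ln a - location)/scale, with the log odds exponent scaled by π/√3 so that both share the same location and scale; the function names and the search grid are illustrative choices.

```python
import math
from statistics import NormalDist

def pod_lognormal(z):
    """Cumulative lognormal POD as a function of standardized log crack size."""
    return NormalDist().cdf(z)

def pod_log_odds(z):
    """Log odds POD with the same location and scale (pi/sqrt(3) scaling)."""
    return 1.0 / (1.0 + math.exp(-math.pi * z / math.sqrt(3.0)))

def max_difference(z_max=6.0, steps=12000):
    """Largest absolute POD difference on an evenly spaced grid of z values."""
    grid = (-z_max + 2.0 * z_max * i / steps for i in range(steps + 1))
    return max(abs(pod_lognormal(z) - pod_log_odds(z)) for z in grid)

print(f"maximum |difference| = {max_difference():.4f}")
```

The printed value should come out close to the 0.02 quoted above.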
POD Version 3, the computer program recommended by
MIL-HDBK-1823, is based on a cumulative normal equation but allows
transformations of the crack size. The default transform of POD Version 3 is
the natural logarithm transform so that the program will fit the cumulative
lognormal equation by default. However, the program also provides a solution
based on the log odds equation. Other models
for the POD(a) function may be
appropriate but, if preferred, would require a different computer
implementation.
Repeating Equation 3.4, the cumulative lognormal equation for
the POD(a) function is:
$$\mathrm{POD}(a) = \Phi\!\left[\frac{\ln a - \mu}{\sigma}\right] \tag{3.10}$$
where Φ(z) is the standard normal cumulative
distribution function. The log odds model for the POD(a) function is:
$$\mathrm{POD}(a) = \frac{\exp\!\left[\dfrac{\pi}{\sqrt{3}}\left(\dfrac{\ln a - \mu}{\sigma}\right)\right]}{1 + \exp\!\left[\dfrac{\pi}{\sqrt{3}}\left(\dfrac{\ln a - \mu}{\sigma}\right)\right]} \tag{3.11}$$
Equation 3.10 or 3.11 is substituted in Equations 3.7 through
3.9 for POD(a). The parameters μ and σ
are determined so as to maximize L(μ, σ), the likelihood of obtaining the observed
inspection results. Note that POD = 0.5 at ln a = μ for both models. σ is a scale parameter that
determines the degree of steepness of the POD(a) function. A negative
value of σ is not contradictory
but, for a negative σ,
the POD(a) function will decrease
with increasing a.
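A minimal sketch of the iterative maximum likelihood solution for the log odds model follows. It assumes the log odds POD(a) is written with a location parameter and a scale parameter on ln a, with a π/√3 factor in the exponent, and it solves the likelihood equations by Newton's method with step halving. The hit/miss data are hypothetical, and the routine is an illustration rather than the algorithm used in POD Version 3.

```python
import math

def pod_log_odds(a, mu, sigma):
    """Log odds POD(a) with location mu and scale sigma on ln(a)."""
    t = (math.pi / math.sqrt(3.0)) * (math.log(a) - mu) / sigma
    return 1.0 / (1.0 + math.exp(-t))

def fit_log_odds(sizes, hits, tol=1e-10, max_iter=100):
    """Maximize the hit/miss log likelihood; returns (mu, sigma)."""
    x = [math.log(a) for a in sizes]
    xbar = sum(x) / len(x)
    xc = [xi - xbar for xi in x]          # centered ln(a) for stability

    def log_lik(b0, b1):
        total = 0.0
        for xi, z in zip(xc, hits):
            eta = b0 + b1 * xi            # ln(odds) of detection
            total -= math.log1p(math.exp(-eta)) if z else math.log1p(math.exp(eta))
        return total

    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in xc]
        # Score vector (first derivatives of the log likelihood).
        g0 = sum(z - pr for z, pr in zip(hits, p))
        g1 = sum((z - pr) * xi for z, pr, xi in zip(hits, p, xc))
        # Information matrix (negative second derivatives).
        w = [pr * (1.0 - pr) for pr in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, xc))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, xc))
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        # Halve the Newton step if it would lower the likelihood.
        step, current = 1.0, log_lik(b0, b1)
        while log_lik(b0 + step * d0, b1 + step * d1) < current and step > 1e-8:
            step *= 0.5
        b0, b1 = b0 + step * d0, b1 + step * d1
        if abs(d0) + abs(d1) < tol:
            break
    mu = xbar - b0 / b1
    sigma = math.pi / (math.sqrt(3.0) * b1)
    return mu, sigma

# Hypothetical hit/miss results; found and missed sizes overlap.
sizes = [2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20, 25, 30]
hits = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1]

mu, sigma = fit_log_odds(sizes, hits)
a50 = math.exp(mu)
a90 = math.exp(mu + sigma * math.sqrt(3.0) * math.log(9.0) / math.pi)
print(f"a50 = {a50:.2f}, a90 = {a90:.2f}")
```

If the found and missed crack sizes did not overlap, the slope would grow without bound and no finite solution would be reached.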
There are occasions when the iterative solution of Equations 3.9 does not converge. No solution will be obtained if the sizes of found cracks do not overlap with the
sizes of missed cracks. Little
information is obtained from cracks
that are so large they are always found or so small they are always
missed. More overlap is needed for the cumulative lognormal
model than for the log odds model. It
is also possible to obtain negative
estimates of σ from erratic data sets. Results of this nature are due to an inappropriate range of crack sizes in the demonstration or to
an inspection process that is not under proper control. When the crack sizes in the specimens are not
in the range of increase of the POD(a)
function, the effective sample size is smaller and the effect is reflected in
larger standard deviations of the sampling distributions of the parameter
estimates and, thus, wider confidence bounds.
Damage tolerance analyses are driven by the single crack size
characterization of inspection capability for which there is a high probability
of detection. Typically, the one number
characterization of the capability of the NDE system is expressed in terms of
the crack length for which there is 90 percent probability of detection, a90. However, a90
can only be estimated from a demonstration experiment, and there is
sampling uncertainty in the estimate.
To cover this variability, an upper confidence bound can be placed on
the best estimate of a90. The upper 95 percent confidence
bound, the a90/95 crack
size, has become the de facto standard for this characterization of NDE
capability. The use of a90/95 is intended to be
conservative from the viewpoint of damage tolerance analyses.
In the hit/miss analysis of
POD Version 3, a single value of POD(a),
say 0.90, is selected and an upper confidence
bound, say 95 percent, is calculated for the POD value. This procedure is known as a point-by-point
confidence bound. These are valid confidence bounds for any one POD value
but not for the entire POD(a) curve.
The confidence bounds for the estimates of a90 are calculated using the asymptotic normality
properties of the maximum likelihood estimates [Berens, 2000]. Figure 3.1.7
presents an example of a fit to hit/miss data from a semi-automated, directed
eddy current inspection.
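One common way to turn asymptotic normality into a bound on a90 is a Wald-type calculation on the log scale, sketched below for the cumulative lognormal model. This is an assumption about the general approach, not a transcription of the POD Version 3 calculation; see Berens [2000] for the method actually used. The parameter estimates and the variances and covariance of the estimates are hypothetical numbers chosen only to make the example run.

```python
import math
from statistics import NormalDist

def a_p(mu, sigma, p=0.90):
    """Crack size with POD = p for the cumulative lognormal model."""
    z_p = NormalDist().inv_cdf(p)
    return math.exp(mu + z_p * sigma)

def a_p_upper_bound(mu, sigma, var_mu, var_sigma, cov, p=0.90, conf=0.95):
    """Wald-type upper confidence bound on a_p (illustrative sketch).

    ln(a_p) = mu + z_p*sigma is treated as approximately normal with
    variance var_mu + 2*z_p*cov + z_p**2 * var_sigma.
    """
    z_p = NormalDist().inv_cdf(p)
    z_c = NormalDist().inv_cdf(conf)
    se = math.sqrt(var_mu + 2.0 * z_p * cov + z_p ** 2 * var_sigma)
    return math.exp(mu + z_p * sigma + z_c * se)

# Hypothetical estimates (ln mils) and covariance terms of the estimates.
mu, sigma = math.log(10.0), 0.40
var_mu, var_sigma, cov = 0.010, 0.008, 0.002

a90 = a_p(mu, sigma)
a90_95 = a_p_upper_bound(mu, sigma, var_mu, var_sigma, cov)
print(f"a90 = {a90:.2f} mils, a90/95 = {a90_95:.2f} mils")
```

Because the bound is computed for one POD value at a time, it is a point-by-point bound in the sense described above.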
Figure 3.1.7. Example POD(a) for a
Semi-Automated, Directed Eddy Current Inspection
Binomial Analysis for Cracks of Fixed Size
Because of the individual physical differences between cracks,
cracks of the same size will have different detection probabilities for a given
NDI system. However, a single POD for
all cracks of that size can be postulated in terms of the probability of
detecting a randomly selected crack from the population of all cracks of the
given size. In this formalism, the
proportion detected in a random sample of
the cracks is an estimate of POD for that size and binomial distribution theory
can be used to calculate a lower confidence bound on the estimate. Given a sample of inspection results from
cracks of a target size, say aNDI,
the inspection system is considered adequate if the lower confidence bound on
the proportion of detected cracks exceeds the desired POD value.
The theory of the binomial analysis is as follows. Consider independent inspection results from
specimens containing n cracks of size
aNDI, the target reliably
detected crack size, and assume that r of the cracks are detected. If POD is the true (but unknown) probability
of detection for the population of cracks, the number of detections is modeled
by the binomial distribution. The
probability of r detections in n independent inspections of cracks of
size aNDI is:
$$P(r) = \binom{n}{r}\,\mathrm{POD}^{\,r}\,(1 - \mathrm{POD})^{\,n-r} \tag{3.12}$$
The unbiased, maximum likelihood estimate of POD is
$$\widehat{\mathrm{POD}} = \frac{r}{n} \tag{3.13}$$
The 100(1-γ)
percent lower confidence bound, PODCL, on the estimate of POD is
obtained as the solution to the equation:
$$\sum_{i=r}^{n} \binom{n}{i}\,\mathrm{POD}_{CL}^{\,i}\,(1 - \mathrm{POD}_{CL})^{\,n-i} = \gamma \tag{3.14}$$
The
interpretation of PODCL as a lower confidence bound is as
follows. If the demonstration was completely and independently repeated a large
number of times, 100(1-γ) percent of the calculated lower bounds would be less than the
true value of POD. There is 100(1-γ)
percent confidence that PODCL from a single demonstration will be
less than the true value.
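The lower bound can also be found numerically rather than from tables. The sketch below assumes the bound is defined as the POD value at which the probability of r or more detections in n trials equals γ, and it locates that value by bisection; the function names are illustrative.

```python
import math

def prob_at_least(r, n, p):
    """Binomial probability of r or more detections in n inspections."""
    return sum(math.comb(n, i) * p ** i * (1.0 - p) ** (n - i)
               for i in range(r, n + 1))

def pod_lower_bound(r, n, gamma=0.05, tol=1e-12):
    """100(1 - gamma) percent lower confidence bound on POD, by bisection."""
    if r == 0:
        return 0.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # prob_at_least increases with p, so move toward the crossing point.
        if prob_at_least(r, n, mid) < gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for n in (28, 29):
    print(n, round(pod_lower_bound(n, n), 4))
```

With every crack detected, the equation reduces to POD^n = γ, so the bound has the closed form γ^(1/n), which gives a convenient check on the bisection.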
Solutions to
Equation 3.14 are tabulated in Natrella [1963] for 90, 95, and 99 percent
confidence limits and selected sample sizes. General solutions expressed in
terms of the incomplete beta function and
the normal approximation to the binomial distribution can be found in many
statistical references, for example, Mood [1950]. Minimum values of n and r which yield predefined values of PODCL and confidence
level, 100(1-g),
are often quoted. Selected values can be found in Packman, et al. [1976].
For example,
consider a demonstration that there is 95 percent confidence that at least 90
percent of all cracks of size aNDI
will be detected by a given inspection system.
To achieve the desired level of confidence and POD would require results
as given in Table 3.1.2.
Table 3.1.2. Minimum Number of Detections Required to Conclude that
POD > 0.90 with 95 Percent Confidence
Number of Cracks of Size aNDI | Number of Cracks Detected
29 | 29
46 | 45
61 | 59
75 | 72
89 | 85
103 | 98
If there were 28
cracks in the demonstration and all 28 were detected, the lower 95 percent
confidence bound on the estimate of POD would be 0.899. If fewer than 28 were detected, the lower
confidence bound would be even lower.
Since 29 is the minimum number of cracks, all detected, that can demonstrate a 90 percent POD at
95 percent confidence, this approach to capability demonstration has been
referred to as the “29 out of 29” method.
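The sample sizes behind a "29 out of 29" style criterion can be searched for directly. The sketch below assumes the exact binomial criterion: a demonstration with a given number of misses succeeds when the probability of seeing that few misses, if the true POD were exactly 0.90, is below 0.05. The function names are illustrative.

```python
import math

def pass_probability(n, misses, pod):
    """Probability of at most `misses` misses in n inspections at true POD."""
    return sum(math.comb(n, k) * (1.0 - pod) ** k * pod ** (n - k)
               for k in range(misses + 1))

def min_cracks(misses, pod=0.90, conf=0.95):
    """Smallest n for which n - misses detections demonstrate POD > pod."""
    n = misses + 1
    while pass_probability(n, misses, pod) >= 1.0 - conf:
        n += 1
    return n

for m in (0, 1, 2):
    n = min_cracks(m)
    print(f"{m} misses allowed: {n} cracks, {n - m} detected")
```

For zero, one, and two misses the search returns 29, 46, and 61 cracks, matching the first three rows of Table 3.1.2. Entries for larger numbers of misses lie very close to the 0.05 boundary, so any difference from a published table is a prompt to check the rounding and the exact criterion used.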
There are several
objections to the use of this approach to quantifying inspection capability:
1) This
demonstration approach to capability provides only minimal and reasonably gross
POD information for the single crack size used for the inspections. Steep POD(a) functions are generally considered superior to flat POD(a) functions and a single crack size
capability demonstration provides no information regarding POD(a) steepness.
2) Passing
or failing the demonstration provides no discrimination of degree of
detectability at the high POD levels.
For example, consider the 29 finds out of 29 cracks criterion for
demonstrating the 90/95 capability. If
the true POD is less than 0.9, there is up to a 5 percent chance that the
demonstration will conclude that the true POD is 0.9 or greater. Conversely, if the true POD is 0.995, there
is about a 14 percent chance (1 − 0.995^29 ≈ 0.135) that at least one crack out of 29 will be missed and the
demonstration will fail to conclude that there is 95 percent confidence that
the POD is greater than 0.9. At POD =
0.976, there is about a fifty-fifty chance of concluding the POD is greater
than 0.9. POD(a) tends to be
relatively flat above 0.9 and there could easily
be a very large crack size difference between, say, a 0.9 capability and 0.995
capability. Even when crack
detection is absolutely certain for the given size, only a 90/95 capability can
be claimed after the demonstration.
3) When
attempting to demonstrate a 90/95 capability and one crack out of 29 is missed,
the demonstration must be repeated with at least 17 additional cracks. Since demonstrations are planned with the
expectation of meeting the criteria, the need for additional specimens can
create significant problems.
For these reasons,
quantifying inspection capability in terms of the entire POD(a) function has evolved as the preferred
method [MIL-HDBK-1823].
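The pass and fail probabilities discussed in the second objection follow from simple arithmetic: with independent inspections, a 29 out of 29 demonstration passes only if every crack is found, which happens with probability POD^29. The short sketch below evaluates this for the true POD values mentioned in the text; the function name is illustrative.

```python
def pass_probability(true_pod, n=29):
    """Probability that all n cracks are found when each has the same true POD."""
    return true_pod ** n

for true_pod in (0.90, 0.976, 0.995):
    p_pass = pass_probability(true_pod)
    print(f"true POD = {true_pod}: pass = {p_pass:.3f}, fail = {1.0 - p_pass:.3f}")
```

The pass probability rises slowly with the true POD, which is why the demonstration discriminates poorly among inspection systems whose capability is well above 0.9.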
It might be noted
that attempts have been made to use a binomial approach to the analysis of
demonstration data comprising a range of crack sizes [Yee, et al., 1976]. These approaches have been generally
abandoned but a Bayesian approach to such analyses is being considered [Bruce,
1998].