ISO/IEC 5152:2024
Information technology - Biometric performance estimation methodologies using statistical models
This document provides statistical methodologies to estimate false match rates (FMRs) from small biometric sample sets. This document intends to:
- lay out a methodology for biometric performance estimation based on extrapolation using extreme value statistical models;
- provide statistical methodologies to estimate FMRs of biometric verification systems;
- be applicable to systems that include algorithms that produce likelihood dissimilarity or similarity scores;
NOTE Throughout the document, if not otherwise specified, scores refer to similarity scores.
- specify the methodology for data recording and result reporting;
- introduce metrics for the estimated biometric performance.
The following are not within the scope of this document:
- Estimation of false positive identification rates for one-to-many implementations.
- Estimation of false accept rates for verification transactions.
Technologies de l'information — Méthodologies d'estimation des performances biométriques à l'aide de modèles statistiques
General Information
- Status: Not Published
- Publication Date: 10-Jul-2024
- Technical Committee: ISO/IEC JTC 1/SC 37 - Biometrics
- Drafting Committee: ISO/IEC JTC 1/SC 37/WG 5 - Biometric testing and reporting
- Current Stage: 6060 - International Standard published
- Start Date: 11-Jul-2024
- Due Date: 28-May-2024
- Completion Date: 11-Jul-2024
Overview
ISO/IEC 5152:2024 specifies statistical methodologies to estimate biometric false match rates (FMRs) when only small non-mated sample sets are available. The standard applies extreme value theory (EVT) to extrapolate the tail of similarity or likelihood score distributions so evaluators can produce an extrapolated FMR and confidence interval even when observed false matches are rare or absent. The document covers methodology, data recording, reporting, and metrics for estimated biometric performance, while excluding one-to-many false positive identification rate estimation and false accept rate estimation for verification transactions.
Key Topics
- Extreme value statistical models: The standard introduces two EVT-based approaches - the generalized extreme value (rGEV) model (for r largest order statistics) and the generalized Pareto (GP) distribution (for tail modeling). These enable reliable extrapolation of score distributions beyond the empirical range.
- Estimation design and confidence: Sample design and choice of thresholds are guided by the target FMR and desired confidence interval. The standard explains trade-offs between sample size, confidence level and extrapolation uncertainty (for example, industry rules like the “rule of 30” illustrate sample requirements when using purely empirical methods).
- Model fitness and diagnostics: Evaluators are required to assess model fit using diagnostic plots (e.g., Q–Q plots) and model selection procedures to validate extrapolation results.
- Stratified analysis and demographic factors: Where subpopulations (kinship, demographics, health, occupation, etc.) materially affect extreme scores, the standard recommends stratified evaluation and reporting of sub-dataset extrapolations.
- Record keeping and reporting: Procedures for recording comparison scores, reporting one-to-one performance, and communicating extrapolated FMRs with confidence intervals are defined to ensure reproducibility and transparency.
Applications
This standard is practical for:
- Technology evaluations of biometric verification algorithms when collecting very large non-mated datasets is impractical.
- Scenario and operational evaluations where comparison scores are available but false match events are rare.
- Risk assessments and procurement specifications that require quantified estimates of rare false match behavior and associated confidence bounds.
Benefits include reduced data collection burden, well-founded rare-event probability estimates, and standardized reporting to facilitate comparison between systems.
Related Standards
- ISO/IEC 19795-1:2021 - Principles and framework for biometric performance testing and reporting. This is a normative reference for test design and reporting principles.
- ISO/IEC 2382-37 - Biometrics vocabulary and definitions used throughout biometric standards.
Keywords: biometric, false match rate, FMR, extreme value theory, generalized Pareto, generalized extreme value, extrapolation, small sample estimation, model diagnostics, reporting.
Sample documents: ISO/IEC PRF 5152, Information technology — Biometric performance estimation methodologies using statistical models (released 14.05.2024), and its REDLINE version (released 14.05.2024).
Frequently Asked Questions
ISO/IEC 5152:2024 is a draft published by the International Organization for Standardization (ISO). Its full title is "Information technology - Biometric performance estimation methodologies using statistical models". This standard covers statistical methodologies to estimate false match rates (FMRs) from small biometric sample sets; its full scope is given in Clause 1 below.
ISO/IEC 5152:2024 is classified under the following ICS (International Classification for Standards) categories: 35.240.15 - Identification cards. Chip cards. Biometrics. The ICS classification helps identify the subject area and facilitates finding related standards.
Standards Content (Sample)
International Standard
ISO/IEC 5152
First edition
Information technology — Biometric performance estimation methodologies using statistical models
Technologies de l'information — Méthodologies d'estimation des performances biométriques à l'aide de modèles statistiques
PROOF/ÉPREUVE
Reference number
© ISO/IEC 2024
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
© ISO/IEC 2024 – All rights reserved
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Symbols and abbreviated terms
5 Conformance
6 Details of estimation
6.1 Estimation of biometric performance based on extreme value theory
6.2 Estimation design
6.3 Generalized extreme value distribution
6.4 Generalized Pareto distribution
6.5 Evaluation of the fitness of the model
6.6 Selection of rGEV and GP
6.6.1 Differences between the two methodologies
6.6.2 Features of the two methodologies
7 Performance metrics
8 Record keeping
9 Reporting estimation results
9.1 Reporting one-to-one comparison performance
9.2 Reporting estimation results
9.3 Reporting form
Annex A (informative) Extreme value theory
Annex B (informative) Examples applied to multiple modality datasets to demonstrate the validity of the methodology
Bibliography
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are
members of ISO or IEC participate in the development of International Standards through technical
committees established by the respective organization to deal with particular fields of technical activity.
ISO and IEC technical committees collaborate in fields of mutual interest. Other international organizations,
governmental and non-governmental, in liaison with ISO and IEC, also take part in the work.
The procedures used to develop this document and those intended for its further maintenance are described
in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types
of document should be noted. This document was drafted in accordance with the editorial rules of the ISO/
IEC Directives, Part 2 (see www.iso.org/directives or www.iec.ch/members_experts/refdocs).
ISO and IEC draw attention to the possibility that the implementation of this document may involve the
use of (a) patent(s). ISO and IEC take no position concerning the evidence, validity or applicability of any
claimed patent rights in respect thereof. As of the date of publication of this document, ISO and IEC had not
received notice of (a) patent(s) which may be required to implement this document. However, implementers
are cautioned that this may not represent the latest information, which may be obtained from the patent
database available at www.iso.org/patents and https://patents.iec.ch. ISO and IEC shall not be held
responsible for identifying any or all such patent rights.
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions
related to conformity assessment, as well as information about ISO's adherence to the World Trade
Organization (WTO) principles in the Technical Barriers to Trade (TBT) see www.iso.org/iso/foreword.html.
In the IEC, see www.iec.ch/understanding-standards.
This document was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology,
Subcommittee SC 37, Biometrics.
Any feedback or questions on this document should be directed to the user’s national standards
body. A complete listing of these bodies can be found at www.iso.org/members.html and
www.iec.ch/national-committees.
Introduction
This document provides a methodology for measuring the accuracy of biometric verification systems based
on statistics from extreme value theory [1]. The methodology is particularly useful when estimating the
false match rate with a relatively small sample set, and it is an alternative to empirical accuracy measurement.
In order to measure the false match rate of biometric verification systems, evaluators need to prepare a
dataset with a sufficiently large number of non-mated attempts in order to observe a sufficient number
of false match cases for a reliable estimation of the false match rate. For highly accurate systems the
quantity of attempts required to test the false match rate is likely to be extremely large. As performance of
biometric verification systems improves dramatically, acquiring representative data of non-mated attempts
in sufficient quantity becomes increasingly difficult in terms of the time, cost and practicality of creating
datasets. Policy considerations that apply to biometric data collection and use can pose further constraints.
If no false match case is found within the evaluation samples, metrics based on the statistical rule known
as “the rule of three” (as defined in ISO/IEC 19795-1) are widely used in the biometric industry. However,
the rule of three is only applicable when no false match case is observed within the tested sample set, and it
does not give any indication of the accuracy and confidence levels to expect when one or more false matches
are observed. The “rule of thirty” applies only if at least 30 false matches are observed: the true error rate
then lies within ± 30 % of the observed error rate with 90 % confidence.
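Both rules of thumb can be reproduced with a few lines of arithmetic. The following is a minimal Python sketch, not taken from ISO/IEC 19795-1: the rule of three is computed as the exact zero-failure upper bound 1 - (1 - confidence)^(1/N), which is close to 3/N at 95 % confidence, and the ± 30 % of the rule of thirty is approximated by treating the false match count as Poisson, so that a two-sided 90 % interval has a relative half-width of about 1,645/√30 ≈ 0,30. The function names and the comparison count are hypothetical.

```python
import math

def rule_of_three_upper_bound(n_trials: int, confidence: float = 0.95) -> float:
    """Upper confidence bound on the false match rate when 0 false matches occur in n_trials.

    Solves (1 - p) ** n_trials = 1 - confidence for p. At 95 % confidence this is
    approximately 3 / n_trials, hence the name "rule of three".
    """
    return -math.expm1(math.log(1.0 - confidence) / n_trials)

def rule_of_thirty_relative_half_width(n_errors: int, z: float = 1.645) -> float:
    """Approximate relative half-width of a two-sided confidence interval on an error rate.

    Treats the error count as Poisson, so its relative standard error is 1 / sqrt(n_errors);
    z = 1.645 corresponds to a two-sided 90 % interval.
    """
    return z / math.sqrt(n_errors)

n = 3_000_000  # hypothetical number of non-mated comparisons with no false match observed
print(f"rule of three: exact {rule_of_three_upper_bound(n):.3e}, 3/N = {3 / n:.3e}")
print(f"rule of thirty: +/- {rule_of_thirty_relative_half_width(30):.1%}")
```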
In this document, two major statistical methods are introduced to estimate the false match rate with
a relatively small number of samples. Both methods are widely used in a variety of industries including
civil engineering, meteorology, hydrology and financial engineering. Both methods are proven to be highly
reliable techniques to estimate the probability of the occurrence of rare, extreme events such as maximum
wind velocity or tsunami heights. These statistical methods are applied to similarly rare events of false
match cases in biometrics and used to estimate the probability of occurrence of such cases if a larger
non-mated sample set is not available. The estimated false match rate is available in the form of a cumulative
distribution function (CDF) together with its confidence interval.
This document defines procedures for extrapolating performance metrics in technology evaluations. These
procedures can also be applied in scenario evaluations and operational evaluations if comparison scores
are obtained. This document defines the methodology to be used by evaluators to reliably estimate the false
match rate in case of a limited number of false match cases or even no false match case at all. This document
does not address certification or conformance.
International Standard ISO/IEC 5152:2024(en)
Information technology — Biometric performance estimation
methodologies using statistical models
1 Scope
This document provides statistical methodologies to estimate false match rates (FMRs) from small biometric
sample sets.
This document intends to:
— lay out a methodology for biometric performance estimation based on extrapolation using extreme value
statistical models;
— provide statistical methodologies to estimate FMRs of biometric verification systems;
— be applicable to systems that include algorithms that produce likelihood dissimilarity or similarity scores;
NOTE Throughout the document, if not otherwise specified, scores refer to similarity scores.
— specify the methodology for data recording and result reporting;
— introduce metrics for the estimated biometric performance.
The following are not within the scope of this document.
— Estimation of false positive identification rates for one-to-many implementations.
— Estimation of false accept rates for verification transactions.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content constitutes
requirements of this document. For dated references, only the edition cited applies. For undated references,
the latest edition of the referenced document (including any amendments) applies.
ISO/IEC 19795-1:2021, Information technology — Biometric performance testing and reporting — Part 1:
Principles and framework
ISO/IEC 2382-37, Information technology — Vocabulary — Part 37: Biometrics
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO/IEC 2382-37 and the following apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1
extrapolated false match rate
extrapolated FMR
false match rate (FMR) that is estimated by using any statistical models such as those used in extreme
value theory
3.2
quantile-quantile plot
Q-Q plot
quantile-quantile comparison of two distributions, either or both of which may be empirical or theoretical [2]
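To make the definition concrete, the following Python sketch computes the (theoretical, empirical) quantile pairs that a Q-Q plot displays. It assumes the plotting position p_i = (i - 0,5)/n, one common convention among several, and uses an exponential quantile function purely as a stand-in for a fitted model; the helper names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

def qq_points(sample: Sequence[float],
              theoretical_quantile: Callable[[float], float]) -> List[Tuple[float, float]]:
    """Return (theoretical, empirical) quantile pairs for a Q-Q plot.

    The i-th smallest observation (1-based) is paired with the theoretical quantile at
    the plotting position p_i = (i - 0.5) / n.
    """
    ordered = sorted(sample)
    n = len(ordered)
    return [(theoretical_quantile((i - 0.5) / n), x)
            for i, x in enumerate(ordered, start=1)]

def exponential_quantile(p: float, rate: float = 1.0) -> float:
    """Quantile function of an exponential distribution (illustrative model only)."""
    return -math.log1p(-p) / rate

# Points close to the line y = x indicate that the model fits the sample.
pairs = qq_points([0.7, 0.1, 2.5, 0.3, 1.2], exponential_quantile)
for x, y in pairs:
    print(f"{x:.3f}  {y:.3f}")
```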
4 Symbols and abbreviated terms
CDF cumulative distribution function
FMR false match rate
FNMR false non-match rate
GEV generalized extreme value
GP generalized Pareto
MLE maximum likelihood estimation
PDF probability distribution function
μ location parameter
σ scale parameter
ξ shape parameter
5 Conformance
To conform to this document, a biometric performance test shall be executed and reported in accordance
with the requirements contained in Clauses 6 through 9.
6 Details of estimation
6.1 Estimation of biometric performance based on extreme value theory
Extreme value theory is used to estimate the tail of the distribution of non-mated comparison scores, i.e.
scores from comparisons between different data subjects. By appropriately capturing the tail of the
comparison score distribution, it becomes possible to extrapolate beyond the observed score range, i.e. to
scores larger than the maximum empirical score.
The method of extracting the tail of the comparison score distribution differs depending on whether the
extreme value distribution model applied is a generalized extreme value distribution model (rGEV; the
limiting joint generalized extreme value distribution of the r largest order statistics) or a generalized Pareto
(GP) distribution model. More details on these models can be found in 6.3, 6.4 and Annex A.
Extreme value theory is applied to the extracted score set to estimate the optimum parameters of the
distribution, and the tail of the comparison score distribution is approximated by the fitted model.
The fitted extreme value model is compared with the original comparison score distribution using
diagnostic plots such as the Q-Q plot. By extrapolating the estimated comparison score distribution, a
distribution can be obtained for the score range in which no empirical scores were observed. Finally, the
extrapolated FMR is obtained by choosing a score threshold and reading the value of the estimated 1-CDF
at that threshold.
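The motivation for extrapolation can be seen from the empirical FMR itself. In this minimal Python sketch (hypothetical function name and toy scores; a comparison is assumed to count as a match when score ≥ threshold), the empirical FMR is exactly zero for any threshold above the largest observed non-mated score, which is the region the fitted extreme value model is meant to cover.

```python
from bisect import bisect_left
from typing import Sequence

def empirical_fmr(nonmated_scores: Sequence[float], threshold: float) -> float:
    """Fraction of non-mated similarity scores at or above the threshold.

    Assumes a comparison is declared a match when score >= threshold.
    """
    ordered = sorted(nonmated_scores)
    n_false_matches = len(ordered) - bisect_left(ordered, threshold)
    return n_false_matches / len(ordered)

scores = [0.12, 0.25, 0.31, 0.40, 0.58]  # toy non-mated similarity scores
for t in (0.30, 0.40, 0.58, 0.60):
    # Above max(scores) the empirical estimate carries no information: it is simply 0.
    print(t, empirical_fmr(scores, t))
```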
This methodology can be applied if comparison scores are obtained. It can be applied not only in technology
evaluations, but also in scenario evaluations and operation evaluations as long as the comparison scores are
obtained.
If the scores are produced from a population in which some subjects are genetically related, e.g. identical
twins or siblings, the probability in the extreme value domain can be higher than for a population without
such related scores, which typically results in a small secondary peak. The extrapolated FMR methodology
still works for such score distributions and reflects the increased probability. The evaluator shall report
the estimated results together with the demographic information of the population.
If the dataset has one or more unintended peaks in the extreme value domain, and if these counts are not
within an ignorable error range, it can be appropriate to introduce stratified analysis. A single dataset
consists of n different subject groups. The distribution of the dataset combines the characteristics of each
group, reflecting each group's statistical parameters (e.g. mean, variance and number of samples). If the
differences between the groups are found to be statistically significant and the proportion of such groups
cannot be ignored, these groups may be separated into up to n sub-datasets and evaluated independently.
Although the relevant sub-datasets depend on the biometric modality, they are typically characterized by
test crew properties such as:
a) kinship,
b) human races and genders,
c) occupations,
d) health conditions,
e) other factors that reduce the uniqueness of the biometric features of interest.
The extrapolated FMR for each sub-dataset shall be computed in the same manner as described in 6.3
and 6.4. The details of the sub-datasets shall be reported in accordance with the requirements defined in
ISO/IEC 19795-1:2021, 12.1.
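A first screening step for stratified analysis can be as simple as grouping non-mated scores by a stratum label and comparing the tail behaviour of each group. The Python sketch below assumes each score is tagged with a hypothetical label such as "sibling" or "unrelated"; it only reports counts and the empirical FMR per stratum at one threshold and does not perform the test of statistical significance mentioned above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def fmr_by_stratum(labelled_scores: Iterable[Tuple[str, float]],
                   threshold: float) -> Dict[str, Tuple[int, float]]:
    """Return {stratum: (number of non-mated scores, empirical FMR at threshold)}.

    A match is assumed when score >= threshold.
    """
    groups: Dict[str, List[float]] = defaultdict(list)
    for stratum, score in labelled_scores:
        groups[stratum].append(score)
    return {name: (len(s), sum(x >= threshold for x in s) / len(s))
            for name, s in groups.items()}

# Toy data: labels and scores are hypothetical.
data = [("unrelated", 0.10), ("unrelated", 0.22), ("unrelated", 0.35),
        ("unrelated", 0.18), ("sibling", 0.55), ("sibling", 0.71)]
print(fmr_by_stratum(data, threshold=0.5))
```

A stratum whose tail differs markedly, as the sibling group does in this toy data, would then be fitted separately as described in 6.3 and 6.4.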
6.2 Estimation design
The sample size is determined by the target FMR to be measured and the required accuracy of the estimation.
Measuring an FMR of 0,000 1 % ± 30 % with 90 % confidence takes 30 false matches in 30 million non-mated
comparisons (“rule of 30”; see ISO/IEC 19795-1). This typically means several thousand test subjects are
necessary to calculate the FMR empirically. This number can be reduced by using appropriate statistical
estimation and by accepting some estimation error.
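The arithmetic behind these figures is sketched below in Python. The rule-of-30 comparison count is 30 divided by the target FMR, here 30 × 10^6 for an FMR of 0,000 1 %. Converting that into a number of subjects requires an assumption about the test protocol; this sketch assumes one sample per subject and one comparison per unordered pair of distinct subjects, so N subjects yield N(N - 1)/2 non-mated comparisons. The target FMR is passed as its reciprocal to keep the arithmetic exact in integers; the function names are hypothetical.

```python
import math

def required_comparisons(target_fmr_inverse: int, min_false_matches: int = 30) -> int:
    """Non-mated comparisons needed to expect min_false_matches at FMR = 1 / target_fmr_inverse."""
    return min_false_matches * target_fmr_inverse

def subjects_for_comparisons(n_comparisons: int) -> int:
    """Smallest N such that N * (N - 1) / 2 >= n_comparisons.

    Assumes one sample per subject and one comparison per unordered pair of distinct subjects.
    """
    n = math.isqrt(2 * n_comparisons)  # here n * (n - 1) / 2 < n_comparisons, so only step upward
    while n * (n - 1) // 2 < n_comparisons:
        n += 1
    return n

comparisons = required_comparisons(1_000_000)  # FMR of 0,000 1 % = 1 in 10**6
print(comparisons, subjects_for_comparisons(comparisons))
```

Under these assumptions roughly 7 750 subjects are needed, consistent with the "several thousand" stated above. Collecting several samples per subject yields more comparisons per pair of subjects, although such comparisons are not statistically independent.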
Since the extrapolated FMR is estimated by using extreme values, it is always preferred to have a reasonably
large number of samples for better accuracy, with accuracy being quantified by a confidence interval giving
the best and the worst case FMR values for a certain confidence level. If the extrapolated FMR values are
obtained at multiple thresholds, the extrapolated FMR value with the narrower confidence interval is
regarded as more reliable.
As the confidence interval and the confidence level shall be reported, it is the evaluator who decides which
threshold score and its corresponding extrapolated FMR value to report. If FNMR is reported, the same
score threshold shall be applied.
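One way to apply the "narrower confidence interval" criterion is sketched below in Python. Because extrapolated FMRs at different thresholds can differ by orders of magnitude, this sketch assumes the interval width is compared on a log10 scale (in decades) rather than in absolute terms; that choice, the record type and the example values are hypothetical and not taken from this document. The selected value is still reported together with its confidence interval and confidence level.

```python
import math
from typing import NamedTuple, Sequence

class Extrapolation(NamedTuple):
    threshold: float
    fmr: float    # point estimate of the extrapolated FMR
    lower: float  # lower confidence bound
    upper: float  # upper confidence bound

def log10_width(e: Extrapolation) -> float:
    """Width of the confidence interval in decades (log10 units)."""
    return math.log10(e.upper / e.lower)

def narrowest_interval(candidates: Sequence[Extrapolation]) -> Extrapolation:
    """Pick the extrapolated FMR whose confidence interval is narrowest on a log scale."""
    return min(candidates, key=log10_width)

# Hypothetical results at three thresholds.
results = [Extrapolation(0.60, 1e-5, 5e-6, 2e-5),
           Extrapolation(0.70, 1e-6, 2e-7, 5e-6),
           Extrapolation(0.80, 1e-7, 1e-9, 1e-5)]
best = narrowest_interval(results)
print(best.threshold, f"{log10_width(best):.2f} decades")
```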
6.3 Generalized extreme value distribution
The rGEV estimation is based on the generalized extreme value distribution model as described in Clause A.2.
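For reference, the GEV distribution family has the standard cumulative distribution function below, written with the location μ, scale σ and shape ξ used in this document; Clause A.2 may use a different but equivalent notation. The case ξ = 0 is interpreted as the limit ξ → 0, which gives the Gumbel distribution.

$$
G(z) = \exp\left\{ -\left[ 1 + \xi \left( \frac{z - \mu}{\sigma} \right) \right]^{-1/\xi} \right\},
\qquad 1 + \xi \left( \frac{z - \mu}{\sigma} \right) > 0, \quad \sigma > 0
$$

$$
G(z) = \exp\left\{ -\exp\left[ -\left( \frac{z - \mu}{\sigma} \right) \right] \right\} \qquad (\xi \to 0)
$$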
The validation processes are as follows:
1) Determine n, the number of samples in a block.
n needs to be large enough for each block to contain some extreme values, i.e. large non-mated scores.
On the other hand, when choosing m, the number of blocks, the trade-off between m and n needs to be
considered: for a fixed total number of scores, increasing m reduces n, so each block is less likely to
contain genuinely extreme scores.
2) Specify the number of extreme values per block, r.
The r value determines the number of extreme samples extracted from each block and hence the
number of samples used for the estimation. For similarity scores, the largest r scores within the block
are used for estimation. For dissimilarity scores, the smallest r scores are used. The larger the value
of r, the more samples are available for estimation, which contributes to a better fit. On the other hand,
if r is too large, non-mated scores that cannot be regarded as extreme values can be included in the
samples for estimation, which degrades the fit of the resulting estimated cumulative distribution
function (CDF). The typical r range used to estimate extreme natural phenomena is from 1 to 5, which is
also applicable to extrapolated FMR estimation.
3) Calculate the estimated parameter set $(\hat{\mu}, \hat{\sigma}, \hat{\xi})$ (refer to A.2) from the extreme samples obtained in
steps 1) and 2) using the maximum likelihood estimation (MLE) algorithm.
Based on the hypothesis that the true probability distribution function (PDF) belongs to the generalized
extreme value (GEV) distribution family, apply a maximum likelihood estimation algorithm to obtain
the parameter set $(\hat{\mu}, \hat{\sigma}, \hat{\xi})$.
4) Draw the Q-Q plot and validate the fitness of the estimation.
A Q-Q plot is a graphical statistical method that compares two probability distributions by plotting
their quantiles against each other. First, a set of quantile levels is selected. Each point (x, y) on the plot
shows a quantile of the second distribution (y coordinate) against the same quantile of the first
distribution (x coordinate), so the points trace a curve parametrized by the quantile level. If the two
distributions being compared are similar, the points in the Q-Q plot lie near the line y = x. Figure 1 shows
an example of a Q-Q plot.
5) Observe the quantile-quantile plot to evaluate the fitness of the estimated model, especially at the
extreme value domain (the top-right corner of the plot).
Figure 1 — Example of Q-Q plot of rGEV distribution and empirical data
Key: X, quantile of rGEV distribution; Y, quantile of empirical data
6) Compare the estimated model and the empirical samples.
Draw the estimated CDF curve by using the GEV model and the parameters obtained in step 3), together with its
two-sided 95 % confidence interval. Plot the empirical samples on the same plane and compare them with the
model, especially to judge the feasibility of the model in the domain where no empirical samples are available.
7) Check the feasibility of the model.
If the fitness of the estimation is good enough (refer to 6.5), go to step 8). Otherwise, go back to step 2) with
a different r value. If no r value gives a good estimation, go back to step 1) and try with a different number of
samples in a block, n.
8) Obtain the extrapolated FMR from the estimated 1-CDF curve. See Figure 2 and Annex B.
Choose a point of interest on the estimated 1-CDF curve and report the extrapolated FMR together with m, n, r
and the confidence interval at that extrapolated FMR level.
Figure 2 — Comparison of the empirical 1-CDF and the extrapolated false match rate
Key: X, score; Y, log(extrapolated FMR); 1, empirical data (all); 2, empirical data (estimation); 3, rGEV estimation; 4, max estimation sample
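Steps 1) to 3) and 8) can be sketched in Python with NumPy and SciPy under stated assumptions. The sketch maximizes the rGEV likelihood (the joint model of the r largest order statistics per block, in the standard form given in extreme value theory textbooks) numerically with Nelder-Mead, because SciPy has no built-in rGEV fit. To turn the fitted block-maximum CDF G into a per-comparison FMR, it assumes scores are independent within a block and uses 1 - G(t)^(1/n); this conversion is an assumption of the sketch and not necessarily the one used in Annex B. All function names are hypothetical, the input is synthetic standard normal data rather than biometric scores, the Q-Q diagnostics of steps 4) to 7) are omitted, and the confidence interval required in 6.2 is not computed (profile likelihood or bootstrap resampling would be typical choices).

```python
import numpy as np
from scipy.optimize import minimize

XI_EPS = 1e-8  # below this |xi| the Gumbel limit (xi -> 0) is used

def top_r_per_block(scores, n, r):
    """Steps 1)-2): split scores into m = len(scores) // n blocks of size n and keep the
    r largest similarity scores of each block, sorted in descending order.
    Scores that do not fill a complete final block are discarded (sketch assumption)."""
    scores = np.asarray(scores, dtype=float)
    m = len(scores) // n
    blocks = scores[: m * n].reshape(m, n)
    return -np.sort(-blocks, axis=1)[:, :r]

def rgev_nll(params, top):
    """Negative log-likelihood of the rGEV model for an (m, r) array of top-r scores."""
    mu, log_sigma, xi = params
    sigma = np.exp(log_sigma)            # optimise log(sigma) so that sigma > 0
    m, r = top.shape
    if abs(xi) < XI_EPS:                 # Gumbel limit
        z = (top - mu) / sigma
        return np.sum(np.exp(-z[:, -1])) + m * r * log_sigma + np.sum(z)
    y = 1.0 + xi * (top - mu) / sigma
    if np.any(y <= 0):                   # outside the support of the model
        return np.inf
    # y[:, -1] belongs to the r-th largest score of each block
    return (np.sum(y[:, -1] ** (-1.0 / xi)) + m * r * log_sigma
            + (1.0 / xi + 1.0) * np.sum(np.log(y)))

def fit_rgev(top):
    """Step 3): maximum likelihood estimate of (mu, sigma, xi)."""
    start = np.array([top[:, 0].mean(), np.log(top[:, 0].std() + 1e-12), 0.1])
    res = minimize(rgev_nll, start, args=(top,), method="Nelder-Mead",
                   options={"maxiter": 10_000, "xatol": 1e-8, "fatol": 1e-8})
    mu, log_sigma, xi = res.x
    return mu, np.exp(log_sigma), xi

def extrapolated_fmr(t, mu, sigma, xi, n):
    """Step 8): per-comparison exceedance P(score > t) = 1 - G(t) ** (1 / n), where G is
    the fitted block-maximum CDF and scores are assumed independent within a block."""
    if abs(xi) < XI_EPS:
        log_g = -np.exp(-(t - mu) / sigma)
    else:
        y = 1.0 + xi * (t - mu) / sigma
        if y <= 0:                       # t beyond an end point of the support
            return 0.0 if xi < 0 else 1.0
        log_g = -y ** (-1.0 / xi)
    return float(-np.expm1(log_g / n))

# Illustration on synthetic scores (standard normal, NOT biometric data).
rng = np.random.default_rng(0)
scores = rng.standard_normal(200_000)
n, r = 1_000, 3
top = top_r_per_block(scores, n, r)      # m = 200 blocks
mu, sigma, xi = fit_rgev(top)
print(f"mu={mu:.3f} sigma={sigma:.3f} xi={xi:.3f}")
print(f"extrapolated FMR at t=4.0: {extrapolated_fmr(4.0, mu, sigma, xi, n):.2e}")
```

Refitting with different n and r and comparing the resulting estimates is one way to carry out the iteration described in step 7).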
6.4 Generalized Pareto distribution
The GP estimation is based on the GP model as described in Clause A.3. The validation processes are as
follows.
1) Determine μ, the location parameter for the GP.
When applying the GP model, it is necessary to find an appropriate threshold value μ to extract the extreme
scores from the entire score set. The appropriate μ can be obtained by observing the stability of the scale
parameter σ and the shape parameter ξ. Figure 3 plots the optimum shape parameter ξ obtained by MLE
(y axis) against the corresponding threshold value (x axis). The shape parameter ξ is regarded as stable in
the circled range, and the appropriate threshold μ can be selected from the values close to the lower end of
that range. A plot of the scale parameter σ against the threshold can be used in the same manner to find the
optimum μ.
Figure 3 — Example of plot of the shape parameter ξ
Key: X, threshold; Y, shape parameter; a, the values of shape parameter ξ are stable.
2) Estimate parameters (σ, ξ) of GP.
The GP distribution is fitted to the data exceeding the threshold μ selected in step 1). The scale
parameter σ and the shape parameter ξ are then estimated by using the maximum likelihood estimation
algorithm.
3) Diagnosis of GP model.
To diagnose whether the parameters of an estimated GP are appropriate, Q-Q plots are commonly used in
extreme value theory. Figure 4 shows an example of a Q-Q plot between the empirical data and the GP
distribution.
Figure 4 — Example of Q-Q plot of GP distribution and empirical data
Key: X, quantile of GP distribution; Y, quantile of empirical data
4) Determine parameters of GP model.
If the Q-Q plot diagnosis reveals a problem, for example a clear separation of the points from the line
y = x, the threshold μ is reselected.
5) Obtain CDF of comparison scores.
The CDF of the non-mated comparison scores is obtained as follows. Below the threshold μ, the CDF is the
distribution obtained from the empirical values; above the threshold μ, it is the fitted GP. The GP is
extrapolated to the extent that there are no
...
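Steps 1), 2) and 5) of the GP procedure can be sketched in Python with `scipy.stats.genpareto`, whose shape parameter c has the same sign convention as ξ. In the sketch, u plays the role of the threshold μ; the GP is fitted to the excesses above u with its location fixed at zero, and the tail of the CDF is assembled with the standard peaks-over-threshold estimator P(score > t) = ζ_u (1 + ξ(t - u)/σ)^(-1/ξ) for t ≥ u, where ζ_u is the fraction of scores above u. Function names, the candidate thresholds and the synthetic standard normal data are hypothetical; the Q-Q diagnosis of step 3) and confidence intervals are not included.

```python
import numpy as np
from scipy.stats import genpareto

def fit_gp(scores, u):
    """Step 2): MLE fit of a GP to the excesses (score - u) of scores above u.

    Returns (sigma, xi, zeta), where zeta is the fraction of scores above u.
    The GP location is fixed at 0 because the data are already excesses.
    """
    scores = np.asarray(scores, dtype=float)
    excess = scores[scores > u] - u
    xi, _, sigma = genpareto.fit(excess, floc=0.0)  # SciPy returns (c, loc, scale)
    return sigma, xi, excess.size / scores.size

def shape_stability(scores, thresholds):
    """Step 1): shape estimate xi for each candidate threshold (cf. Figure 3)."""
    return [(u, fit_gp(scores, u)[1]) for u in thresholds]

def gp_fmr(t, scores, u, sigma, xi, zeta):
    """Step 5): empirical exceedance below u, GP tail at and above u."""
    scores = np.asarray(scores, dtype=float)
    if t < u:
        return float(np.mean(scores > t))
    # zeta * (1 + xi * (t - u) / sigma) ** (-1 / xi); genpareto.sf also covers xi -> 0
    return float(zeta * genpareto.sf(t - u, xi, loc=0.0, scale=sigma))

# Illustration on synthetic scores (standard normal, NOT biometric data).
rng = np.random.default_rng(1)
scores = rng.standard_normal(200_000)
candidates = np.quantile(scores, [0.95, 0.97, 0.99])
for u_c, xi_c in shape_stability(scores, candidates):
    print(f"u={u_c:.3f} xi={xi_c:.3f}")

u = candidates[-1]  # chosen threshold (hypothetical choice)
sigma, xi, zeta = fit_gp(scores, u)
print(f"sigma={sigma:.3f} xi={xi:.3f} zeta={zeta:.4f}")
print(f"extrapolated FMR at t=4.0: {gp_fmr(4.0, scores, u, sigma, xi, zeta):.2e}")
```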
ISO/IEC PRF 5152:2024(en) (previously ISO/IEC DIS 5152:2023(E))
ISO/IEC JTC 1/SC 37
Secretariat: ANSI
Date: 2024-05-13 (previously 2023-07-10)
Information technology — Biometric performance estimation methodologies using statistical models
Technologies de l'information — Méthodologies d'estimation des performances biométriques à l'aide de modèles statistiques
FDIS stage
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Symbols and abbreviated terms
5 Conformance
6 Details of estimation
6.1 Estimation of biometric performance based on extreme value theory
6.2 Estimation design
6.3 Generalized extreme value distribution
6.4 Generalized Pareto distribution
6.5 Evaluation of the fitness of the model
6.6 Selection of rGEV and GP
6.6.1 Differences between the two methodologies
6.6.2 Features of the two methodologies
7 Performance metrics
8 Record keeping
9 Reporting estimation results
9.1 Reporting one-to-one comparison performance
9.2 Reporting estimation results
9.3 Reporting form
Annex A (informative) Extreme value theory
A.1 Fundamental premises
A.2 Generalized extreme value distribution
A.2.1 Preparation
A.2.2 The rGEV Model
A.3 Generalized Pareto distribution
Annex B (informative) Examples applied to multiple modality datasets to demonstrate the validity of the methodology
B.1 Overview
B.2 Datasets and test protocol
B.3 Application examples
Bibliography
© ISO/IEC 2024 – All rights reserved
iii
© ISO/IEC 2023 – All rights reserved iii
ISO/IEC DISPRF 5152:2023(E2024(en)
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are members
of ISO or IEC participate in the development of International Standards through technical committees
established by the respective organization to deal with particular fields of technical activity. ISO and IEC
technical committees collaborate in fields of mutual interest. Other international organizations, governmental
and non-governmental, in liaison with ISO and IEC, also take part in the work.
The procedures used to develop this document and those intended for its further maintenance are described
in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types of
document should be noted. This document was drafted in accordance with the editorial rules of the ISO/IEC
Directives, Part 2 (see www.iso.org/directives or www.iec.ch/members_experts/refdocs).
ISO and IEC draw attention to the possibility that the implementation of this document may involve the use of
(a) patent(s). ISO and IEC take no position concerning the evidence, validity or applicability of any claimed
patent rights in respect thereof. As of the date of publication of this document, ISO and IEC had not received
notice of (a) patent(s) which may be required to implement this document. However, implementers are
cautioned that this may not represent the latest information, which may be obtained from the patent database
available at www.iso.org/patents and https://patents.iec.ch. ISO and IEC shall not be held responsible for
identifying any or all such patent rights.
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions
related to conformity assessment, as well as information about ISO's adherence to the World Trade
Organization (WTO) principles in the Technical Barriers to Trade (TBT) see www.iso.org/iso/foreword.html.
In the IEC, see www.iec.ch/understanding-standards.
This document was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology,
Subcommittee SC 37, Biometrics.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html and www.iec.ch/national-
committees.
Introduction
This document provides a methodology for measuring the accuracy of biometric verification systems based
on the statistics known as extreme value theory [1]. The methodology is particularly useful when
estimating the false match rate with a relatively small sample set. The methodology is an alternative to
empirical accuracy measurement.
In order to measure the false match rate of biometric verification systems, evaluators need to prepare a
dataset with a sufficiently large number of non-mated attempts in order to observe a sufficient number of false
match cases for a reliable estimation of the false match rate. For highly accurate systems, the number of
attempts required to test the false match rate is likely to be extremely large. As the performance of biometric
verification systems improves dramatically, acquiring representative data of non-mated attempts in sufficient
quantity becomes increasingly difficult in terms of the time, cost and practicality of creating datasets. Policy
considerations that apply to biometric data collection and use can pose further constraints.
If no false match case is found within the evaluation samples, metrics based on the statistic known as “the rule of
three” (as defined in ISO/IEC 19795-1) are widely used in the biometric industry. However, the rule of three is
only applicable when no false match case is observed within the tested sample set and does not give any
indication of the accuracy and confidence levels to be expected if one or more false matches are observed. Only if
at least 30 false matches are observed does the “rule of thirty” apply, i.e. with 90 % confidence, the true error
rate is within ±30 % of the observed error rate.
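As an illustration of the two rules quoted above, the following Python sketch computes the rule-of-three bound (an approximate 95 % upper confidence limit of 3/N on the error rate when zero errors are observed in N independent trials) and the relative precision behind the rule of thirty. The function names are hypothetical, and the rule-of-thirty check uses a normal approximation to a Poisson error count (z = 1,645 for 90 % two-sided confidence), which is an assumption of this sketch rather than a formula taken from this document.

```python
import math


def rule_of_three_upper_bound(n_trials: int) -> float:
    """Approximate 95 % upper confidence limit on an error rate when
    zero errors are observed in n_trials independent trials."""
    return 3.0 / n_trials


def relative_half_width_90(n_errors: int) -> float:
    """Approximate relative half-width of a two-sided 90 % confidence interval
    for an error rate estimated from n_errors observed errors, using a normal
    approximation to a Poisson count (z = 1.645)."""
    return 1.645 / math.sqrt(n_errors)


# Zero false matches in 3 million non-mated comparisons: FMR <= 1e-6 at about 95 %.
print(rule_of_three_upper_bound(3_000_000))
# 30 observed false matches: relative half-width of about 0.30 at 90 %.
print(round(relative_half_width_90(30), 3))
```

With 30 errors the half-width is 1,645/√30 ≈ 0,30, consistent with the ±30 % figure of the rule of thirty; with fewer errors the interval widens quickly.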
In this document, two major statistical methods are introduced to estimate the false match rate with a
relatively small number of samples. Both methods are widely used in a variety of industries including civil
engineering, meteorology, hydrology and financial engineering, where they have proven to be reliable
techniques for estimating the probability of rare, extreme events such as maximum wind speeds or tsunami
heights. In this document, these methods are applied to the similarly rare event of a false match in biometrics
and used to estimate the probability of such events when a larger non-mated sample set is not available.
The estimated false match rate is given in the form of a cumulative distribution function (CDF) together with
its confidence interval.
This document defines procedures for extrapolating performance metrics in technology evaluations. These
procedures can also be applied in scenario evaluations and operational evaluations if comparison scores are
obtained. This document defines the methodology to be used by evaluators to reliably estimate the false match
rate in case of a limited number of false match cases or even no false match case at all. This document does not
address certification or conformance.
DRAFT INTERNATIONAL STANDARD ISO/IEC DIS 5152:2023(E)
Information technology — Biometric performance estimation
methodologies using statistical models
1 Scope
This document provides statistical methodologies to estimate false match rates (FMRs) from small biometric
sample sets.
This document intends to:
— lay out a methodology for biometric performance estimation based on extrapolation using extreme
value statistical models;
— provide statistical methodologies to estimate FMRs of biometric verification systems;
— be applicable to systems that include algorithms that produce likelihood dissimilarity or similarity
scores;
NOTE Throughout the document, if not otherwise specified, scores refer to similarity scores.
— specify the methodology for data recording and result reporting;
— introduce metrics for the estimated biometric performance.
The following are not within the scope of this document.
— Estimation of false positive identification rates for one-to-many implementations.
— Estimation of false accept rates for verification transactions.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content constitutes
requirements of this document. For dated references, only the edition cited applies. For undated references,
the latest edition of the referenced document (including any amendments) applies.
ISO/IEC 19795-1:2021, Information technology — Biometric performance testing and reporting — Part 1:
Principles and framework
ISO/IEC 2382-37, Information technology — Vocabulary — Part 37: Biometrics
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO/IEC 2382-37 and the following
apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1
extrapolated false match rate
extrapolated FMR
false match rate (FMR) that is estimated by using any statistical models such as those used in extreme value
theory
3.2
quantile-quantile plot
Q-Q plot
quantile-quantile comparison of two distributions, either or both of which may be empirical or theoretical [2]
4 Symbols and abbreviated terms
CDF cumulative distribution function
FMR false match rate
FNMR false non-match rate
GEV generalized extreme value
GP generalized Pareto
MLE maximum likelihood estimation
PDF probabilistic distribution function
σ scale parameter
ξ shape parameter
5 Conformance
To conform to this document, a biometric performance test shall be executed and reported in accordance with
the requirements contained in Clauses 6 through 9.
6 Details of estimation
6.1 Estimation of biometric performance based on extreme value theory
Extreme value theory is used to properly estimate the tails of the distribution of comparison scores from
different data subjects. By appropriately capturing the tail of the comparison score distribution, it becomes
possible to extrapolate scores outside the observed score range, i.e. the score range larger than the maximum
empirical score.
The method of extracting the tail of the comparison score distribution differs depending on whether the
extreme value distribution model applied is the generalized extreme value distribution model for the r largest
order statistics (rGEV; the limiting joint generalized extreme value distribution of the r largest order
statistics) or the generalized Pareto (GP) distribution model. More details on these models can be found in
6.3 and 6.4, respectively, and in Annex A.
Extreme value theory is applied to the extracted score set to estimate the optimum parameters of the
distribution, which then approximates the tail of the comparison score distribution.
The approximation obtained from the extreme value model is compared with the original comparison score
distribution using a diagnostic plot (the Q-Q plot, see 6.5). By extrapolating the estimated comparison score
distribution, the distribution can also be obtained in the score range where no scores were observed. Finally,
the extrapolated FMR is obtained by setting a threshold value on the estimated comparison score distribution.
This methodology can be applied whenever comparison scores are obtained, not only in technology
evaluations but also in scenario evaluations and operational evaluations.
If the scores are produced from a population where some subjects are genetically related, e.g. identical twins
or siblings, the probability density in the extreme value domain can be higher than for a population without
such related subjects, which typically results in a small secondary peak. The extrapolated FMR methodology still
works for such score distributions, and the estimate reflects the increased probability. The evaluator shall report
the estimated results together with the demographic information of the population.
If the dataset has one or more unintended peaks in the extreme value domain, and if these counts are not within
a negligible error range, it can be appropriate to introduce stratified analysis. Consider a single dataset that
consists of n different subject groups. The distribution of the dataset then combines the characteristics of each
group, reflecting each group's statistical parameters (e.g. mean, variance and number of samples). If the
differences between the groups are found to be statistically significant and the proportion of such groups cannot
be ignored, these groups may be separated into up to n sub-datasets and evaluated independently. While these
sub-datasets depend on the biometric modality, they are typically characterized by test crew
properties such as:
a) kinship,
b) human races and genders,
c) occupations,
d) health conditions,
e) other factors that reduce the uniqueness of the biometric features of interest.
The extrapolated FMR for each sub-dataset shall be computed in the same manner as described in 6.3 and
6.4. The details of the sub-datasets shall be reported in accordance with the
requirements defined in ISO/IEC 19795-1:2021, 12.1.
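This document does not specify how the statistical significance of differences between subject groups is assessed. One possible choice, assumed here purely for illustration, is a two-sample Kolmogorov-Smirnov test on the non-mated scores of two candidate groups, sketched below in Python with the large-sample critical value c(α)·√((n₁ + n₂)/(n₁n₂)), where c(0,05) ≈ 1,358. Because non-mated scores that share a subject are not independent, the resulting decision is only approximate. The function names are hypothetical and the group score distributions in the example are synthetic.

```python
import math
import random


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:   # step both empirical CDFs past x
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def groups_differ(a, b, c_alpha=1.358):
    """True if D exceeds the large-sample critical value
    c_alpha * sqrt((n1 + n2) / (n1 * n2)); c_alpha = 1.358 for alpha = 0.05."""
    n1, n2 = len(a), len(b)
    return ks_statistic(a, b) > c_alpha * math.sqrt((n1 + n2) / (n1 * n2))


# Synthetic non-mated scores of two hypothetical subject groups.
rng = random.Random(3)
unrelated = [rng.gauss(0.30, 0.05) for _ in range(500)]
kin = [rng.gauss(0.35, 0.05) for _ in range(500)]   # shifted towards higher similarity
print(round(ks_statistic(unrelated, kin), 3), groups_differ(unrelated, kin))
```

A significant result only indicates that the groups differ somewhere in their distributions; whether the difference matters in the extreme value domain still has to be judged from the tails.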
6.2 Estimation design
The sample size is determined by the target FMR to be measured and the required accuracy of the estimation.
Measuring an FMR of 0,000 1 % ± 30 % with 90 % confidence requires 30 false matches in 30 million
non-mated comparisons (“rule of 30”; see ISO/IEC 19795-1). This typically means that several thousand test
subjects are necessary to calculate the FMR empirically. This number can be reduced by using appropriate
statistical estimation and by accepting some estimation error.
Since the extrapolated FMR is estimated from extreme values, a reasonably large number of samples is always
preferable for better accuracy. The accuracy is quantified by a confidence interval giving the best-case and the
worst-case FMR values at a given confidence level. If extrapolated FMR values are obtained at multiple
thresholds, a value with a narrower confidence interval is regarded as more reliable.
The evaluator decides which threshold score and corresponding extrapolated FMR value to report; the
confidence interval and the confidence level shall be reported with it. If FNMR is reported, the same score
threshold shall be applied.
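The arithmetic in the first paragraph of this subclause can be checked with a short Python sketch. It assumes, hypothetically, one sample per subject and that each unordered pair of distinct subjects contributes exactly one non-mated comparison; with several samples per subject, or with comparisons counted in both directions, fewer subjects are needed. The function names are illustrative only.

```python
import math


def comparisons_needed(target_fmr: float, false_matches: int = 30) -> int:
    """Non-mated comparisons needed to expect `false_matches` false matches at
    the target FMR (rule of 30: 30 false matches for +/-30 % at 90 % confidence)."""
    return math.ceil(false_matches / target_fmr)


def subjects_needed(n_comparisons: int) -> int:
    """Smallest number of subjects N with N * (N - 1) / 2 >= n_comparisons,
    assuming one sample per subject and each unordered pair compared once."""
    n = math.isqrt(2 * n_comparisons)
    while n * (n - 1) // 2 < n_comparisons:
        n += 1
    return n


target = 1e-6                       # FMR of 0,000 1 %
needed = comparisons_needed(target)
print(needed, subjects_needed(needed))
```

Under these assumptions, 30 million comparisons require 7 747 subjects, consistent with the “several thousand test subjects” stated above.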
6.3 Generalized extreme value distribution
The rGEV estimation is based on the generalized extreme value distribution model as described in
Clause A.2. The estimation procedure is as follows:
1) Determine n, the number of samples in a block.
n needs to be large enough for each block to contain some extreme values, i.e. large non-mated scores. On
the other hand, when choosing m, the number of blocks, the trade-off between m and n needs to be
considered: for a fixed total number of non-mated scores (m × n), the number of extreme samples per block
decreases as m increases.
2) Specify the number of extreme values per block, r.
The r value determines the number of extreme samples extracted from each block and hence the number
of samples used for the estimation. For similarity scores, the largest r scores within the block are used for
estimation. For dissimilarity scores, the smallest r scores are used. The larger the value of r, the more
samples are available for estimation, which contributes to a better fit. On the other hand, if r is too large,
non-mated scores that cannot be regarded as extreme values can be included in the samples for
estimation, which degrades the fit of the resulting estimated cumulative distribution function (CDF).
The typical r range used to estimate extreme natural phenomena is from 1 to 5, which is also applicable
to extrapolated FMR estimation.
3) Calculate the estimated parameter set (μ̂, σ̂, ξ̂) (refer to A.2) from the extreme samples
obtained in steps 1) and 2) using the maximum likelihood estimation (MLE) algorithm.
Based on the hypothesis that the true probability distribution function (PDF) belongs to the generalized
extreme value (GEV) distribution family, apply a maximum likelihood estimation algorithm to obtain the
parameter set (μ̂, σ̂, ξ̂).
4) Draw the Q-Q plot and validate the fitness of the estimation.
A Q-Q plot is a graphical statistical method that compares two probability distributions by plotting their
quantiles against each other. First, a set of probability levels (quantile positions) is selected. Each point
(x, y) on the plot corresponds to one quantile of the second distribution (y coordinate) plotted against the same
quantile of the first distribution (x coordinate). The plotted points thus form a parametric curve whose
parameter is the probability level. If the two distributions being compared are similar, the points in the Q-Q
plot lie near the line y = x. Figure 1 shows an example of a Q-Q plot.
5) Observe the quantile-quantile plot to evaluate the fitness of the estimated model, especially at the
extreme value domain (the top-right corner of the plot).
Key
X quantile of rGEV distribution
Y quantile of empirical data
Figure 1 — Example of Q-Q plot of rGEV distribution and empirical data
6) Compare the estimated model and the empirical samples.
Draw the estimated CDF curve using the GEV model and the parameters obtained in step 3), together with its
two-sided 95 % confidence interval. Plot the empirical samples on the same plane and compare the fit of the
model with them, especially its plausibility in the domain where no empirical samples are available.
7) Check the feasibility of the model.
If the fitness of the estimation is good enough (refer to 6.5), go to step 8). Otherwise, go back to step 2)
with a different r value. If no r value gives a good estimation, go back to step 1) and try with a different number
of samples in a block, n.
8) Obtain the extrapolated FMR from the estimated 1-CDF curve. See Figure 2 and Annex B.
Choose a point of interest in the estimated 1-CDF curve and report the extrapolated FMR with m, n, r and the
interval of confidence at the extrapolated FMR level.
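Steps 1) to 3) can be sketched in Python as follows: the scores are cut into m blocks of n scores, the r largest similarity scores of each block are kept, and (μ, σ, ξ) are estimated by maximizing the joint likelihood of the r largest order statistics, in the standard form found in the extreme value literature. This is a minimal, non-authoritative sketch. σ is optimized on a log scale to keep it positive, and the Nelder-Mead simplex search is an arbitrary stand-in for "the MLE algorithm", which this document does not specify; in practice a library optimizer such as scipy.optimize.minimize would normally be used. All function names are hypothetical, and the exponential scores in the example are synthetic (their block maxima are approximately Gumbel, i.e. ξ = 0, with μ = ln n and σ = 1).

```python
import math
import random


def r_largest_per_block(scores, n, r):
    """Steps 1) and 2): cut similarity scores into m = len(scores) // n blocks of n
    scores and keep the r largest of each block in descending order. Scores that
    do not fill a complete block are discarded."""
    m = len(scores) // n
    return [sorted(scores[i * n:(i + 1) * n], reverse=True)[:r] for i in range(m)]


def rgev_nll(params, blocks):
    """Negative log-likelihood of the r-largest-order-statistics GEV model.
    params = (mu, log_sigma, xi); returns inf outside the model support."""
    mu, log_sigma, xi = params
    sigma = math.exp(log_sigma)
    nll = 0.0
    for z in blocks:                      # z[-1] is the r-th largest score of the block
        nll += len(z) * log_sigma
        if abs(xi) < 1e-6:                # Gumbel limit as xi -> 0
            e = -(z[-1] - mu) / sigma
            if e > 700.0:                 # avoid math.exp overflow
                return math.inf
            nll += sum((zk - mu) / sigma for zk in z) + math.exp(e)
            continue
        ts = [1.0 + xi * (zk - mu) / sigma for zk in z]
        if min(ts) <= 0.0:
            return math.inf
        e = -math.log(ts[-1]) / xi        # log of ts[-1] ** (-1 / xi)
        if e > 700.0:
            return math.inf
        nll += (1.0 + 1.0 / xi) * sum(math.log(t) for t in ts) + math.exp(e)
    return nll


def nelder_mead(f, x0, steps, max_iter=5000, tol=1e-8):
    """Minimal Nelder-Mead simplex minimiser (reflect, expand, contract, shrink)."""
    dim = len(x0)
    simplex = [list(x0)] + [[x + (steps[i] if j == i else 0.0) for j, x in enumerate(x0)]
                            for i in range(dim)]
    values = [f(v) for v in simplex]
    for _ in range(max_iter):
        order = sorted(range(dim + 1), key=lambda k: values[k])
        simplex, values = [simplex[k] for k in order], [values[k] for k in order]
        if values[-1] - values[0] < tol:  # vertex values agree: converged
            break
        centroid = [sum(v[j] for v in simplex[:-1]) / dim for j in range(dim)]
        worst = simplex[-1]
        refl = [2 * c - w for c, w in zip(centroid, worst)]
        f_refl = f(refl)
        if f_refl < values[0]:            # try expanding further
            expd = [3 * c - 2 * w for c, w in zip(centroid, worst)]
            f_expd = f(expd)
            simplex[-1], values[-1] = (expd, f_expd) if f_expd < f_refl else (refl, f_refl)
        elif f_refl < values[-2]:
            simplex[-1], values[-1] = refl, f_refl
        else:                             # contract towards the centroid
            contr = [(c + w) / 2 for c, w in zip(centroid, worst)]
            f_contr = f(contr)
            if f_contr < values[-1]:
                simplex[-1], values[-1] = contr, f_contr
            else:                         # shrink towards the best vertex
                best = simplex[0]
                simplex = [best] + [[(b + x) / 2 for b, x in zip(best, v)] for v in simplex[1:]]
                values = [values[0]] + [f(v) for v in simplex[1:]]
    k = min(range(dim + 1), key=lambda k: values[k])
    return simplex[k], values[k]


def fit_rgev(blocks):
    """Step 3): MLE of (mu, sigma, xi), started from moments of the block maxima."""
    maxima = [z[0] for z in blocks]
    mean = sum(maxima) / len(maxima)
    sd = math.sqrt(sum((x - mean) ** 2 for x in maxima) / (len(maxima) - 1))
    (mu, log_sigma, xi), _ = nelder_mead(lambda p: rgev_nll(p, blocks),
                                         [mean, math.log(sd), 0.1], [0.5 * sd, 0.5, 0.1])
    return mu, math.exp(log_sigma), xi


# Synthetic illustration: i.i.d. exponential scores, whose block maxima are
# approximately Gumbel (xi = 0) with mu = ln(n) and sigma = 1.
rng = random.Random(1)
n, r, m = 500, 3, 200
scores = [rng.expovariate(1.0) for _ in range(n * m)]
blocks = r_largest_per_block(scores, n, r)
mu_hat, sigma_hat, xi_hat = fit_rgev(blocks)
print(f"mu={mu_hat:.3f} sigma={sigma_hat:.3f} xi={xi_hat:.3f} (ln n = {math.log(n):.3f})")
```

For dissimilarity scores, the same code can be applied to the negated scores, since the smallest r values of a set are the largest r values of its negation. Repeating the fit for several r and n values supports the loop of step 7).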
Key
X score
Y log(extrapolated FMR)
1 empirical data (all)
2 empirical data (estimation)
3 rGEV estimation
4 max estimation sample
Figure 2 — Comparison of the empirical 1-CDF and the extrapolated false match rate
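Steps 4), 5) and 8) need the fitted model in two forms: the GEV quantile function, for the x axis of the Q-Q plot, and the GEV tail, for the extrapolated FMR. The Python sketch below assumes that the Q-Q plot is drawn for the block maxima with plotting positions i/(m + 1), and that the fitted GEV G approximates the distribution of the maximum of n independent non-mated scores, so that the per-comparison FMR at threshold s is 1 − G(s)^(1/n). Both are common extreme value conventions rather than requirements of this document. The confidence interval of step 6) is not computed, the function names are hypothetical, and the parameter values in the example are hypothetical as well.

```python
import math


def gev_quantile(p, mu, sigma, xi):
    """Inverse GEV CDF G^-1(p); gives the model quantiles of a Q-Q plot."""
    if abs(xi) < 1e-12:                            # Gumbel limit
        return mu - sigma * math.log(-math.log(p))
    return mu + sigma / xi * ((-math.log(p)) ** (-xi) - 1.0)


def qq_points(block_maxima, mu, sigma, xi):
    """(model quantile, empirical quantile) pairs at plotting positions i / (m + 1)."""
    z = sorted(block_maxima)
    m = len(z)
    return [(gev_quantile(i / (m + 1), mu, sigma, xi), z[i - 1]) for i in range(1, m + 1)]


def rgev_extrapolated_fmr(threshold, mu, sigma, xi, n):
    """Per-comparison FMR = 1 - G(threshold) ** (1 / n) for blocks of n scores."""
    z = (threshold - mu) / sigma
    if abs(xi) < 1e-12:
        log_t = -z                                 # Gumbel: -log G = exp(-z)
    else:
        base = 1.0 + xi * z
        if base <= 0.0:
            # xi < 0: beyond the upper end point, G = 1; xi > 0: below the lower one, G = 0
            return 0.0 if xi < 0 else 1.0
        log_t = -math.log(base) / xi               # -log G = base ** (-1 / xi)
    if log_t > 700.0:                              # G underflows to 0, so the FMR is 1
        return 1.0
    return -math.expm1(-math.exp(log_t) / n)       # 1 - exp(-t / n), stable for small t / n


# Hypothetical fitted parameters for blocks of n = 1000 similarity scores.
mu_hat, sigma_hat, xi_hat, n = 0.42, 0.03, -0.05, 1000
for s in (0.45, 0.50, 0.55, 0.60):
    print(s, rgev_extrapolated_fmr(s, mu_hat, sigma_hat, xi_hat, n))
```

For small values the result is close to t(s)/n, with t(s) = (1 + ξ(s − μ)/σ)^(−1/ξ), which is why the extrapolated FMR can fall far below 1/(m × n) when the fitted tail supports it.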
6.4 Generalized Pareto distribution
The GP estimation is based on the GP model as described in Clause A.3. The estimation procedure is as
follows.
1) Determine µ, the location parameter for the GP.
When applying the GP model, it is necessary to find an appropriate threshold value µ to extract the extreme
scores from the entire score set. The appropriate µ can be obtained by observing the stability of the scale
parameter σ and the shape parameter ξ. Figure 3 shows the optimum shape parameter ξ obtained by MLE
(y axis) for the corresponding threshold values (x axis). The shape parameter ξ is regarded as stable in the
circled range, and the appropriate threshold µ can be selected from the values close to the lower end of that
range. It is also possible to use a graph of the scale parameter σ against the threshold to find the optimum µ
in the same manner.
Key
X threshold
Y shape parameter
a
The values of shape parameter ξ are stable.
Figure 3 — Example of plot of the shape parameter ξ
2) Estimate parameters (σ, ξ) of GP.
The GP distribution is fitted to the data exceeding the threshold μ selected in step 1). The scale
parameter σ and the shape parameter ξ are then estimated by using the maximum likelihood estimation
algorithm.
3) Diagnosis of GP model.
Q-Q plots, which are commonly used in extreme value theory, are used to diagnose whether the parameters of
the estimated GP are appropriate. Figure 4 shows an example of a Q-Q plot between the empirical data and
the GP distribution.
Key
X quantile of GP distribution
Y quantile of empirical data
Figure 4 — Example of Q-Q plot of GP distribution and empirical data
4) Determine parameters of GP model.
If the Q-Q plot diagnosis reveals a problem, for example a clear deviation of the points from the line y = x,
the threshold µ is reselected and steps 2) and 3) are repeated.
5) Obtain CDF of comparison scores.
The CDF of the non-mated comparison scores is obtained as follows. Below the threshold µ, the distribution
is obtained from the empirical values; above the threshold µ, it is given by the GP, which is extrapolated into
the range where there are no actual measurements. Each part is rescaled according to its share of the
measured scores, and the two parts are combined to obtain the CDF of the non-mated comparison scores.
Calculate the PDF using these parameters and extrapolate into the range with no scores.
6) Obtain the extrapolated FMR from the estimated 1-CDF curve. See Annex B.
Choose a point of interest in the graph drawn in step 5) and report the extrapolated FMR with µ, σ and ξ.
Similarly, report the upper 95 % confidence limit of the extrapolated FMR.
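GP steps 1), 2), 5) and 6) can be sketched in Python as follows. For each candidate threshold µ, the excesses s − µ of the scores above µ are fitted by maximum likelihood, so that the stability of ξ (Figure 3) can be inspected; the tail of the combined CDF is then rescaled by the fraction of scores above µ, giving FMR(s) = (k/N)(1 + ξ(s − µ)/σ)^(−1/ξ) for s ≥ µ, where k is the number of exceedances and N the total number of non-mated scores. The Nelder-Mead search stands in for an MLE algorithm that this document does not specify, the function names are hypothetical, and the Pareto-distributed scores are synthetic (their excesses are exactly GP with ξ = 0,25 and σ = µ/4, and their true FMR is s^(−4)). The confidence limit required in step 6) is not computed. For the Q-Q diagnosis of step 3), the GP model quantile at probability p is σ/ξ((1 − p)^(−ξ) − 1).

```python
import math
import random


def nelder_mead(f, x0, steps, max_iter=5000, tol=1e-8):
    """Minimal Nelder-Mead simplex minimiser (reflect, expand, contract, shrink)."""
    dim = len(x0)
    simplex = [list(x0)] + [[x + (steps[i] if j == i else 0.0) for j, x in enumerate(x0)]
                            for i in range(dim)]
    values = [f(v) for v in simplex]
    for _ in range(max_iter):
        order = sorted(range(dim + 1), key=lambda k: values[k])
        simplex, values = [simplex[k] for k in order], [values[k] for k in order]
        if values[-1] - values[0] < tol:  # vertex values agree: converged
            break
        centroid = [sum(v[j] for v in simplex[:-1]) / dim for j in range(dim)]
        worst = simplex[-1]
        refl = [2 * c - w for c, w in zip(centroid, worst)]
        f_refl = f(refl)
        if f_refl < values[0]:            # try expanding further
            expd = [3 * c - 2 * w for c, w in zip(centroid, worst)]
            f_expd = f(expd)
            simplex[-1], values[-1] = (expd, f_expd) if f_expd < f_refl else (refl, f_refl)
        elif f_refl < values[-2]:
            simplex[-1], values[-1] = refl, f_refl
        else:                             # contract towards the centroid
            contr = [(c + w) / 2 for c, w in zip(centroid, worst)]
            f_contr = f(contr)
            if f_contr < values[-1]:
                simplex[-1], values[-1] = contr, f_contr
            else:                         # shrink towards the best vertex
                best = simplex[0]
                simplex = [best] + [[(b + x) / 2 for b, x in zip(best, v)] for v in simplex[1:]]
                values = [values[0]] + [f(v) for v in simplex[1:]]
    k = min(range(dim + 1), key=lambda k: values[k])
    return simplex[k], values[k]


def gp_nll(params, excesses):
    """Negative GP log-likelihood; params = (log_sigma, xi), excesses > 0."""
    log_sigma, xi = params
    sigma = math.exp(log_sigma)
    if abs(xi) < 1e-6:                    # exponential limit as xi -> 0
        return len(excesses) * log_sigma + sum(excesses) / sigma
    ts = [1.0 + xi * y / sigma for y in excesses]
    if min(ts) <= 0.0:                    # outside the GP support
        return math.inf
    return len(excesses) * log_sigma + (1.0 + 1.0 / xi) * sum(math.log(t) for t in ts)


def fit_gp(scores, u):
    """Steps 1) and 2): MLE of (sigma, xi) for the excesses over threshold u."""
    excesses = [s - u for s in scores if s > u]
    start = [math.log(sum(excesses) / len(excesses)), 0.1]  # exponential fit as start
    (log_sigma, xi), _ = nelder_mead(lambda p: gp_nll(p, excesses), start, [0.5, 0.1])
    return math.exp(log_sigma), xi, len(excesses)


def gp_extrapolated_fmr(s, u, sigma, xi, k, n_total):
    """Step 6): FMR(s) = (k / N) * (1 + xi * (s - u) / sigma) ** (-1 / xi), s >= u."""
    if s < u:
        raise ValueError("below the threshold u, use the empirical 1-CDF instead")
    if abs(xi) < 1e-12:
        return k / n_total * math.exp(-(s - u) / sigma)
    base = 1.0 + xi * (s - u) / sigma
    if base <= 0.0:                       # beyond the upper end point (xi < 0)
        return 0.0
    return k / n_total * base ** (-1.0 / xi)


rng = random.Random(7)
scores = [rng.paretovariate(4.0) for _ in range(20_000)]  # synthetic non-mated scores
ordered = sorted(scores)
for q in (0.80, 0.90, 0.95):              # step 1): stability of xi over thresholds
    u_q = ordered[int(q * len(ordered))]
    sigma_q, xi_q, _ = fit_gp(scores, u_q)
    print(f"u={u_q:.3f} sigma={sigma_q:.3f} xi={xi_q:.3f}")
u = ordered[int(0.90 * len(ordered))]
sigma_hat, xi_hat, k = fit_gp(scores, u)
fmr = gp_extrapolated_fmr(4.0, u, sigma_hat, xi_hat, k, len(scores))
print(f"extrapolated FMR(4) = {fmr:.2e}, true value {4.0 ** -4:.2e}")
```

At s = µ the formula returns exactly k/N, so the GP tail joins the empirical part of the CDF without a jump, which is the rescaling described in step 5).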
6.5 Evaluation of the fitness of the model
The fitness of the model shall be evaluated by using a Q-Q plot. The Q-Q plot is a quantile-quantile comparison
between two distributions, i.e. the estimated data versus the empirical data in this context, and, therefore, the
plotted values always increase monotonically. If the two distributions are identical, the Q-Q plot follows the
45 degree line y = x. Since the top few scores inherently fluctuate strongly, the points for those
scores tend to deviate from the line y = x. Therefore, it is important to observe the deviation between the
estimat
...













