Comprehensive Chemometrics: Chemical and Biochemical Data Analysis 2nd Edition Steven Brown (Editor)




COMPREHENSIVE CHEMOMETRICS:
CHEMICAL AND BIOCHEMICAL
DATA ANALYSIS
SECOND EDITION


COMPREHENSIVE CHEMOMETRICS:
CHEMICAL AND BIOCHEMICAL
DATA ANALYSIS
SECOND EDITION
EDITORS IN CHIEF
Steven Brown
Department of Chemistry and Biochemistry
University of Delaware
USA
Romà Tauler
Department of Environmental Chemistry
Institute of Environmental Assessment and Water Research (IDAEA)
Spanish Council of Scientific Research (CSIC)
Spain
Beata Walczak
Department of Analytical Chemistry
Institute of Chemistry
Silesian University
Poland
VOLUME 1

Elsevier
Radarweg 29, PO Box 211, 1000 AE Amsterdam, Netherlands
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom
50 Hampshire Street, 5th Floor, Cambridge MA 02139, United States
Copyright © 2020 Elsevier B.V. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including
photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on
how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the
Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted
herein).
Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in
research methods, professional practices, or medical treatment may become necessary.
Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods,
compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the
safety of others, including parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or
damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods,
products, instructions, or ideas contained in the material herein.
Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library
ISBN 978-0-444-64165-6
For information on all publications visit our website
at http://store.elsevier.com
Publisher: Oliver Walter
Acquisition Editors: Sean Simms & Rachel Conway
Content Project Manager: Paula Davies
Associate Content Project Manager: Brinda Subramanian
Designer: Mark Rogers

EDITORS IN CHIEF
Steven Brown obtained his PhD in analytical chemistry from the University of Washington in 1978,
where he worked with Bruce Kowalski. That year he was appointed Assistant Professor at the University
of California, Berkeley, with a joint appointment at Lawrence Berkeley Laboratory. In 1981, he moved
to Washington State University, and in 1986 to the Department of Chemistry and Biochemistry at the
University of Delaware, where he is presently Willis F. Harrington Professor.
He has served as a Section President of the American Chemical Society (1982–1984), as Chair
of the Department of Chemistry and Biochemistry at the University of Delaware (1997–2002), and
as President of the North American Chapter of the International Chemometrics Society (1986–1988).
He has served on the editorial staff of the Journal of Chemometrics since its inception, first as
a founding Editor and then as Editor in Chief, stepping down as Editor in Chief in 2007. He
now serves on the journal's editorial board.
His research interests concern a wide range of problems in chemometrics and machine learning,
with over 300 publications in the scientific literature. He has edited several books, including the first
edition of the four-volume set Comprehensive Chemometrics, published by Elsevier in 2009. A focus of
his research has been the development of new instrumental methods through the use of multivariate
mathematical methods for multicomponent analysis, including calibration transfer and the novel
use of data fusion methods. He won the first EAS Award in Chemometrics in 1996 and the Kowalski Award in 2015. His work
has had applications in biomedical analysis, food science, plant science, forensic science, pharmaceutical characterization, and process
chemistry.
Romà Tauler (Barcelona, Spain, 1955) is a research professor at IDAEA-CSIC. He obtained his doctorate
in Analytical Chemistry at the University of Barcelona in 1977. He was Assistant and Associate Professor
at the University of Barcelona (1978–2002) and has been a CSIC Research Full Professor since 2003. He
is Chief Editor of the journal Chemometrics and Intelligent Laboratory Systems and of the Comprehensive
Chemometrics major reference work. He received the Award for Achievements in Chemometrics from the
Eastern Analytical Symposium and, in 2009, the Kowalski Prize from the Journal of Chemometrics (Wiley).
He served as President of the Catalan Chemistry Society (2008–2013) and was the recipient of
the EU-ERC Advanced Grant No. 320337 (2013–2018), the CHEMAGEB project. Romà Tauler
has published more than 430 research publications and 60 book chapters, with more than 16,000
citations and an h-index of 63. His main research field is Chemometrics and its applications to
Environmental Chemistry, Omic Sciences, and Bioanalytical Chemistry.

Prof. Beata Walczak graduated in chemistry from the Faculty of Mathematics, Physics and Chemistry,
Silesian University, Katowice, Poland, in 1979. Since then she has been working in the Institute of
Chemistry, Silesian University, where she is now Head of the Department of Analytical Chemistry.
She held postdoctoral stays at the University of Orléans (France) and at the Graz University of
Technology (Austria), and visiting professorships at the Vrije Universiteit Brussel
(Belgium), Rome University "La Sapienza" (Italy), AgroParisTech University (France), the
University of Modena and Reggio Emilia (Italy), and Radboud University (The Netherlands).
Since the early 1990s she has been involved in chemometrics, and her main scientific interests
cover all aspects of data exploration and modeling (dealing with missing and censored data, dealing with
outliers, data representativity, enhancement of instrumental signals, signal warping, data compression,
linear and nonlinear projections, development of modeling approaches, feature selection techniques,
etc.). She has authored or coauthored ca. 170 scientific papers and 400 conference papers, and delivered
many invited lectures at numerous international chemistry meetings. She acted as editor and
coauthor of the book Wavelets in Chemistry (Vol. 22 in the series "Data Handling in Science and
Technology," Elsevier, Amsterdam, 2000) and as a coeditor of the four-volume Comprehensive Chemometrics
(Elsevier, Amsterdam, 2009). Currently she is an Editor of the journal Chemometrics and Intelligent
Laboratory Systems and of "Data Handling in Science and Technology" (the Elsevier book series), and a member of the editorial
boards of Talanta, Analytical Letters, the Journal of Chemometrics, and Acta Chromatographica.

SECTION EDITORS
Richard Brereton did his BA, MA, PhD and postdoc at the University of Cambridge, after which
he moved to the staff of the University of Bristol, where he was successively Lecturer, Reader,
Professor and Emeritus. He is a Fellow of the Royal Society of Chemistry, the Royal Statistical Society
and the Royal Society of Medicine, and a Chartered Chemist. He is Editor-in-Chief of the journal Heritage
Science, a columnist for the Journal of Chemometrics, and an editorial board member for several journals. He has given
around 200 invited lectures in over 30 countries, been main or sole supervisor of 33 PhD and 6
research masters students, served on around 50 conference organising committees, and acted as expert
witness in 13 court cases. He has published over 400 recorded articles, including 8 books, 209 papers
in Web of Knowledge, 20 book chapters, 32 book reviews and 9 conference papers, almost all as main or
sole author. His papers have been cited over 5000 times according to Web of Knowledge, and his
books over 3000 times according to Google Scholar. He has published the most cited paper in the
Journal of Chemometrics over the last decade, and the eighth most cited paper in the Analyst since 2000.
Marina Cocchi currently serves as an associate professor in Analytical Chemistry-Chemometrics at
the Department of Chemical and Geological Sciences of the University of Modena and Reggio Emilia
(Italy), teaching chemometrics at undergraduate and graduate levels. She holds a degree (cum
laude) and a Ph.D. in Chemical Sciences from the University of Modena. As part of her Ph.D.
she worked with Professor S. Wold on the development of chemometric approaches for 3D QSAR.
She has published more than 100 papers in international journals and books, covering a range of
topics embracing multivariate, multi-way and multiset methods; data fusion; 2D WT in multivariate
image analysis for fault detection and pattern recognition; algorithms for feature selection in
the wavelet domain; MSPC; food authenticity; and chemical fingerprinting by spectroscopy (MIR, NIR,
NMR) and chromatography.
She has supervised twelve PhD theses in chemometrics. She was on the board of the Italian
Chemometrics Group from 2001 to 2015, acting as President in 2007–2011. Since 2010 she has been
a member of the editorial board of Chemometrics and Intelligent Laboratory Systems. She was
Editor of the book Data Fusion: Methods and Applications (Data Handling in Science and
Technology series, Vol. 31, Elsevier, 2019).
Anna de Juan has been an associate professor at the Department of Chemical Engineering and Analytical
Chemistry at the University of Barcelona since 2003, teaching chemometrics at undergraduate and
graduate levels. She holds a degree and a PhD in Chemistry from the University of Barcelona (UB),
and her expertise is in Multivariate Curve Resolution (MCR) methods: theoretical development
and application to bioanalytical and analytical problems. Since 2002 she has been a member of the Editorial
Advisory Board of Chemometrics and Intelligent Laboratory Systems, and since 2006 of Analytica
Chimica Acta. She previously acted as a section editor for the reference work Comprehensive
Chemometrics, Elsevier (2009). In the framework of research collaborations she has stayed at the Vrije
Universiteit Brussel, Brussels (1995), Virginia Commonwealth University, Richmond, US (1998), the
University of Newcastle, Australia (2002), the Université des Sciences et Technologies de Lille, France
(2004), and Dalhousie University, Canada (2016). In 2004 she received the 4th Chemometrics
Elsevier Award together with Karl Booksh. She has published more than 130 papers in international
journals and books and has given more than 180 presentations at international conferences,
50 of them plenary or keynote lectures, basically on the design of chemometric tools, multivariate
curve resolution developments and related methods, and applications to process analysis,
hyperspectral image analysis and general analytical applications.

Rafael Cela Torrijos is a professor of analytical chemistry at the University of Santiago de Compostela,
Spain, and Head of the Laboratory for Analytical Chemistry in the Research Institute of Chemical
and Biological Analyses (IAQBUS) at the same university. Previously, he was at the Universities
of Madrid (Complutense) and Cádiz, belonging to the group of analytical chemists that started the
development of chemometrics in Spain in the 1980s. His research has focused on the analytical
applications of separation science and, particularly, the development and optimization of sample
preparation techniques in chromatographic analysis, including experimental designs and the development
of computer-assisted chromatographic methods; he is the author of the Mchrom Scout software,
distributed by Mestrelab Research S.L. He is the author or co-author of more than
300 scientific papers and several textbooks.
Riccardo Leardi was born in Novi Ligure (Italy) on October 17, 1959.
In 1983 he graduated cum laude in Pharmaceutical Chemistry and Technology at the Faculty of
Pharmacy of the University of Genova.
His current position is Associate Professor at the Department of Pharmacy of the School of
Medical and Pharmaceutical Sciences of the University of Genoa. In 2013 he obtained the qualification
for full professor in Analytical Chemistry.
Since 1985 he has been working in the section of Analytical Chemistry of the Department of
Pharmacy of the University of Genova, and his research field is Chemometrics.
His interests are mainly devoted to problems of classification and regression (applied especially
to food, environmental and clinical data), experimental design, process optimization, multivariate
process monitoring and multivariate quality control.
His original research focused mainly on genetic algorithms, especially their application to the
problem of variable selection, and on three-way methods.
He developed the chemometric software packages CAT (Chemometric Agile Tool) and BasiCAT, both
freely downloadable from http://gruppochemiometria.it/index.php/software.
He is the author of almost 150 papers and more than 130 communications at national and
international meetings, several of them as invited speaker; he has been invited to give talks and courses in several
industries and research centers.
He organizes two schools of Chemometrics (Multivariate Analysis; Experimental Design), each
held twice a year at the University of Genoa.
In November 2002 he started his activity of chemometric consultancy.
Prof. Federico Marini was born in Rome, Italy, in 1977, and received his MSc (2000) and PhD
(2004) from Sapienza University of Rome. He is currently associate professor of Chemometrics at
Sapienza University of Rome. In 2006 he was awarded the Young Researcher Prize from the Italian
Chemical Society, and in 2012 he won the Chemometrics and Intelligent Laboratory Systems Award
"for his achievements in chemometrics". He has been a visiting researcher at various universities
(Copenhagen, Stellenbosch, Silesia, Lille). His research activity is focused on all aspects of chemometrics,
ranging from the application of existing methods to real-world problems in different fields
to the design and development of novel algorithms. He is the author of more than 150 papers in
international journals, and he recently edited and coauthored the book Chemometrics in Food Chemistry
(Elsevier). He is a member of the editorial boards of Chemolab, Analytica Chimica Acta, the Journal of
Chemometrics, the Journal of NIR Spectroscopy and the Journal of Spectral Imaging, and he serves as
Associate Editor for Chemometrics in Wiley's Encyclopedia of Analytical Chemistry. He is the past
coordinator of the Chemometric group of the Italian Chemical Society and the coordinator of the
Chemometric study group of EuCheMS.

Alejandro C. Olivieri was born in Rosario, Argentina (07/28/1958). He obtained his B.Sc. from the
Catholic University (1982) and his Ph.D. from the National University of Rosario (1986), and did
postdoctoral research at the University of Illinois, Urbana-Champaign, USA. In 1990 he returned
to the University of Rosario and joined the National Research Council (CONICET). He founded
a research group in chemometrics in analytical chemistry, has published about 250 papers in
international journals, books and book chapters, and has supervised nine Ph.D. theses. He has received
several national and international awards, including the John Simon Guggenheim Memorial Foundation
fellowship (2001–2002). In 2018 he published "Introduction to Multivariate Calibration: A Practical
Approach", a book selected by Choice, from the Association of College and Research
Libraries (ACRL), as one of the Outstanding Academic Titles for 2019.
Dr. William Rayens is Professor and the Dr. Bing Zhang Endowed Department Chair in the Department
of Statistics at the University of Kentucky. Rayens has an extensive research record focused
primarily on the development of multivariate and multi-way statistical methodologies, mostly
related to problems in chemistry and the neurosciences. He has mentored several Ph.D. students
and has been honored at both the College and the University level as an outstanding teacher. Rayens
also served as Assistant Provost for General Education, during which time he was tasked with
implementing new general education reforms at the University of Kentucky, the first changes to that
program in almost 30 years. He designed the one-of-a-kind Technologically Enhanced Active
Learning rooms in the University's multi-million-dollar Jacobs Science Building. Dr. Rayens received
his Ph.D. in mathematics from Duke University in 1986.
Luis A. Sarabia received his Ph.D. in Statistics from the University of Valladolid (Spain) in 1979.
Since 1974 he has been teaching Statistics, Mathematics and Design of Experiments, mostly to
graduate and postgraduate students of Chemistry. In 2000 he reached the position of professor at the
University of Burgos. He is currently director of the Department of Mathematics and Computation
and a member of the research group Chemometrics and Qualimetrics (officially recognized as a
consolidated research group, UIC-237). He is author or co-author of around 140 papers, 7 book
chapters and approximately 160 communications at international meetings. He was a co-founder of the
Colloquium Chemiometricum Mediterraneum. His research interests are in multivariate statistics,
n-way procedures, evolutionary algorithms for multiresponse optimization, design of experiments,
QbD and PAT, with application to regulated chemical analysis, food safety, food characterization and
fraud detection.


IN MEMORY OF ROGER PHAN TAN LUU
"Science evolves by means of research. Scientific research is based on experiments" (Galileo Galilei, Two New
Sciences, 1638). With time, experiments have become more and more complex, so scientists adopted "experimental
design (DOE, design of experiments)" as a useful tool. Only 40 years ago Roger Phan Tan Luu suggested
that DOE is the core of the Methodology of experimental research: much more than a tool, a philosophy.
The research work of Roger Phan Tan Luu includes new tools, the organization of designs for complex
experiments, and a lot of applications. To perform research work on a real problem it is necessary to know the
problem very well. Obvious. When the expert of the problem asks for the help of the expert of design, the
problem expert must know just some basic elements of the Methodology. The expert of the Methodology, instead,
must study to gain a deep knowledge of the problem, in much detail. In this way Roger accumulated knowledge,
i.e., experience, in many fields of research. The result of nearly 50 years of activity was a complete scientist,
from theory to new tools to applications.
A complete scientist is always a teacher. Roger was a great teacher, not only at his Aix-Marseille University, but
throughout the world. He had the rare ability to capture the attention of students, transforming a boring, heavy
sequence of theorems into a compelling, instructive show. For the students a pleasure, not a pain.
This description is that of the public man.
The private man was (outside the family) a loner. Only in the last 20 years of the past century did he make
contact, in the organization of schools of chemometrics and experimental design (Eurochemometrics, Erasmus
stages), with some people of great affinity. With these people, he opened himself to friendship. As one of his
friends, I received very much from him. It was possible for me to appreciate his generosity, his humor, his
readiness to answer an invitation (and of course to work together).
Friendship details are very intimate, not ready to be described. For friends, the final separation is a very
hard, continuous sentiment. Memories help. As a friend, as a fellow scientist, I will always remember.
Michele Forina


CONTRIBUTORS TO VOLUME 1
BW Bader
Sandia National Laboratories, Albuquerque, NM, USA
A Beal
NemrodW SAS, Marseille, France
JM Bernardo
Universitat de València, Valencia, Spain
Andrey Bogomolov
Endress+Hauser Liquid Analysis GmbH+Co. KG,
Gerlingen, Germany; and Samara State Technical
University, Samara, Russia
Richard G Brereton
School of Chemistry, University of Bristol, Bristol,
United Kingdom
B Campisi
Department of Economics, Business, Mathematics and
Statistics, University of Trieste, Trieste, Italy
Johan E Carlson
Department of Computer Science and Electrical
Engineering, Luleå University of Technology, Luleå,
Sweden
Rolf Carlson
Department of Chemistry, Faculty of Science, University
of Tromsoe, Tromsoe, Norway
Georgia Charkoftaki
Department of Environmental Health Sciences, Yale
School of Public Health, Yale University, New Haven,
CT, United States
Bieke Dejaegher
Analytical Chemistry, Applied Chemometrics and
Molecular Modelling, Vrije Universiteit Brussel (VUB),
Brussels, Belgium
Paul HC Eilers
Department of Biostatistics, Erasmus University Medical
Center, Rotterdam, The Netherlands
SLR Ellison
LGC Limited, Teddington, Middlesex, United Kingdom
Jasper Engel
Biometris, Wageningen University & Research,
Wageningen, The Netherlands
KH Esbensen
KHE Consulting, Copenhagen, Denmark
Bernard Francq
Institute of Statistics, Biostatistics and Actuarial Sciences
(ISBA), Louvain Institute for Data Analysis and
Modeling (LIDAM), Université catholique de Louvain
(UCLouvain), Louvain-la-Neuve, Belgium; and
CMC Statistical Sciences, GSK Vaccines, Rixensart,
Belgium
Bernadette Govaerts
Institute of Statistics, Biostatistics and Actuarial Sciences
(ISBA), Louvain Institute for Data Analysis and
Modeling (LIDAM), Université catholique de Louvain
(UCLouvain), Louvain-la-Neuve, Belgium
Ana Herrero
Departamento de Química, Facultad de Ciencias,
Universidad de Burgos, Burgos, Spain
HCJ Hoefsloot
University of Amsterdam, Amsterdam, Netherlands
Kas J Houthuijs
Analytical Chemistry, Institute for Molecules and
Materials, Radboud University Nijmegen, Nijmegen,
The Netherlands
Mia Hubert
Department of Mathematics, KU Leuven, Leuven,
Belgium
JJ Jansen
Netherlands Institute for Ecology, Heteren, Netherlands
LP Julius
Glycom A/S, Esbjerg, Denmark
Riccardo Leardi
Department of Pharmacy, University of Genoa, Genoa,
Italy

Federico Marini
Department of Chemistry, University of Rome La
Sapienza, Rome, Italy
Rebecca Marion
Institute of Statistics, Biostatistics and Actuarial Sciences
(ISBA), Louvain Institute for Data Analysis and
Modeling (LIDAM), Université catholique de Louvain
(UCLouvain), Louvain-la-Neuve, Belgium
Manon Martin
Institute of Statistics, Biostatistics and Actuarial Sciences
(ISBA), Louvain Institute for Data Analysis and
Modeling (LIDAM), Université catholique de Louvain
(UCLouvain), Louvain-la-Neuve, Belgium; and Fonds
National de la Recherche Scientifique, Brussels, Belgium
M Cruz Ortiz
Departamento de Química, Facultad de Ciencias,
Universidad de Burgos, Burgos, Spain
M Pavan
Joint Research Centre, European Commission, Ispra,
Italy
R Phan-Tan-Luu
NemrodW SAS, Marseille, France; and University Paul
Cezanne, Marseille Cedex, France
FF Pitard
Francis Pitard Sampling Consultants, Broomfield, CO,
USA
M Sagrario Sánchez
Departamento de Matemáticas y Computación, Facultad
de Ciencias, Universidad de Burgos, Burgos, Spain
Luis A Sarabia
Departamento de Matemáticas y Computación, Facultad
de Ciencias, Universidad de Burgos, Burgos, Spain
AK Smilde
University of Amsterdam, Amsterdam, Netherlands
Michel Thiel
Institute of Statistics, Biostatistics and Actuarial Sciences
(ISBA), Louvain Institute for Data Analysis and
Modeling (LIDAM), Université catholique de Louvain
(UCLouvain), Louvain-la-Neuve, Belgium; and
Statistics and Decision Sciences, Janssen
Pharmaceutical, Beerse, Belgium
M Thompson
Birkbeck College, University of London, London, United
Kingdom
R Todeschini
University of Milano-Bicocca, Milan, Italy
Rafael Cela Torrijos
CRLF University of Santiago de Compostela, Santiago,
Spain
Yvan Vander Heyden
Analytical Chemistry, Applied Chemometrics and
Molecular Modelling, Vrije Universiteit Brussel (VUB),
Brussels, Belgium
Vasilis Vasiliou
Department of Environmental Health Sciences, Yale
School of Public Health, Yale University, New Haven,
CT, United States
DJ Vis
University of Amsterdam, Amsterdam, Netherlands
D Voinovich
Department of Chemical and Pharmaceutical Sciences,
University of Trieste, Trieste, Italy
Beata Walczak
Institute of Chemistry, University of Silesia, Katowice,
Poland
JA Westerhuis
University of Amsterdam, Amsterdam, Netherlands

SUBJECT CLASSIFICATION
Statistics
Quality of Analytical Measurements: Statistical Methods for Internal Validation
Proficiency Testing in Analytical Chemistry
Quality of Analytical Measurements: Univariate Regression
Robust and Nonparametric Statistical Methods
Bayesian Methodology in Statistics
Robust multivariate statistical methods
An Introduction to the Theory of Sampling: An Essential Part of Total Quality Management
Representative Sampling, Data Quality, Validation: A Necessary Trinity in Chemometrics
Experimental Design
Introduction: Experimental Designs
Screening Strategies
The Study of Experimental Factors
Response Surface Methodology
Experimental Design for Mixture Studies
Nonclassical Experimental Designs
Designing a multicomponent calibration experiment: basic principles and diagonal approach
Analysis of Variance
General Linear Models
Multiset Data Analysis: ANOVA Simultaneous Component Analysis and Related Methods
Regularized MANOVA
ANOVA–TP
Optimization
Constrained and Unconstrained Optimization
Sequential Optimization Methods
Optimization: Steepest Ascent, Steepest Descent, and Gradient Methods
Multicriteria Decision-Making Methods
Genetic Algorithms in Chemistry
A Guided Tour of Penalties
Particle Swarm Optimization
Linear Soft-Modeling
Linear Soft-Modeling: Introduction
Principal Component Analysis: Concept, Geometrical Interpretation, Mathematical Background, Algorithms, History, Practice
Principal Component Analysis

Independent Component Analysis
Independent component analysis in Analytical Chemistry
Introduction to Multivariate Curve Resolution
Two-Way Data Analysis: Evolving Factor Analysis
Two-Way Data Analysis: Detection of Purest Variables
Two-Way Data Analysis: Multivariate Curve Resolution – Noniterative Resolution Methods
Two-Way Data Analysis: Multivariate Curve Resolution – Iterative Resolution Methods
Two-Way Data Analysis: Multivariate Curve Resolution – Error in Curve Resolution
Estimation of feasible bands in Multivariate Curve Resolution
Multiway Data Analysis: Eigenvector-Based Methods
Multilinear Models: Iterative Methods
Multiset Data Analysis: Extended Multivariate Curve Resolution
Tensor Similarity in Chemometrics
Bayesian Methods for Factor Analysis in Chemometrics
Time Series Modeling
Other Topics in Soft-Modeling: Maximum Likelihood-Based Soft-Modeling Methods
Figures of Merit
Unsupervised Learning
Unsupervised Data Mining: Introduction
Data Mapping: Linear Methods versus Nonlinear Techniques
Tree-Based Clustering and Extensions
Model-Based Clustering
Common Clustering Algorithms
Density-Based Clustering Methods
Other Problems With Data Analysis
Feature Selection: Introduction
Feature Selection in the Wavelet Domain: Adaptive Wavelets
Missing Data
Compositional Data Analysis in Chemometrics
Sparse Methods
Data Preprocessing
Preprocessing Methods
Evaluation of Preprocessing Methods
Model-based preprocessing in vibrational spectroscopy
Normalization and Closure
Variable Shift and Alignment
Background Estimation, Denoising, and Preprocessing
Denoising and Signal-to-Noise Ratio Enhancement: Classical Filtering
Denoising and Signal-to-Noise Ratio Enhancement: Derivatives
Denoising and Signal-to-Noise Ratio Enhancement: Splines
Data Quality and Denoising: a Review
Model-Based Preprocessing and Background Elimination: OSC, OPLS, and O2PLS
Regression and Classification
Calibration Methodologies
Variable Selection
Partial Least Squares
Multivariate Approaches: UVE-PLS
Data Fusion
Multiblock and Three-Way Data Analysis

Transfer of Multivariate Calibration Models
Robust Multivariate Methods in Chemometrics/Robust and Sparse Multivariate Methods in Chemometrics
Regression Diagnostics
Model-Based Data Fitting
Linear Approaches for Nonlinear Modeling
Computationally Intensive Nonlinear Regression Methods
Neural Networks
Feedforward Neural Networks
Kernel Methods
Classification: Basic Concepts
Validation of Classifiers
Statistical Discriminant Analysis
Soft Independent Modeling by Class Analogy
Decision Tree Modeling in Classification
Random Forest and Ensemble Methods
Multivariate Approaches to Classification Using Genetic Algorithms
Multiway Classification
Deep Learning: A Theoretical Chapter for Chemometricians
Applications
Chemometrics in Electrochemistry
Chemometrics in the Pharmaceutical Industry
Environmental Chemometrics
Resampling and Testing in Regression Models with Environmetrical Applications
Application of Chemometrics in the Food Sciences
Chemometrics in Forensics
Chemometric Analysis of Sensory Data
Smart Sensors
Statistical Control of Measures and Processes
Best Practice and Performance of Hardware in Process Analytical Technology (PAT) – A Prerequisite to Avoid Pitfalls in Data Analytics
Multivariate Statistical Process Control and Process Control, Using Latent Variables
Batch Process Modeling and MSPC
Chemometrics in Raman spectroscopy
Chemometrics in NIR Hyperspectral Imaging – Theory and Applications in the Agricultural Crops and Products Sector
Mass Spectroscopic Imaging: Chemometric Data Analysis
Fast analysis, Processing, and Modeling of Hyperspectral Videos: Challenges and Possible Solutions
Image Processing
Chemometrics Analysis of Big Data
Systems Biology
Analysis of Metabolomics Data
Data Processing for RNA/DNA Sequencing
Analysis of Megavariate Data in Functional Omics
Spectral Map Analysis of Microarray Data
Chemometrics in Flow Cytometry
Chemometrics for QSAR Modeling
Chemoinformatics
High-Performance GRID Computing in Chemoinformatics


PREFACE
Some 50 years ago, the first publications appeared on the use of computer-aided mathematics to analyze
chemical data. With those publications, the modern field of chemometrics was launched. Both the speed and
power of computers and the sophistication of analytical instrumentation have made great leaps in the
intervening time. The ready availability of chemometric software, coupled with the increasing need for rigorous,
systematic examination of ever-larger and more sophisticated sets of measurements from instrumentation, has
generated strong interest in reliable methods for converting the mountains of measurements into more
manageable piles of results, and for converting those results into nuggets of useful information. Interest in
the application of chemometrics has spread well beyond chemists with a need to understand and interpret their
measurements; chemometrics is now making important contributions in process engineering, in
systems biology, in environmental science, and in other disciplines that rely on chemical instrumentation, to name
only a few areas.
In the 12 years since the first edition of this book appeared, there has been considerable change in the fields
of data science and of chemometrics. As applications of chemometrics continue to grow, so too does the
methodology of chemometrics. After 50 years, chemometrics is a scientific field with mature areas, but it is also
a field where change continues to occur at a rapid pace, driven both by advances in chemical instrumentation
and measurement and by the close connection of chemometrics with the data science, machine learning, statistics,
and signal processing research communities. The interfacial location of chemometrics, falling between
measurements on the one side and statistical and computational theory and methods on the other, poses
a challenge to the new practitioner: gaining sufficient breadth and depth of understanding in data science and
learning in what ways data science connects with measurement chemistry, in order to use chemometrics
effectively.
The four volumes of Comprehensive Chemometrics, 2nd Ed. are the result of a meeting in Oxford in January
2017, where the editors planned a revised work that would update most of the material covered in the first edition
and would cover emerging areas of chemometric research, while providing a sampling of current applications.
Our goal was to bring our reference work current with the advances in chemometrics that have occurred since
2006, with a treatment that would serve both the new and the experienced practitioner.
What has resulted from this collaboration is a resource that captures the practice of chemometrics now. The
four volumes in the revised work now include 119 chapters, with 33 new, 35 reprinted, and 51 updated chapters,
making this the most wide-reaching and detailed overview of the field of chemometrics ever published.
Comprehensive Chemometrics, 2nd Ed. offers depth and rigor to the new practitioner entering the field, and breadth
and varied perspectives on the current literature to more experienced practitioners aiming to expand their horizons.
Software and datasets, both of which are especially valuable to those learning the methods, are provided in
some chapters. The coverage is not only comprehensive, it is authoritative as well; the authors contributing to
Comprehensive Chemometrics, 2nd Ed. are among the most distinguished practitioners of the field.
Comprehensive Chemometrics, 2nd Ed. would not have been possible without the considerable help of the
Editorial Board, who assisted in selecting authors and reviewing chapters. For this edition, our Board included
Richard Brereton, Marina Cocchi, Anna De Juan, Riccardo Leardi, Roger Phan-Tan-Luu, Federico Marini, William
Rayens, Luis Sarabia, Alejandro Olivieri, and Rafael Cela.
This new edition would not have been possible without the hard work of the staff at Elsevier. We also owe
thanks to Rachel Conway, Senior Acquisitions Editor at Elsevier, for supporting the project and seeing it
launched, to Sean Simms, who took over the task of Acquisitions Editor in 2019, to Dhivya Karunagaran and
Brinda Subramanian for their help in ensuring that submissions met the requirements for publication, and

especially to Paula Davies, our Content Project Manager, for overseeing the entire project, keeping track of the
due dates and submissions, encouraging authors as needed, and helping us to keep to the production schedule.
Finally, we extend special thanks to all of our authors whose efforts have made the work the valuable reference
that it is.
Steven Brown
Romà Tauler
Beata Walczak
March, 2020

CONTENTS OF ALL VOLUMES
Editors in Chief v
Section Editors vii
In memory of Roger Phan Tan Luu xi
Contributors to Volume 1 xiii
Subject Classification xv
Preface xix
VOLUME 1
1.01 Quality of Analytical Measurements: Statistical Methods for Internal Validation 1
M Cruz Ortiz, Luis A Sarabia, M Sagrario Sánchez, and Ana Herrero
1.02 Proficiency Testing in Analytical Chemistry 53
M Thompson and SLR Ellison
1.03 Quality of Analytical Measurements: Univariate Regression 71
MC Ortiz, MS Sánchez, and LA Sarabia
1.04 Robust Multivariate Statistical Methods 107
Mia Hubert
1.05 Bayesian Methodology in Statistics 123
JM Bernardo
1.06 Robust Methods for High-Dimensional Data 149
Mia Hubert
1.07 An Introduction to the Theory of Sampling: An Essential Part of Total Quality Management 173
FF Pitard
1.08 Representative Sampling, Data Quality, Validation: A Necessary Trinity in Chemometrics 185
KH Esbensen and LP Julius
1.09 Introduction to Experimental Designs 205
R Cela and R Phan-Tan-Luu

1.10 Screening Strategies 209
Rafael Cela Torrijos and Roger Phan-Tan-Luu
1.11 The Study of Experimental Factors 251
Rolf Carlson and Johan E Carlson
1.12 Response Surface Methodology 287
Luis A Sarabia, M Cruz Ortiz, and M Sagrario Sánchez
1.13 Experimental Design for Mixture Studies 327
D Voinovich, B Campisi, R Phan-Tan-Luu, and A Beal
1.14 Nonclassical Experimental Designs 385
Aurélie Beal and Roger Phan-Tan-Luu
1.15 Designing a Multi-Component Calibration Experiment: Basic Principles and Diagonal Approach 411
Andrey Bogomolov
1.16 The Essentials on Linear Regression, ANOVA, General Linear and Linear Mixed Models for the
Chemist 431
Bernadette Govaerts, Bernard Francq, Rebecca Marion, Manon Martin, and Michel Thiel
1.17 Multiset Data Analysis: ANOVA Simultaneous Component Analysis and Related Methods 465
HCJ Hoefsloot, DJ Vis, JA Westerhuis, AK Smilde, and JJ Jansen
1.18 Regularized Multivariate Analysis of Variance 479
Jasper Engel, Kas J Houthuijs, Vasilis Vasiliou, and Georgia Charkoftaki
1.19 ANOVA-Target Projection (ANOVA-TP) 495
Federico Marini and Beata Walczak
1.20 Constrained and Unconstrained Optimization 521
BW Bader
1.21 Sequential Optimization Methods 553
Bieke Dejaegher and Yvan Vander Heyden
1.22 Optimisation: Steepest Ascent, Steepest Descent and Gradient Methods 573
Richard G Brereton
1.23 Multicriteria Decision-Making Methods 585
M Pavan and R Todeschini
1.24 Genetic Algorithms in Chemistry 617
Riccardo Leardi
1.25 A Guided Tour of Penalties 635
Paul HC Eilers
1.26 Particle Swarm Optimization 649
Federico Marini and Beata Walczak
VOLUME 2
2.01 Introduction to Linear Soft-Modeling 1
Anna de Juan and Romà Tauler

2.02 Principal Component Analysis: Concept, Geometrical Interpretation, Mathematical Background,
Algorithms, History, Practice 3
KH Esbensen and P Geladi
2.03 Principal Component Analysis 17
Paul Geladi and Johan Linderholm
2.04 Independent Component Analysis 39
F Westad and M Kermit
2.05 Independent Component Analysis in Analytical Chemistry 57
Hadi Parastar
2.06 Introduction to Multivariate Curve Resolution 85
Sarah C Rutan, Anna de Juan, and Romà Tauler
2.07 Two-Way Data Analysis: Evolving Factor Analysis 95
M Maeder and A de Juan
2.08 Two-Way Data Analysis: Detection of Purest Variables 107
Willem Windig, Andrey Bogomolov, and Sergey Kucheryavskiy
2.09 Two-Way Data Analysis: Multivariate Curve Resolution: Noniterative Resolution Methods 137
Zhimin Zhang, Pan Ma, and Hongmei Lu
2.10 Two-Way Data Analysis: Multivariate Curve Resolution, Iterative Methods 153
Anna de Juan, Sarah C Rutan, and Romà Tauler
2.11 Multivariate Curve ResolutiondError in Curve Resolution 173
Romà Tauler and Marcel Maeder
2.12 On the Ambiguity Underlying Multivariate Curve Resolution Methods 199
Mathias Sawall, Henning Schröder, Denise Meinhardt, and Klaus Neymeyr
2.13 Multiway Data Analysis: Eigenvector-Based Methods 233
J Ferré, R Boqué, and NM Faber
2.14 Multilinear Models, Iterative Methods 267
Giorgio Tomasi, Evrim Acar, and Rasmus Bro
2.15 Multiset Data Analysis: Extended Multivariate Curve Resolution 305
Romà Tauler, Marcel Maeder, and Anna de Juan
2.16 Tensor Similarity in Chemometrics 337
Frederik Van Eeghem and Lieven De Lathauwer
2.17 Bayesian Methods for Factor Analysis in Chemometrics 355
Eun Sug Park and Romà Tauler
2.18 Time Series Analysis Methods in Chemometrics 371
Steven D Brown
2.19 Other Topics in Soft-Modeling: Maximum Likelihood-Based Soft-Modeling Methods 399
PD Wentzell
2.20 Figures of Merit 441
Franco Allegrini and Alejandro C Olivieri
2.21 Unsupervised Data Mining: Introduction 465
D Coomans, C Smyth, I Lee, T Hancock, and J Yang

2.22 Data Mapping: Linear Methods versus Nonlinear Techniques 479
R Wehrens
2.23 Tree-Based Clustering and Extensions 491
T Hancock and C Smyth
2.24 Model-Based Clustering 509
GJ McLachlan, SI Rathnayake, and SX Lee
2.25 Common Clustering Algorithms 531
Ickjai Lee and Jianhua Yang
2.26 Density-Based Clustering Methods 565
M Daszykowski and B Walczak
2.27 Feature Selection: Introduction 581
BK Lavine
2.28 Feature Selection in the Wavelet Domain: Adaptive Wavelets 587
DA Donald, YL Everingham, LW McKinna, and D Coomans
2.29 Missing Data 615
F Arteaga, A Folch-Fortuny, and A Ferrer
2.30 Compositional Data Analysis in Chemometrics 641
Peter Filzmoser and Karel Hron
2.31 Sparse Methods 663
Ahmad Mani-Varnosfaderani
VOLUME 3
3.01 Pre-processing Methods 1
Jean-Michel Roger, Jean-Claude Boulet, Magida Zeaiter, and Douglas N Rutledge
3.02 Evaluation of Preprocessing Methods 77
H Jonsson and J Gabrielsson
3.03 Model-Based Pre-Processing in Vibrational Spectroscopy 83
Achim Kohler, Johanne Heitmann Solheim, Valeria Tafintseva, Boris Zimmermann, and Volha Shapaval
3.04 Normalization and Closure 101
M Bylesjö, O Cloarec, and M Rantalainen
3.05 Variable Shift and Alignment 115
Renger H Jellema, Abel Folch-Fortuny, and Margriet MWB Hendriks
3.06 Background Estimation, Denoising, and Preprocessing 137
J Trygg, J Gabrielsson, and T Lundstedt
3.07 Denoising and Signal-to-Noise Ratio Enhancement: Classical Filtering 143
DF Thekkudan and SC Rutan
3.08 Denoising and Signal-to-Noise Ratio Enhancement: Derivatives 157
V-M Taavitsainen

3.09 Denoising and Signal-to-Noise Ratio Enhancement: Splines 165
V-M Taavitsainen
3.10 Data Quality and Denoising: A Review 179
MS Reis, PM Saraiva, and BR Bakshi
3.11 Model Based Preprocessing and Background Elimination: OSC, OPLS, and O2PLS 205
M Bylesjö and M Rantalainen
3.12 Calibration Methodologies 213
John H Kalivas and Steven D Brown
3.13 Linear Regression Modeling: Variable Selection 249
Roberto Kawakami Harrop Galvão, Mário César Ugulino de Araújo, and
Sófacles Figueredo Carreiro Soares
3.14 An Elemental Perspective on Partial Least Squares 295
William S Rayens
3.15 Multivariate Approaches: UVE-PLS 309
V Centner
3.16 Data and Model Fusion in Chemometrics 317
Steven D Brown
3.17 Multi-Block and Three-Way Data Analysis 341
Mohamed Hanafi, El Mostafa Qannari, and Benoit Jaillais
3.18 Transfer of Multivariate Calibration Models 359
Steven D Brown
3.19 Robust Multivariate Methods in Chemometrics 393
Peter Filzmoser, Sven Serneels, Ricardo Maronna, and Christophe Croux
3.20 Regression Diagnostics 431
Joan Ferré Baldrich
3.21 Model-Based Data Fitting 477
M Maeder, N McCann, S Clifford, and G Puxty
3.22 Linear Approaches for Nonlinear Modeling 497
H Chen and BR Bakshi
3.23 Computationally Intensive Nonlinear Regression Methods 505
Bin Li, Bhavik R Bakshi, and Prem Goel
3.24 Non-linear Modeling: Neural Networks 519
Federico Marini
3.25 Feed-Forward Neural Networks 543
BK Lavine and TR Blank
3.26 Kernel Methods 555
J Suykens
3.27 Classification: Basic Concepts 567
BK Lavine and WS Rayens
3.28 Validation of Classifiers 575
BK Lavine

3.29 Statistical Discriminant Analysis 585
BK Lavine and WS Rayens
3.30 Soft Independent Modeling by Class Analogy 605
Alexey L Pomerantsev and Oxana Ye Rodionova
3.31 Decision Tree Modeling 625
Steven D Brown and Anthony J Myles
3.32 Random Forest and Ensemble Methods 661
George Stavropoulos, Robert van Voorstenbosch, Frederik-Jan van Schooten, and Agnieszka Smolinska
3.33 Genetic Algorithms for Variable Selection and Pattern Recognition 673
Barry K Lavine, Collin G White, and Charles E Davidson
3.34 Multi Way Classification 701
Marina Cocchi, Mario Li Vigni, and Caterina Durante
3.35 Deep Learning Theoretical Chapter for Chemometrician 723
Robert van Vorstenbosch, Agnieszka Smolinska, and Lionel Blanchet
VOLUME 4
4.01 Chemometrics in Electrochemistry 1
M Esteban, C Ariño, and JM Díaz-Cruz
4.02 Chemometrics in the Pharmaceutical Industry 33
Benoît Igne, Christian Airiau, Sameer Talwar, and Elyse Towns
4.03 Environmental Chemometrics 69
Philip K Hopke
4.04 Resampling and Testing in Regression Models with Environmetrical Applications 87
J Roca-Pardiñas, C Cadarso-Suárez, and W González-Manteiga
4.05 Application of Chemometrics in the Food Sciences 99
Paolo Oliveri, Cristina Malegori, Eleonora Mustorgi, and Monica Casale
4.06 Chemometrics in Forensics 113
Marcelo M Sena, Werickson FC Rocha, Jez WB Braga, Carolina S Silva, and Aaron Urbas
4.07 Chemometric Analysis of Sensory Data 149
D Brynn Hibbert
4.08 Smart Sensors 193
Jordi Fonollosa
4.09 Statistical Control of Measures and Processes 215
AJ Ferrer-Riquelme
4.10 Best Practice and Performance of Hardware in Process Analytical Technology (PAT) 237
Rudolf W Kessler and Waltraud Kessler
4.11 Multivariate Statistical Process Control and Process Control, Using Latent Variables 275
T Kourti

4.12 Batch Process Modeling and MSPC 305
S Wold, N Kettaneh-Wold, JF MacGregor, and KG Dunn
4.13 Chemometrics in Raman Spectroscopy 333
Shuxia Guo, Oleg Ryabchykov, Nairveen Ali, Rola Houhou, and Thomas Bocklitz
4.14 Chemometrics in NIR Hyperspectral Imaging: Theory and Applications in the Agricultural
Crops and Products Sector 361
Juan Antonio Fernández Pierna, Philippe Vermeulen, Damien Eylenbosch, James Burger,
Bernard Bodson, Pierre Dardenne, and Vincent Baeten
4.15 Mass Spectrometry Imaging: Chemometric Data Analysis 381
Joaquim Jaumot and Carmen Bedia
4.16 Fast Analysis, Processing and Modeling of Hyperspectral Videos: Challenges and
Possible Solutions 395
Raffaele Vitale, Petter Stefansson, Federico Marini, Cyril Ruckebusch, Ingunn Burud,
and Harald Martens
4.17 Image Processing in Chemometrics 411
Siewert Hugelier, Raffaele Vitale, and Cyril Ruckebusch
4.18 Chemometrics Analysis of Big Data 437
José Camacho and Edoardo Saccenti
4.19 Systems Biology 459
L Coulier, S Wopereis, C Rubingh, H Hendriks, M Radonjic, and RH Jellema
4.20 Analysis of Metabolomics Data: A Chemometrics Perspective 483
Julien Boccard and Serge Rudaz
4.21 Data Processing for RNA/DNA Sequencing 507
Inmaculada Fuertes, Maria Vila-Costa, Jana Asselman, Benjamín Piña, and Carlos Barata
4.22 Analysis of Megavariate Data in Functional Omics 515
EF Mosleth, A McLeod, I Rud, L Axelsson, LE Solberg, B Moen, KME Gilman, EM Færgestad,
A Lysenko, C Rawlings, SN Dankel, G Mellgren, F Barajas-Olmos, LS Orozco, S Sæbø, L Gidskehaug,
A Oust, A Kohler, H Martens, and KH Liland
4.23 Spectral Map Analysis of Microarray Data 569
L Bijnens, R Verbeeck, HW Göhlmann, W Talloen, RA Ion, PJ Lewi, and L Wouters
4.24 Chemometrics in Flow Cytometry 585
Gerjen H Tinnevelt and Jeroen J Jansen
4.25 Chemometrics for QSAR Modeling 599
Roberto Todeschini, Viviana Consonni, Davide Ballabio, and Francesca Grisoni
4.26 Chemoinformatics 635
J Polanski
4.27 High-Performance GRID Computing in Chemoinformatics 677
N Sim, D Konovalov, and D Coomans
Index 703


1.01 Quality of Analytical Measurements: Statistical Methods for Internal Validation☆
M Cruz Ortiz, Departamento de Química, Facultad de Ciencias, Universidad de Burgos, Burgos, Spain
Luis A Sarabia and M Sagrario Sánchez, Departamento de Matemáticas y Computación, Facultad de Ciencias, Universidad de Burgos, Burgos, Spain
Ana Herrero, Departamento de Química, Facultad de Ciencias, Universidad de Burgos, Burgos, Spain
© 2020 Elsevier B.V. All rights reserved.
This is an update of M.C. Ortiz, L.A. Sarabia, M.S. Sánchez, A. Herrero, 1.02 - Quality of Analytical Measurements: Statistical Methods for Internal Validation, in Comprehensive Chemometrics, edited by Steven D. Brown, Romà Tauler, Beata Walczak, Elsevier, 2009, https://doi.org/10.1016/B978-044452701-1.00090-9.
1.01.1 Introduction 3
1.01.2 Confidence and Tolerance Intervals 7
1.01.2.1 Confidence Interval 8
1.01.2.2 Confidence Interval on the Mean of a Normal Distribution 9
1.01.2.2.1 Case 1: Known variance 9
1.01.2.2.2 Case 2: Unknown variance 10
1.01.2.3 Confidence Interval on the Variance of a Normal Distribution 10
1.01.2.4 Confidence Interval on the Difference in Two Means 11
1.01.2.4.1 Case 1: Known variances 11
1.01.2.4.2 Case 2: Unknown variances 11
1.01.2.4.3 Case 3: Confidence interval for paired samples 12
1.01.2.5 Confidence Interval on the Ratio of Variances of Two Normal Distributions 12
1.01.2.6 Confidence Interval on the Median 13
1.01.2.7 Joint Confidence Intervals 13
1.01.2.8 Tolerance Intervals 13
1.01.2.8.1 Case 1: β-content tolerance interval 13
1.01.2.8.2 Case 2: β-expectation tolerance interval 14
1.01.2.8.3 Case 3: Distribution-free intervals 14
1.01.3 Hypothesis Tests 15
1.01.3.1 Elements of a Hypothesis Test 15
1.01.3.2 Hypothesis Test on the Mean of a Normal Distribution 19
1.01.3.2.1 Case 1: Known variance 19
1.01.3.2.2 Case 2: Unknown variance 19
1.01.3.2.3 Case 3: The paired t-test 19
1.01.3.3 Hypothesis Test on the Variance of a Normal Distribution 20
1.01.3.4 Hypothesis Test on the Difference in Two Means 20
1.01.3.4.1 Case 1: Known variances 20
1.01.3.4.2 Case 2: Unknown variances 21
1.01.3.5 Test Based on Intervals 22
1.01.3.6 Hypothesis Test on the Variances of Two Normal Distributions 23
1.01.3.7 Hypothesis Test on the Comparison of Several Independent Variances 24
1.01.3.7.1 Case 1: Cochran’s test 24
1.01.3.7.2 Case 2: Bartlett’s test 25
1.01.3.7.3 Case 3: Levene’s test 25
1.01.3.8 Goodness-of-Fit Tests: Normality Tests 26
1.01.3.8.1 Case 1: Chi-square test 26
1.01.3.8.2 Case 2: D'Agostino normality test 27
1.01.4 One-Way Analysis of Variance 28
1.01.4.1 The Fixed Effects Model 28
1.01.4.2 Power of the Fixed Effects ANOVA model 30
1.01.4.3 Uncertainty and Testing of the Estimated Parameters in the Fixed Effects Model 31
1.01.4.3.1 Case 1: Orthogonal contrasts 32
1.01.4.3.2 Case 2: Comparison of several means 32
1.01.4.4 The Random Effects Model 33
☆ Change History: October 2019. M. Cruz Ortiz, Luis A. Sarabia, M. Sagrario Sánchez, Ana Herrero added MATLAB live-scripts for the computations; rewrote the introduction to tolerance intervals; corrected estimates in Table 13; updated texts; corrected mistakes and updated references.
Comprehensive Chemometrics, 2nd edition, Volume 1 https://doi.org/10.1016/B978-0-12-409547-2.14746-8 1

1.01.4.5 Power of the Random Effects ANOVA model 35
1.01.4.6 Confidence Intervals for the Estimated Parameters in the Random Effects Model 35
1.01.5 Statistical Inference and Validation 35
1.01.5.1 Trueness 35
1.01.5.2 Precision 36
1.01.5.3 Statistical Aspects of the Experiments to Determine Precision 39
1.01.5.4 Consistency Analysis and Incompatibility of Data 39
1.01.5.4.1 Case 1: Elimination of data 39
1.01.5.4.2 Case 2: Robust methods 41
1.01.5.5 Accuracy 43
1.01.5.6 Ruggedness 43
1.01.6 Appendix 45
1.01.6.1 Some Basic Elements of Statistics 45
1.01.6.2 The Normal Distribution 46
1.01.6.3 Student's t Distribution 46
1.01.6.4 The χ² (Chi-square) Distribution 47
1.01.6.5 The F Distribution 48
1.01.6.6 Convergence of Random Variables 48
1.01.6.7 Some Computational Aspects 48
1.01.6.7.1 Normal distribution 49
1.01.6.7.2 Student's t distribution with ν degrees of freedom 49
1.01.6.7.3 χ² distribution with ν degrees of freedom 49
1.01.6.7.4 Fν₁,ν₂ distribution with ν₁ and ν₂ degrees of freedom 49
1.01.6.7.5 Power for the z-test, Eq. 50
1.01.6.7.6 Power for the t-test, Eq. 50
1.01.6.7.7 Power for the chi-square test, Eq. 50
1.01.6.7.8 Power for the F-test, Eq. 50
1.01.6.7.9 Power for fixed effects ANOVA, Eq. 50
1.01.6.7.10 Power for random effects ANOVA, Eq. 50
References 50
Nomenclature
1 − α  Confidence level
1 − β  Power
CCα  Limit of decision
CCβ  Capability of detection
Fν₁,ν₂  F distribution with ν₁ and ν₂ degrees of freedom (d.f.)
H0  Null hypothesis
H1  Alternative hypothesis
N(μ,σ)  Normal distribution with mean μ and standard deviation σ
NID(μ,σ)  (Normally and Independently Distributed) independent random variables equally distributed as normal with mean μ and standard deviation σ
s  Sample standard deviation
s²  Sample variance
tν  Student's t distribution with ν degrees of freedom (d.f.)
x̄  Sample mean
V(X)  Variance of the random variable X
α  Significance level, probability of type I error
β  Probability of type II error
Δ  Bias (systematic error)
ε  Random error
μ  Mean
ν  Degree(s) of freedom, d.f.
σ  Standard deviation
σ²  Variance
σR  Reproducibility (as standard deviation)
σr  Repeatability (as standard deviation)
χ²ν  χ² (chi-square) distribution with ν degrees of freedom

1.01.1 Introduction
Every day millions of analytical determinations are made in thousands of laboratories all around the world. These measurements
are necessary for the assessment of merchandise in commercial exchanges, for supporting health care, for maintaining security, for the
quality control of water and the environment, for the characterization of raw materials and manufactured products, and for forensic
analyses. Practically every aspect of contemporary social activity is somehow supported by analytical measurements. The cost of
these measurements is high, but the cost of decisions made on the basis of incorrect results is much greater. For example, a test that
wrongly shows the presence of a forbidden substance in a food destined for human consumption can result in an expensive claim,
confirmation of the presence of a drug of abuse can lead to serious judicial sentences, and doping in sport may result in
severe sanctions. The importance of providing a correct result is evident, but it is equally important to be able to prove that the result
is correct.
Once an analytical problem is posed to a laboratory and the analytical method is selected, the next step is the in-house validation
of the method. This is the process of defining the analytical requirements to respond to the problem and of confirming that the
considered method has performance characteristics consistent with those required. The results of the validation experiments must be
evaluated in order to ensure that the method meets the required measurement specification.
The set of operations to determine the value of a quantity (measurand) suitably defined is called the measurement. The method
of measurement is the sequence of operations that is used when conducting the measurements. It is documented with enough
detail so that the measurement may be done without additional information.
Once a method is designed or selected, it is necessary to evaluate its performance characteristics and to identify the factors that
can change these characteristics and to what extent they can change. If, in addition, the method is developed to solve a particular
analytical problem, it is necessary to verify that the method is fit for purpose.1 This process of evaluation is called validation of the
method. It implies the determination of several parameters that characterize the method performance: decision limit, capability of
detection, selectivity, specificity, ruggedness, and accuracy (trueness and precision). In any case, it is the measurements themselves
that allow evaluation of the performance characteristics of the method and its fitness for purpose. In addition, when using the
method, the obtained measurements are also the ones that will be used to make decisions about the analyzed sample, for example,
whether the amount of an analyte fulfills a legal specification. Therefore, it is necessary to suitably model the data that a method
provides. In what follows, we will consider that the data provided by the analytical method are real numbers; other possibilities exist:
for example, the count of bacteria or of impacts in a detector takes only (discrete) natural values, or the data resulting
from an analysis may be qualitative, for example, the identification of an analyte through its m/z ratios in a mass spectrometry-
chromatography analysis.
With regard to the analytical measurement, it is accepted that the value, x, provided by the method of analysis consists of three
terms: the true value of the parameter μ, a systematic error (bias) Δ, and a random error ε with zero mean, combined additively as
expressed in Eq. (1):

x = μ + Δ + ε   (1)

All the possible measurements that a method can provide when analyzing a sample constitute the population of the measure-
ments. This is indeed a theoretical situation, because it is being assumed that there are infinitely many samples and that the method
of analysis remains unaltered. Under these conditions, the model of the analytical method, Eq. (1), is mathematically a random vari-
able, X, with mathematical expectation μ + Δ and variance equal to the variance of ε; in statistical notation, E(X) = μ + Δ and
V(X) = V(ε), respectively.
A random variable, and thus the analytical method, is described by its cumulative distribution function F_X(x), that is, the prob-
ability that the method provides measurements less than or equal to x for any value x. Symbolically, this is written as F_X(x) = pr
{X ≤ x} for any real value x. In most applications, it is assumed that F_X(x) is differentiable, which implies, among other
things, that the probability of obtaining exactly a specific value is zero. In the case of a differentiable cumulative distribution func-
tion, the derivative of F_X(x) is the probability density function (pdf) f_X(x). Any function f(x) that is positive, f(x) ≥ 0, and whose
area under the function is 1, ∫_R f(x) dx = 1, is the pdf of a random variable. The probability that the random variable X takes values in
the interval [a, b] is the area under the pdf over the interval [a, b], that is,

pr{X ∈ [a, b]} = ∫_a^b f(x) dx   (2)

and the mean and variance of X are written as in Eqs. (3) and (4), respectively:

E(X) = ∫_R x f(x) dx   (3)

V(X) = ∫_R (x − E(X))² f(x) dx   (4)

In general, the mean and variance do not characterize a random variable, and therefore the method of anal-
ysis, in a unique way. Fig. 1 shows the pdf of four random variables with the same mean 6.00 and standard deviation 0.61.

These four distributions, uniform or rectangular (Fig. 1A), triangular (Fig. 1B), normal (Fig. 1C), and Weibull (Fig. 1D), are
frequent in the scope of analytical determinations; they appear in Appendix E of the EURACHEM/CITAC Guide1 and are also
used in metrology.2
If the only available information regarding a quantity X is the lower limit, l, and the upper limit, u, but the quantity could be
anywhere in between, with no idea of whether any part of the range is more likely, then a rectangular distribution in the interval [l, u]
would be assigned to X. This is so because it is the pdf that maximizes the "information entropy" of Shannon, in other words the pdf
that adequately characterizes the incomplete knowledge about X. Frequently, in reference materials, the certified concentration is
expressed in terms of a number and unqualified limits (e.g., 1000 ± 2 mg L−1). In this case, a rectangular distribution should be
used (Fig. 1A).
When the available information concerning X includes the knowledge that values close to c (between l and u) are more likely
than those near the bounds, the adequate distribution is a triangular one (Fig. 1B), with the maximum of its pdf at c.
If a good location estimate, μ, and a scale estimate, σ, are the only information available regarding X, then, according to the prin-
ciple of maximum entropy, a normal probability distribution N(μ,σ) (Fig. 1C) would be assigned to X (remember that μ and σ may
have been obtained from repeated applications of a measurement method).
Finally, the Weibull distribution (Fig. 1D) is very versatile; it can mimic the behavior of other distributions such as the normal or
the exponential. It is adequate for the analysis of the reliability of processes, and in chemical analysis it is useful in describing the behavior
of the figures of merit of a long-term procedure. For example, the distribution of the capability of detection CCβ3 is a Weibull one, as is
the distribution of the determinations of ammonia in water by UV-vis spectroscopy during 350 different days in Aldama.4
In the four cases given in Fig. 1, the probability of obtaining values between 5 and 7 has been computed with Eq. (2). For the
uniform distribution (Fig. 1A) this probability is 0.94, whereas for the triangular distribution (Fig. 1B) it is 0.88, for the normal distri-
bution (Fig. 1C) it is 0.90, and for the Weibull distribution (Fig. 1D), 0.93. Sorting in decreasing order of the proportion of values that
each distribution accumulates in the interval [5.0, 7.0], we have uniform, Weibull, normal, and triangular, although the triangular
and normal distributions tend to give values symmetrically around the mean and the Weibull distribution does not. If another
interval is considered, say [5.4, 6.6], the distributions accumulate probabilities of 0.57, 0.64, 0.67, and 0.54, respectively, in which
Fig. 1 Probability density functions of four random variables with mean 6 and variance 0.375. (A) Uniform in [4.94, 7.06]; (B) symmetric triangular in [4.5, 7.5]; (C) normal N(6, 0.61); (D) Weibull with shape 1.103 and scale 0.7, shifted to give a mean of 6. Dotted vertical lines mark the interval [5.0, 7.0].

the difference among values is larger than before and, in addition, the order of the distributions becomes normal, triangular, uniform, and
Weibull.
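These interval probabilities follow directly from Eq. (2) as differences of the CDFs. A minimal sketch using only closed-form CDFs and the standard library (the Weibull location below is an assumption: the distribution is shifted so that its mean is 6, as stated for Fig. 1D):

```python
from math import erf, exp, gamma, sqrt

def cdf_uniform(x, l=4.94, u=7.06):
    # F(x) = (x - l)/(u - l) on [l, u], clamped outside
    return min(max((x - l) / (u - l), 0.0), 1.0)

def cdf_triangular(x, l=4.5, u=7.5, c=6.0):
    # symmetric triangular CDF: quadratic branches on either side of the mode c
    if x <= c:
        return (x - l) ** 2 / ((u - l) * (c - l))
    return 1.0 - (u - x) ** 2 / ((u - l) * (u - c))

def cdf_normal(x, m=6.0, s=0.61):
    # standard expression of the normal CDF via the error function
    return 0.5 * (1.0 + erf((x - m) / (s * sqrt(2.0))))

def cdf_weibull(x, shape=1.103, scale=0.7, mean=6.0):
    # shift the distribution so that E(X) = mean, as in Fig. 1D
    loc = mean - scale * gamma(1.0 + 1.0 / shape)
    if x <= loc:
        return 0.0
    return 1.0 - exp(-(((x - loc) / scale) ** shape))

# pr{5 <= X <= 7} = F(7) - F(5) for each distribution, per Eq. (2)
p = {name: f(7.0) - f(5.0)
     for name, f in [("uniform", cdf_uniform), ("triangular", cdf_triangular),
                     ("normal", cdf_normal), ("weibull", cdf_weibull)]}
```

Rounded to two decimals, the four entries of `p` reproduce the 0.94, 0.88, 0.90, and 0.93 quoted above.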
If for each of those variables the value b is determined so that there is a fixed probability, p, of obtaining values below b (i.e., the
value b such that p = pr{X < b} for each distribution X), the results of Table 1 are obtained. For example (second row), 5% of the
time the uniform distribution at hand gives values less than b = 5.05, the triangular distribution less than 4.97, and so on. In
the table, the extreme values among the four distributions for each probability p have been identified, and large differences are
observed, caused by the form in which the values far from 6 are distributed (notice the differences in Fig. 1 for the normal, the trian-
gular, or the uniform distribution) and also by the asymmetry of the Weibull distribution.
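The values b are quantiles, obtained by inverting each CDF. For the uniform and triangular distributions the inverse has a simple closed form; a sketch (at p = 0.05 it reproduces the 5.05 and 4.97 quoted above):

```python
from math import sqrt

def ppf_uniform(p, l=4.94, u=7.06):
    # invert F(b) = (b - l)/(u - l)
    return l + p * (u - l)

def ppf_triangular(p, l=4.5, u=7.5, c=6.0):
    # invert the two quadratic branches of the triangular CDF
    f_c = (c - l) / (u - l)  # CDF value at the mode
    if p <= f_c:
        return l + sqrt(p * (u - l) * (c - l))
    return u - sqrt((1.0 - p) * (u - l) * (u - c))

b_unif = ppf_uniform(0.05)    # about 5.05
b_tri = ppf_triangular(0.05)  # about 4.97
```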
Therefore, the mean and variance of a random variable give very limited information on the values provided by the random
variable, unless additional information is at hand about the form of its density (pdf). For example, if one knows that the distribu-
tion is uniform, symmetric triangular, or normal, the random variable is completely characterized by its mean and variance.
In practice, the pdf of a method of analysis is unknown. We only have a finite number, n, of measurements, which are the
outcomes obtained when applying the same method repeatedly (n times) to the same sample. These n measurements constitute
a statistical sample of the random variable X defined by the method of analysis.
Fig. 2 shows histograms of 100 results obtained when applying four methods of analysis, named A, B, C, and D, to aliquot parts
of a sample to determine an analyte. Clearly, the four methods behave differently.
From the experimental data, the (sample) mean and variance are computed as

x̄ = (Σᵢ₌₁ⁿ xᵢ)/n   (5)

s² = Σᵢ₌₁ⁿ (xᵢ − x̄)²/(n − 1)   (6)

x̄ and s² are estimates of the mean and variance of the distribution of X. These estimates for the data in Fig. 2 are shown in Table 2.
According to the model of Eq. (1), E(X) = μ + δ ≈ x̄; that is, the sample mean estimates the true value μ plus the bias δ. Assuming that the true value is μ = 6 and subtracting it from the sample means in the first row of Table 2, the bias estimated for methods A and B would be 0.66, and 0.16 for methods C and D. The bias of a method is one of its performance characteristics and must be evaluated during the validation of the method. In fact, technical guides, for example the one by the International Organization for Standardization (ISO), state that, for a method, better trueness means less bias. To estimate the bias, it is necessary to have samples with known concentration μ (e.g., certified material, spiked samples).
The value of the variance is independent of the true content, μ, of the sample. For this reason, to estimate the variance it is only necessary to have replicated measurements on aliquot parts of the same sample. The second row of Table 2 shows that methods B and C have the same variance, 1.26, which is 5 times greater than that of methods A and D, 0.25. The dispersion of the data obtained with a method is the precision of the method and constitutes another performance characteristic to be determined in the validation of the method. In agreement with the model in Eq. (1), a measure of the dispersion is the variance V(X), which is estimated by means of s².
On some occasions, for evaluating trueness and precision, it is more descriptive to use statistics other than the mean and variance. For example, when the distribution is rather asymmetric, as in Fig. 1D, it is more reasonable to use the median than the mean. The median is the value at which the distribution accumulates 50% of the probability: 5.83 for the pdf in Fig. 1D and 6.00 for the other three distributions, which are symmetric around their mean. In practice, it is frequent to see the presence of anomalous data (outliers) that influence the mean and, above all, the variance, which is improperly increased; in these cases, it is advisable to use robust estimates of central tendency and spread (dispersion).⁵⁻⁷ Details can be found in the chapter of the present book devoted to robust procedures.
Fig. 2 and Table 2 show that the two characteristics of a measurement method, trueness and precision, are independent of one another, in the sense that a method with better trueness (less bias), methods C and D, can be more (case D) or less (case C) precise. Analogously, methods A and B have an appreciable bias, but A is more precise than B. A method is said to be accurate when it is precise and fulfills trueness.
Table 1 Values of b such that p = pr{X < b}, where X is each one of the random variables defined in the caption of Fig. 1.

p       Uniform      Triangular   Normal       Weibull
0.01    4.96         4.71         4.58 (min)   5.34 (max)
0.05    5.05         4.97 (min)   5.00         5.37 (max)
0.50    6.00 (max)   6.00 (max)   6.00 (max)   5.83 (min)
0.95    6.95 (min)   7.03         7.01         7.22 (max)
0.99    7.04 (min)   7.29         7.42         8.12 (max)

(min), minimum b among the four distributions; (max), maximum b among the four distributions.
Histograms are estimates of the pdf and allow evaluation of the performance of each method in more detail than when considering only trueness and precision. For example, the probability of obtaining values in any interval can be estimated with the histogram. The third row in Table 2 shows the frequencies for the interval [5.0, 7.0]. Method D (best trueness and precision among the four) provides 98% of its values in the interval, whereas method B (worst trueness and precision) provides only 56%. Nonetheless, trueness and precision should be considered jointly. According to the data in Table 2, the effect of increasing the precision when the bias is "high" (using method A instead of B) is an increase of 14% in the proportion of results in the interval [5.0, 7.0], whereas when the bias is small (D instead of C), the increase is 40%. This behavior should be taken into account when optimizing a method, and also in the ruggedness analysis, which is another performance characteristic to be validated according to most of the guides. As can be seen in the fourth row of Table 2, if the method that provides the most results below 6 is needed, C would be the method selected.
[Figure 2: four frequency histograms, panels (A)–(D), each on x from 3 to 10 with counts from 0 to 40.]
Fig. 2 Frequency histograms of 100 measures obtained with four different analytical methods, named (A), (B), (C), and (D), on aliquot parts of a sample. Dotted vertical lines mark the interval [5.0, 7.0].
Table 2 Some characteristics of the distributions in Fig. 2.

                        A      B      C      D
Mean, x̄                6.66   6.66   6.16   6.16
Variance, s²            0.25   1.26   1.26   0.25
fr{5 < X < 7}           0.70   0.56   0.58   0.98
fr{X < 6}               0.08   0.29   0.49   0.39
pr{5 < N(x̄, s) < 7}    0.75   0.55   0.62   0.94
pr{N(x̄, s) < 6}        0.09   0.28   0.44   0.37

fr, frequencies; pr, probabilities.
The previous explanations show the usefulness of knowing the pdf of the results of a method of analysis. As in practice we have only a limited number of results, two basic strategies are possible to estimate the pdf: (1) to assess that the experimental data are compatible with a known distribution (e.g., normal) and then use the corresponding pdf; (2) to estimate the pdf by a data-driven technique based on a computer-intensive method such as the kernel method⁸ or by using other methods such as adaptive or penalized likelihood.⁹,¹⁰ The data of Fig. 2 can be adequately modeled by a normal distribution, according to normality hypothesis tests whose details are explained later in Section "Goodness-of-Fit Tests: Normality Tests". The fitted normal distributions are used to compute the probabilities of obtaining values in the interval [5.0, 7.0] or less than 6 (last two rows in Table 2). When comparing these values with those computed from the empirical histograms (compare rows 3 and 5, and rows 4 and 6), there are no appreciable differences, and the normal pdf can be used instead.
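As an illustration of strategy (1), the last two rows of Table 2 can be reproduced by replacing each histogram with a normal pdf that has the method's sample mean and standard deviation. The book's supplementary code is in MATLAB; the sketch below is an equivalent in Python using only the standard library (the normal CDF is written in terms of the error function).

```python
from math import erf, sqrt

def norm_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative distribution function of N(mu, sigma) via the error function."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

# Sample mean and standard deviation for methods A and D (Table 2)
methods = {"A": (6.66, sqrt(0.25)), "D": (6.16, sqrt(0.25))}

for name, (xbar, s) in methods.items():
    p_in = norm_cdf(7, xbar, s) - norm_cdf(5, xbar, s)  # pr{5 < N(xbar, s) < 7}
    p_lt6 = norm_cdf(6, xbar, s)                        # pr{N(xbar, s) < 6}
    print(f"Method {name}: pr(5<X<7) = {p_in:.2f}, pr(X<6) = {p_lt6:.2f}")
```

Rounded to two decimals, the loop recovers 0.75 and 0.09 for method A and 0.94 and 0.37 for method D, matching rows 5 and 6 of Table 2.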
In the validation of an analytical method, and during its later use, statistical methodological strategies are needed to make decisions from the available experimental data. Knowledge of these strategies supposes a way of thinking and acting that, subordinated to the chemical knowledge, makes objective both the analytical results and their comparison with those of other researchers and/or other analytical methods.
Ultimately, a good method of analysis is a serious attempt to come close to the true value of the measurement, which is always unknown. For this reason, the result of a measurement has to be accompanied by an evaluation of its uncertainty, or degree of reliability. This is done by means of a confidence interval. When the requirement is to establish the quality of an analytical method, its capability of detection, precision, etc. must be compared with those of other methods. This is formalized with a hypothesis test. Confidence intervals and hypothesis tests are the basic tools in the validation of analytical methods.
In this introduction, the word "sample" has been used with two different meanings. Usually, there is no confusion because the context allows one to distinguish whether it is a sample in the statistical or in the chemical sense.
In chemistry, according to the International Union of Pure and Applied Chemistry (IUPAC) (page 50 in Section 18.3.2 of Inczédy et al.¹¹), "sample" should be used only when it refers to a portion selected from a larger amount of material. This meaning coincides with that of a statistical sample and implies the existence of sampling error, that is, error caused by the fact that the sample can be more or less representative of the material. For example, suppose that we want to measure the amount of pesticide that remains in the ground of an arable land after a certain time. We take several samples "representative" of the ground of the parcel (statistical sampling), and this introduces an uncertainty in the results characterized by a (theoretical) variance σₛ². Afterward, the quantity of pesticide in each chemical sample is determined by an analytical method, which has its own uncertainty, characterized by σₘ², in such a way that the uncertainty in the quantity of pesticide in the parcel is σₛ² + σₘ², provided that the method gives results independent of the location of the sample. Sometimes, when evaluating whether a method is adequate for a task, the sampling error can be an important part of the uncertainty in the result and, of course, should be taken into account when planning the experimentation.
When the sampling error is negligible, for example, when a portion is taken from a homogeneous solution, IUPAC recommends using words such as test portion, aliquot, or specimen.
In summary, there is a clear link between a measurement method and a random variable, which is why probability is the natural form of expressing experimental uncertainty. This is thus the focus of the present article, which is organized as follows:
Section "Confidence and Tolerance Intervals" describes confidence intervals to measure bias and precision under the normality hypothesis, and tolerance intervals, useful in evaluating the fitness for purpose of a method. A nonparametric interval on the median is also described.
Section "Hypothesis Test" is devoted to making decisions based on experimental data that, as such, are affected by uncertainty. In this section, the computation of the power of a test is systematically proposed as a key element to evaluate the quality of the decision at the desired significance level. A brief incursion into tests based on intervals is also made, as they solve the problem of deciding whether an interval of values is acceptable, for example, a relative error less than 10% in absolute value. The section ends with some goodness-of-fit tests to evaluate the compatibility of a theoretical probability distribution with some experimental data.
Section "One-Way Analysis of Variance" is dedicated to the analysis of variance (ANOVA) for both fixed and random effects, and in Section "Statistical Inference and Validation" some more specific questions related to the usual parameters of analytical method validation and their relation with the developed statistical methodologies are analyzed.
Mathematical proofs are not covered in this article and, to be operative from a practical point of view, several examples have been included so that the reader can verify the understanding of the formulas and the argumentation for their thoughtful use. This aspect is completed with the inclusion of an Appendix, where some essential aspects related to the effectiveness of the statistical models and the limit laws are described. The Appendix also contains the necessary sentences, in MATLAB code, to repeat all the calculations proposed along the article. The same sentences are also available as supplementary material in the form of MATLAB .mlx live scripts (at least release R2016a is needed to read and execute them).
1.01.2 Confidence and Tolerance Intervals
There are some important questions when evaluating a method, for example, "in a given sample, what is the maximum value that it provides?", that, due to the random character of the results, cannot be answered with just a number.
In order to include the degree of certainty in the answer, the question should be reformulated as: what is the maximum value, U, that will be obtained 95% of the times that the method is used on the sample? The answer to the question thus posed is a tolerance interval, and to build it the probability distribution must be known. For instance, let us suppose that it is a N(μ, σ) and denote by z₀.₀₅ the critical value of a N(0, 1) = Z distribution, the one that accumulates probability 0.95. Then, a possible answer is U = μ + z₀.₀₅ σ, because the probability that the analytical method gives values greater than U is pr{method > U} = pr{N(μ, σ) > μ + z₀.₀₅ σ}, which, according to the result in the Appendix, is equal to pr{Z > z₀.₀₅} = 0.05. In general, for any percentage of results 100(1 − α)%, the maximum value provided by the method would be

U = μ + z_α σ   (7)

with a probability α that the aforementioned assertion is false.
If, instead, the interest is in the value L such that 100(1 − α)% of the results are greater than L, the answer would be

L = μ − z_α σ   (8)

Finally, the interval [L, U] that contains 100(1 − α)% of the values obtained with the method would be

[L, U] = [μ − z_{α/2} σ, μ + z_{α/2} σ]   (9)
An analytical example where one of these tolerance intervals with a normal distribution N(μ, σ) needs to be computed would be: an analytical method gives values (mg L⁻¹) that follow a N(9, 0.5) distribution when measuring a standard with 9 mg L⁻¹. To assess whether the method is still working properly, ten standards are included in the daily sequence of determinations. The probability distribution of the mean of these ten values is a N(9, 0.5/√10). Following Eq. (9), the tolerance interval at 95% level is 9 ± 1.96 × 0.5/√10 = 9 ± 0.31 mg L⁻¹. Consequently, if one day a mean of, say, 9.5 mg L⁻¹ is obtained, the method does not work properly, because 9.5 does not belong to the tolerance interval, and the method should be revised, at the risk of doing this revision uselessly 5% of the times. Notice that the tolerance interval is always the same, built at the desired confidence level 100(1 − α)% with the distribution N(9, 0.5/√10), and it is not updated daily with the new samples.
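The numbers in this example are easy to check. A minimal Python sketch (the book's own code is in MATLAB) builds the tolerance interval for the daily mean of ten standards and tests whether an observed mean of 9.5 mg L⁻¹ falls inside it:

```python
from math import sqrt

z_05 = 1.96               # two-sided critical value z_{alpha/2} for alpha = 0.05
mu, sigma, n = 9.0, 0.5, 10

half_width = z_05 * sigma / sqrt(n)     # 1.96 * 0.5 / sqrt(10), about 0.31
low, high = mu - half_width, mu + half_width

daily_mean = 9.5
in_control = low <= daily_mean <= high
print(f"tolerance interval: [{low:.2f}, {high:.2f}], 9.5 in control: {in_control}")
```

The interval is [8.69, 9.31], so a daily mean of 9.5 mg L⁻¹ triggers a revision of the method.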
Unlike the interval in Eq. (9), two variants of tolerance intervals, namely the β-content and the β-expectation tolerance intervals, are explained in Section "Tolerance Intervals" due to their relevance in the context of validation of analytical methods. In any case, any of them is completely different from the confidence intervals introduced and developed in the following sections (from Section "Confidence Interval" to Section "Joint Confidence Intervals").
After explaining all the studied cases, the section finishes with a comparative analysis of both concepts (tolerance and confidence intervals).
1.01.2.1 Confidence Interval
We have already remarked that estimation of solely the mean, x̄, and variance, s², from n independent results provides very limited information on the method performance. The objective now is to make affirmations of the type "in the sample, the amount of the analyte μ, estimated by x̄, is between L and U (μ ∈ [L, U])" with a certain probability that the statement is true. Following this particular example, we should consider that x̄ is a value taken by the random variable X̄ (sample mean) and use its distribution to answer the new question. Its distribution function is obtained mathematically from that of X, F_X(x), and thus depends on the information we have about F_X(x) (e.g., whether the variance is known or must also be estimated).
[Figure 3: a tree diagram. Confidence intervals under normal distribution(s) split into two branches: one sample (for the mean μ₀, with known or unknown variance; for the standard deviation σ₀) and two independent samples (for the difference in means μ₁ − μ₂, with known or unknown variances, the latter equal or unequal; for the ratio of standard deviations σ₁/σ₂).]
Fig. 3 Diagram summarizing the different cases for computing confidence intervals.
In the general case, with a random variable X, obtaining a confidence interval for X from a sample x₁, x₂, …, xₙ consists of obtaining two functions l(x₁, x₂, …, xₙ) and u(x₁, x₂, …, xₙ) such that

pr{X ∈ [l, u]} = pr{l ≤ X ≤ u} = 1 − α   (10)

1 − α is the confidence level and α is the significance level, meaning that the statement that the value of X is between l and u will be false 100α% of the times.
In the next sections this idea will be particularized for different cases, according to the random variable X of interest. Fig. 3 is a diagram that summarizes the cases studied in the following sections. All the examples are written in the MATLAB live-script file Intervals_section1022_live.mlx, in the supplementary material, so that they can be easily repeated or adapted for the reader's own data.
1.01.2.2 Confidence Interval on the Mean of a Normal Distribution
1.01.2.2.1 Case 1: Known variance
Suppose that we have a random variable that follows a normal distribution with known variance. This will be the case, for example, of using an already validated method of analysis. The assumption means that we know that ε in Eq. (1) is normally distributed and we also know its variance. If we are using samples of size n, and taking into account the properties of the normal distribution (see Appendix), the sample mean, X̄, is a random variable N(μ, σ/√n); thus, the particular expression of Eq. (10) for this random variable is

pr{μ − z_{α/2} σ/√n ≤ X̄ ≤ μ + z_{α/2} σ/√n} = 1 − α   (11)

that is, 100(1 − α)% of the values of the sample mean are in the interval in Eq. (11). A simple algebraic manipulation (subtract μ and X̄, multiply by −1) gives

pr{X̄ − z_{α/2} σ/√n ≤ μ ≤ X̄ + z_{α/2} σ/√n} = 1 − α   (12)

Therefore, according to Eq. (10), the confidence interval on the mean that is obtained from Eq. (12) is

[X̄ − z_{α/2} σ/√n, X̄ + z_{α/2} σ/√n]   (13)

Analogously, the confidence intervals at confidence level 100(1 − α)% for the maximum and minimum values of the mean are computed from Eqs. (14), (15), respectively:

pr{μ ≤ X̄ + z_α σ/√n} = 1 − α   (14)

pr{X̄ − z_α σ/√n ≤ μ} = 1 − α   (15)

and, thus, the corresponding one-sided intervals would be (−∞, X̄ + z_α σ/√n] and [X̄ − z_α σ/√n, +∞).
In an experimental context, when measuring n aliquot parts of a test sample, we obtain n values x₁, x₂, …, xₙ. Their sample mean x̄ is the particular value taken by the random variable X̄ and is also an estimate of the true value μ.
Example 1: Suppose that an analytical method follows a N(μ, 4) and we have a sample of size 10 with values 98.87, 92.54, 99.42, 105.66, 98.70, 97.23, 98.44, 103.73, 94.45, and 101.08. With this sample, the mean is 99.01 and, using Eq. (13), the interval at 95% confidence level is [99.01 − 1.96 × 4/√10, 99.01 + 1.96 × 4/√10] = [96.53, 101.49].
For the interpretation of this interval, notice that with different samples of size 10 (same analytical method), different intervals will be obtained at the same 95% confidence level. The endpoints of these intervals are nonrandom values, and the unknown mean value, which is also a specific value, will or will not belong to the interval. Therefore, the affirmation "the interval contains the mean" is a deterministic assertion that is true or false for each of the intervals. What one knows is that it is true for 100(1 − α)% of those intervals. In our case, as 95% of the constructed intervals will contain the true value, we say, at 95% confidence level, that the interval [96.53, 101.49] contains μ.
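Eq. (13) and Example 1 can be reproduced in a few lines. This Python sketch (a stand-in for the MATLAB live script that accompanies the chapter) computes the 95% interval with the known σ = 4:

```python
from math import sqrt

data = [98.87, 92.54, 99.42, 105.66, 98.70, 97.23, 98.44, 103.73, 94.45, 101.08]
sigma = 4.0   # known standard deviation of the method
z = 1.96      # z_{0.025}

n = len(data)
xbar = sum(data) / n
half = z * sigma / sqrt(n)          # half-width of the interval in Eq. (13)
low, high = xbar - half, xbar + half
print(f"mean = {xbar:.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
```

The printed interval agrees with the [96.53, 101.49] of Example 1.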
This is the interpretation under the frequentist approach adopted in this article; that is to say, the information on random variables is obtained by means of samples of them, and the parameters to be estimated are not known but are fixed amounts (e.g., the amount of analyte in a sample, μ, is estimated by the measurement results obtained by analyzing it n times). With a Bayesian approach to the problem, a probability distribution is attributed to the amount of analyte μ and, once an interval of interest [a, b] is fixed, the "a priori" distribution of μ, the experimental results, and Bayes' theorem are used to calculate the probability a posteriori that μ belongs to the interval [a, b]. It is shown that, although in most practical cases the uncertainty intervals obtained from repeated measurements using either theory may be similar, their interpretation is completely different. The works by Lira and Wöger¹² and Zech¹³ are devoted to comparing both approaches from the point of view of the experimental data and their uncertainty. Also, an introduction to Bayesian methods for analyzing chemical data can be seen in Armstrong and Hibbert.¹⁴,¹⁵
1.01.2.2.2 Case 2: Unknown variance
Suppose a normally distributed random variable with unknown variance that must be estimated, together with the mean, from n experimental data. The confidence interval is computed as in Case 1, but now the standardized sample mean follows (see Appendix) a Student's t distribution with n − 1 degrees of freedom (d.f.); thus, the interval at the 100(1 − α)% confidence level is obtained from

pr{X̄ − t_{α/2,ν} s/√n ≤ μ ≤ X̄ + t_{α/2,ν} s/√n} = 1 − α   (16)

where t_{α/2,ν} is the upper percentage point (100α/2%) of the Student t distribution with ν = n − 1 d.f. and s is the sample standard deviation. Analogously, the one-sided intervals at the 100(1 − α)% confidence level come from

pr{μ ≤ X̄ + t_{α,ν} s/√n} = 1 − α   (17)

pr{X̄ − t_{α,ν} s/√n ≤ μ} = 1 − α   (18)
Example 2: Suppose that the probability distribution of an analytical method is normal, but its standard deviation is unknown. With the data of Example 1, the sample standard deviation, s, is computed as 3.90. As t₀.₀₂₅,₉ = 2.262 (see Appendix), the confidence interval at 95% level is [99.01 − 2.26 × 1.24, 99.01 + 2.26 × 1.24] = [96.21, 101.81]. The 95% confidence interval on the minimum of the mean (i.e., the 95.0% lower confidence bound) is made up, according to Eq. (18), of all the values greater than 96.74 = 99.01 − 1.83 × 1.24. The corresponding interval on the maximum (upper confidence bound for the mean), Eq. (17), is made up of the values less than 101.28 = 99.01 + 1.83 × 1.24.
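When σ is unknown, the only changes are the use of s from Eq. (6) and the t critical value. A sketch in Python, taking t₀.₀₂₅,₉ = 2.262 from the text rather than recomputing it:

```python
from math import sqrt

data = [98.87, 92.54, 99.42, 105.66, 98.70, 97.23, 98.44, 103.73, 94.45, 101.08]
t = 2.262                     # t_{0.025, 9}, tabulated value (see Appendix)

n = len(data)
xbar = sum(data) / n
s = sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))   # Eq. (6)
half = t * s / sqrt(n)        # half-width of the interval in Eq. (16)
low, high = xbar - half, xbar + half
print(f"s = {s:.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
```

Up to rounding, the result matches the [96.21, 101.81] of Example 2; note the interval is wider than in Example 1 because s must be estimated.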
The length of the confidence intervals from Eqs. (12)–(15) is a function of the sample size and tends toward zero when the sample size tends to infinity. This functional relation permits the computation of the sample size needed to obtain an interval of given length, d. It suffices to consider d/2 = z_{α/2} σ/√n and take as n the nearest integer greater than (2 z_{α/2} σ/d)². For example, if we want a 95% confidence interval with length d less than 2, under the hypothesis of Example 1, we will need a sample size greater than or equal to 62.
The same argument can be applied when the standard deviation is unknown. However, in this case, to compute n by (2 t_{α/2,ν} s/d)² it is necessary to have an initial estimate of s, which, in general, is obtained in a pilot study of size n₀, in such a way that in the previous expression the d.f., ν, are n₀ − 1. An alternative is to define the desired length of the interval in standard deviation units (remember that the standard deviation is unknown). For instance, in Example 2, if we want d = 0.5 s, we will need a sample size greater than (4 z_{α/2})² = 61.5; note the substitution of t_{α/2,ν} by z_{α/2}, which is mandatory because we do not have the sample size needed to compute t_{α/2,ν}, which is precisely what we want to estimate.
1.01.2.3 Confidence Interval on the Variance of a Normal Distribution
In this case, the data come from a N(μ, σ) distribution with μ and σ unknown, and we have a sample with values x₁, x₂, …, xₙ. The distribution of the random variable "sample variance", S², is related to the chi-square distribution, χ² (see Appendix). As a consequence, the 100(1 − α)% confidence interval for the variance σ² is obtained from

pr{(n − 1)S²/χ²_{α/2,ν} ≤ σ² ≤ (n − 1)S²/χ²_{1−α/2,ν}} = 1 − α   (19)

where χ²_{α/2,ν} is the critical value of a χ² distribution with ν = n − 1 d.f. at significance level α/2. As in the previous case for the sample mean, we should distinguish between the random variable sample variance, S², and one of its values, s², computed with Eq. (6) from the sample x₁, x₂, …, xₙ.
The intervals for the maximum and minimum of the variance at 100(1 − α)% confidence level are obtained from Eqs. (20), (21), respectively:

pr{σ² ≤ (n − 1)S²/χ²_{1−α,ν}} = 1 − α   (20)

pr{(n − 1)S²/χ²_{α,ν} ≤ σ²} = 1 − α   (21)
Example 3: Knowing that the n = 10 data of Example 2 come from a normal distribution with both mean and variance unknown, the 95% confidence interval on σ² is found from Eq. (19) as [7.21, 50.81], because s² = 15.25, χ²₀.₀₂₅,₉ = 19.02, and χ²₀.₉₇₅,₉ = 2.70. If the analyst is interested in a confidence interval for the maximum variance, the 95% upper confidence interval is found from Eq. (20) as [0, 41.27], because χ²₀.₉₅,₉ = 3.33; that is, the upper bound for the variance is 41.27 with 95% confidence. Notice the lower bound at 0. To obtain confidence intervals on the standard deviation, it suffices to take the square root of the aforementioned intervals, because this operation is a monotonically increasing transformation; therefore, the intervals at 95% confidence level on the standard deviation are [2.69, 7.13] and [0, 6.42], respectively.
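A sketch of Example 3 in Python, taking the χ² critical values from the text (they can also be read from tables such as those in the Appendix):

```python
chi2_hi = 19.02   # chi-square_{0.025, 9}
chi2_lo = 2.70    # chi-square_{0.975, 9}
s2, n = 15.25, 10

low = (n - 1) * s2 / chi2_hi    # lower limit of Eq. (19)
high = (n - 1) * s2 / chi2_lo   # upper limit of Eq. (19)
print(f"95% CI on the variance: [{low:.2f}, {high:.2f}]")
print(f"95% CI on the standard deviation: [{low ** 0.5:.2f}, {high ** 0.5:.2f}]")
```

Up to rounding of s² and the critical values, this reproduces [7.21, 50.81] and [2.69, 7.13].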
The sample size, n, needed so that s²/σ² is between 1 − k and 1 + k is given by the nearest integer greater than 1 + (1/2)[z_{α/2}(√(1 + k) + 1)/k]². For example, for k = 0.5, such that the length of the confidence interval verifies 0.5 < s²/σ² < 1.5, we would need n = 40 data (at least). Just for comparative purposes, we will admit in the example that with the sample of size 40 we obtain the same variance s² = 15.25. As χ²₀.₀₂₅,₃₉ = 58.12 and χ²₀.₉₇₅,₃₉ = 23.65, the two-sided interval at 95% confidence level is now [10.23, 25.15], which verifies the required specifications.
1.01.2.4 Confidence Interval on the Difference in Two Means
1.01.2.4.1 Case 1: Known variances
Consider two independent random variables, N₁ and N₂, distributed as N(μ₁, σ₁) and N(μ₂, σ₂), with unknown means and known variances σ₁² and σ₂². We wish to find a 100(1 − α)% confidence interval on the difference in means μ₁ − μ₂. With a random sample of n₁ observations from the first distribution, x₁₁, x₁₂, …, x₁ₙ₁, and n₂ observations from the second one, x₂₁, x₂₂, …, x₂ₙ₂, the 100(1 − α)% confidence interval on μ₁ − μ₂ is obtained from the equation

pr{(X̄₁ − X̄₂) − z_{α/2} √(σ₁²/n₁ + σ₂²/n₂) ≤ μ₁ − μ₂ ≤ (X̄₁ − X̄₂) + z_{α/2} √(σ₁²/n₁ + σ₂²/n₂)} = 1 − α   (22)

where X̄₁ and X̄₂ are the sample-mean random variables, which take the values x̄₁ and x̄₂. The reader can easily write the expressions analogous to Eqs. (14), (15) for the one-sided intervals.
1.01.2.4.2 Case 2: Unknown variances
The approach to this topic is similar to the previous case, but here even the variances σ₁² and σ₂² are unknown. However, it can be reasonable to assume that they are equal, σ₁² = σ₂² = σ², and that the differences observed in their estimates from the samples, s₁² and s₂², are not significant. The methodology to decide whether this can be assumed, or not, is explained later, in Section "Hypothesis Test".
An estimate of the common variance σ² is given by the pooled sample variance in Eq. (23), which is an arithmetic average of both variances weighted by the corresponding d.f.:

s_p² = [(n₁ − 1)s₁² + (n₂ − 1)s₂²]/(n₁ + n₂ − 2)   (23)

The 100(1 − α)% confidence interval is obtained from the following equation:

pr{X̄₁ − X̄₂ − t_{α/2,ν} s_p √(1/n₁ + 1/n₂) ≤ μ₁ − μ₂ ≤ X̄₁ − X̄₂ + t_{α/2,ν} s_p √(1/n₁ + 1/n₂)} = 1 − α   (24)

where ν = n₁ + n₂ − 2 are the d.f. of the Student's t distribution. The one-sided intervals at 100(1 − α)% confidence level have the analogous expressions, deduced from Eq. (24) by substituting t_{α/2,ν} by t_{α,ν}. If a fixed length is desired for the confidence interval, the computation explained in Section "Confidence Interval on the Mean of a Normal Distribution" can be immediately adapted to obtain the needed sample size.
Example 4: We want to study the stability of a substance after being stored for a month. Here, stability means that the content of the substance remains unchanged. Two series of measurements (n₁ = n₂ = 8) were carried out before and after the storage period, and we estimate the difference in means by a 95% confidence interval. The results were x̄₁ = 90.8, s₁² = 3.89 and x̄₂ = 92.7, s₂² = 4.02, respectively. Therefore, the two-sided interval when assuming equal variances (s_p² = 3.96, Eq. (23)) is (90.8 − 92.7) ± 2.1448 × √3.96 × √(1/8 + 1/8), that is, [−4.03, 0.23]. Therefore, at 95% confidence level, the difference of the means belongs to this interval, which includes the null difference; that is, the substance is stable.
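Example 4 as a Python sketch, with t₀.₀₂₅,₁₄ = 2.1448 taken from the text:

```python
from math import sqrt

n1 = n2 = 8
xbar1, s2_1 = 90.8, 3.89   # before storage
xbar2, s2_2 = 92.7, 4.02   # after storage
t = 2.1448                 # t_{0.025, 14}

# Pooled variance, Eq. (23)
sp2 = ((n1 - 1) * s2_1 + (n2 - 1) * s2_2) / (n1 + n2 - 2)
half = t * sqrt(sp2) * sqrt(1 / n1 + 1 / n2)   # half-width in Eq. (24)
diff = xbar1 - xbar2
print(f"sp^2 = {sp2:.3f}, 95% CI = [{diff - half:.2f}, {diff + half:.2f}]")
```

The interval [−4.03, 0.23] contains zero, which is the numerical expression of the stability conclusion.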
When the assumption σ₁² = σ₂² is not reasonable, we can still obtain an interval on the difference μ₁ − μ₂ by using the fact that the statistic

(X̄₁ − X̄₂ − (μ₁ − μ₂))/√(s₁²/n₁ + s₂²/n₂)

is distributed approximately as a t with d.f. given by

ν = (s₁²/n₁ + s₂²/n₂)²/[(s₁²/n₁)²/(n₁ − 1) + (s₂²/n₂)²/(n₂ − 1)]   (25)

The 100(1 − α)% confidence interval is obtained from the following equation:

pr{X̄₁ − X̄₂ − t_{α/2,ν} √(s₁²/n₁ + s₂²/n₂) ≤ μ₁ − μ₂ ≤ X̄₁ − X̄₂ + t_{α/2,ν} √(s₁²/n₁ + s₂²/n₂)} = 1 − α   (26)
Example 5: We want to compute a confidence interval on the difference of two means with unknown and unequal variances, with results that come from an experiment carried out on four aliquot samples by two different analysts. The first analyst obtains x̄₁ = 3.285, and the second x̄₂ = 3.257. The variances were s₁² = 3.33 × 10⁻⁵ and s₂² = 9.17 × 10⁻⁵, respectively. Assuming that σ₁² ≠ σ₂², Eq. (25) gives ν = 4.9, so the d.f. to apply in Eq. (26) are 5 and t₀.₀₂₅,₅ = 2.571. Thus, the 95% confidence interval is (3.285 − 3.257) ± 2.571 × √(3.33 × 10⁻⁵/4 + 9.17 × 10⁻⁵/4), that is, [0.014, 0.042]. So, at 95% confidence, the two analysts provide unequal measurements, because zero is not in the interval.
The confidence intervals for the maximum and the minimum are obtained by considering only the last or the first term, respectively, in Eq. (26) and replacing t_{α/2,ν} by t_{α,ν}.
1.01.2.4.3 Case 3: Confidence interval for paired samples
Sometimes we are interested in evaluating an effect (e.g., the reduction of a polluting agent in an industrial spill by means of a cata-
lyst) but it is impossible to have two homogeneous populations of samples without and with treatment to obtain the two means of
the recoveries, because the amount of polluting agent may change, for example, over time. In these cases, the solution is to deter-
mine the polluting agent before and after applying the procedure to the same spill. The difference between both determinations is
a measure of the effect of the catalyst. The (statistical) samples obtained in this way are known as paired samples. Formally, with the
two paired samples of sizen,x
11,x12,.,x 1nandx 21,x22,.,x 2n, we compute the differences between any pair of data,d i¼x1ix2i,
i¼1,2,.,n. If these differences follow a normal distribution, the 100(1a)% confidence interval is obtained from
pr{ d̄ − tα/2,ν sd/√n ≤ μd ≤ d̄ + tα/2,ν sd/√n } = 1 − α   (27)
where d̄ and sd are the mean and standard deviation of the differences di, and ν = n − 1 are the d.f. of the t distribution.
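A minimal sketch of Eq. (27), using hypothetical before/after determinations (the data and the tabulated critical value t0.025,4 = 2.776 are illustrative assumptions, not taken from the text):

```python
import math
import statistics

# Hypothetical paired determinations (before/after treatment)
before = [10.2, 9.8, 10.5, 10.0, 9.9]
after = [9.7, 9.5, 10.1, 9.6, 9.4]

d = [b - a for b, a in zip(before, after)]   # differences d_i = x_1i - x_2i
n = len(d)
d_bar = statistics.mean(d)
s_d = statistics.stdev(d)                    # sample std. deviation of the d_i

t_crit = 2.776                               # t_{0.025,4} from a t table
half = t_crit * s_d / math.sqrt(n)           # half-width of the interval, Eq. (27)
ci = (d_bar - half, d_bar + half)
print([round(x, 2) for x in ci])
```

Since zero lies outside the resulting interval, the (hypothetical) treatment effect would be declared significant at the 95% level.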
1.01.2.5 Confidence Interval on the Ratio of Variances of Two Normal Distributions
This section approaches the question of giving a confidence interval on the ratio σ1²/σ2² of the variances of two distributions N1 ≡ N(μ1, σ1) and N2 ≡ N(μ2, σ2) with unknown means and variances. Let x11, x12, …, x1n1 be a random sample of n1 observations from N1 and x21, x22, …, x2n2 be a random sample of n2 observations from N2. The sample variances obtained with these two samples, s1² and s2², are the particular values of the random variables S1² and S2², and the 100(1 − α)% confidence interval on the ratio of variances is computed from the following equation:
pr{ F1−α/2,ν1,ν2 (S1²/S2²) ≤ σ1²/σ2² ≤ Fα/2,ν1,ν2 (S1²/S2²) } = 1 − α   (28)
where F1−α/2,ν1,ν2 and Fα/2,ν1,ν2 are the critical values (upper tail) of an F distribution with ν1 = n2 − 1 d.f. in the numerator and ν2 = n1 − 1 d.f. in the denominator. The Appendix contains a description of some relevant properties of the F distribution.
We can also compute one-sided confidence intervals. The 100(1 − α)% upper or lower confidence bound on σ1²/σ2² is obtained from Eqs. (29) and (30), respectively. Remember that, when computing the intervals by using Eq. (29), the lower bound is always 0.
pr{ σ1²/σ2² ≤ Fα,ν1,ν2 (S1²/S2²) } = 1 − α   (29)
pr{ F1−α,ν1,ν2 (S1²/S2²) ≤ σ1²/σ2² } = 1 − α   (30)
Example 6: In this example, we compute a two-sided 95% confidence interval for the ratio of the variances in Example 4 (n1 = n2 = 8, s1² = 3.89, s2² = 4.02). The resulting interval is [0.20 × (3.89/4.02), 4.99 × (3.89/4.02)] = [0.19, 4.83]. As 1 belongs to this interval, we can admit that both variances are equal.
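Example 6 can be sketched as follows; the two F critical values (roughly 0.20 and 4.99 for 7 and 7 d.f.) are taken from the text, since Python's standard library has no F-distribution quantile function:

```python
# Two-sided 95% CI for a variance ratio (Eq. 28), data of Example 6
s1_sq, s2_sq = 3.89, 4.02
ratio = s1_sq / s2_sq

# Critical values F_{0.975,7,7} and F_{0.025,7,7}, taken from the text
f_lower, f_upper = 0.20, 4.99
ci = (f_lower * ratio, f_upper * ratio)

contains_one = ci[0] <= 1 <= ci[1]   # True -> variances may be considered equal
print([round(x, 2) for x in ci], contains_one)
```

As in the text, the interval [0.19, 4.83] contains 1, so equality of the variances is not rejected.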
Quality of Analytical Measurements: Statistical Methods for Internal Validation
1.01.2.6 Confidence Interval on the Median
This case is different from the previous ones, because the confidence interval is a "distribution-free" interval; that is, no distribution is assumed for the data. As is known, a percentile (pct) is the value xpct such that 100·pct% of the values are less than or equal to xpct. It is possible to compute confidence intervals on any pct, but for values of pct near one or zero we need very large sample sizes, n, because the values n·pct and n·(1 − pct) must be greater than 5. For the median (pct = 0.5), it suffices to consider samples of size 10 or more.
The fundamentals of these confidence intervals are based on the binomial distribution, whose details are outside the scope of this article and can be found in Sprent [16]. We use the data of Example 1 to show step by step how a 100(1 − α)% confidence interval on the median is computed (the guided example is for α = 0.05 with zα/2 = 1.96). The procedure consists of three steps:
1. To sort the data in ascending order. In our case, 92.54, 94.45, 97.23, 98.44, 98.70, 98.87, 99.42, 101.08, 103.73, and 105.66. The
rank of each datum is the position that it occupies in the sorted list, for example, the rank of 98.44 is four.
2. To calculate the rank, rl, of the value that will be the lower endpoint of the interval. It is the nearest integer less than ½(n − zα/2√n + 1). In our case, this value is 0.5 × (10 − 1.96√10 + 1) = 2.40, thus rl = 2.
3. To calculate the rank, ru, of the value that will be the upper endpoint of the interval, which is the nearest integer greater than ½(n + zα/2√n − 1). In our case, this value is 0.5 × (10 + 1.96√10 − 1) = 7.60, then ru = 8.
Hence, the 95% confidence interval on the median is formed by the values occupying positions 2 and 8, that is, [94.45, 101.08].
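The three steps above can be sketched with the standard library (an illustrative script; the "nearest integer less/greater than" rules are implemented as floor and ceiling, which is my reading of the text):

```python
import math
from statistics import NormalDist

# Data from Example 1
data = [92.54, 94.45, 97.23, 98.44, 98.70, 98.87,
        99.42, 101.08, 103.73, 105.66]
n = len(data)
z = NormalDist().inv_cdf(0.975)      # z_{alpha/2} = 1.96 for alpha = 0.05

ordered = sorted(data)               # step 1: sort ascending
r_l = math.floor(0.5 * (n - z * math.sqrt(n) + 1))  # step 2: lower rank
r_u = math.ceil(0.5 * (n + z * math.sqrt(n) - 1))   # step 3: upper rank

ci = (ordered[r_l - 1], ordered[r_u - 1])           # ranks are 1-based
print(r_l, r_u, ci)
```

This reproduces rl = 2, ru = 8, and the interval [94.45, 101.08].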
1.01.2.7 Joint Confidence Intervals
Sometimes it is necessary to compute confidence intervals for several parameters while maintaining a 100(1 − α)% confidence that all of them contain the true value of the corresponding parameter. For example, for two statistically independent parameters, we can assure a 100(1 − α)% joint confidence level by taking separately the corresponding 100(1 − α)^(1/2)% confidence intervals, because (1 − α)^(1/2) × (1 − α)^(1/2) = (1 − α). In general, if there are k parameters, we will compute the 100(1 − α)^(1/k)% confidence interval for each of them.
However, if the sample statistics used are not independent of one another, the above computation is not valid. The Bonferroni inequality states that the probability that all the affirmations are true at the 100(1 − α)% confidence level is greater than or equal to 1 − (Σi=1..k αi), where 1 − αi is the confidence level of the i-th interval (usually αi = α/k). For example, if a joint 90% confidence interval is needed for the means of two distributions, according to the Bonferroni inequality αi = α/2 = 0.10/2 = 0.05; thus, each individual interval should be the corresponding 95% confidence interval.
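The two routes to a joint level can be compared in a few lines (a sketch of the arithmetic above, for k = 2 and a joint 90% level):

```python
# Individual confidence levels for a joint 90% statement over k = 2 parameters
alpha_joint = 0.10
k = 2

# Independent statistics: each interval at 100*(1 - alpha)^(1/k) %
level_indep = (1 - alpha_joint) ** (1 / k)   # about 0.9487

# Bonferroni (no independence needed): alpha_i = alpha / k
level_bonf = 1 - alpha_joint / k             # 0.95

print(round(level_indep, 4), round(level_bonf, 2))
```

Note that the Bonferroni route is slightly more conservative (95% vs. 94.87% individual intervals), the price paid for not assuming independence.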
1.01.2.8 Tolerance Intervals
In the introduction to the present Section "Confidence and Tolerance Intervals", the tolerance intervals of a normal distribution have been calculated knowing its mean and variance. Remember that the tolerance interval [l, u] contains 100(1 − α)% of the values of the distribution of X or, equivalently, pr{X ∉ [l, u]} = α. In practice, the values of the parameters that define the probability distribution are unknown, and this uncertainty should be transferred into the endpoints of the interval. There are several types of tolerance regions, but in this article we will restrict ourselves to two common cases.
1.01.2.8.1 Case 1: β-content tolerance interval
Given a random variable X, an interval [l, u] is a β-content tolerance interval at the γ confidence level if the following holds:
pr{ pr{X ∈ [l, u]} ≥ β } = γ   (31)
Expressed in words, [l, u] contains at least 100β% of the values of X with γ confidence level. For the case of an analytical method, this is to say that we have to determine, based on a sample of size n, for instance, the interval that will contain 95% (β = 0.95) of the results, and this assertion must be true 90% of the times (γ = 0.90). Evidently, β-content tolerance intervals can be one-sided, which means that the procedure will provide 95% of its results above l (respectively, below u) 90% of the times. We leave to the reader the corresponding formal definitions.
One-sided and two-sided β-content tolerance intervals can be computed either by controlling the center or by controlling the tails, and for both continuous and discrete random variables (a review can be seen in Patel [17] and applications in Analytical Chemistry in Meléndez et al. [18] and Reguera et al. [19]).
Here we will only describe the case of a normally distributed X with unknown mean and variance. From this distribution, we have a sample of size n that is used to compute the mean x̄ and standard deviation s. We want to obtain a two-sided β-content tolerance interval controlling the center, that is, an interval such that
pr{ pr{X ∈ [x̄ − ks, x̄ + ks]} ≥ β } = γ   (32)
To determine k, several approximations have been reported; consult Patel [17] for a discussion of them. The approach by Wald and Wolfowitz [20] is based on determining k1 such that
pr{ N(0,1) ≤ 1/√n + k1 } − pr{ N(0,1) ≤ 1/√n − k1 } = β   (33)
Therefore
k = k1 √( (n − 1) / χ²γ,n−1 )   (34)
where χ²γ,n−1 is the point exceeded with probability γ when using the χ² distribution with n − 1 d.f.
Example 7: With the data in Example 1, and β = γ = 0.95, we have x̄ = 99.01, s = 3.91, k1 = 2.054, and χ²0.95,9 = 3.33; thus, according to Eq. (34), k = 3.379 and, as a consequence, the interval [99.01 − 3.38 × 3.91, 99.01 + 3.38 × 3.91] = [85.79, 112.23] contains 95% of the results of the method 95% of the times that the procedure is repeated with a sample of size 10.
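Example 7 can be sketched with the standard library: k1 is obtained by solving Eq. (33) numerically with a simple bisection, while the χ² critical value (3.33) is taken from the text, since the standard library has no χ² quantile function:

```python
import math
from statistics import NormalDist

# beta-content tolerance interval (Wald-Wolfowitz approximation), Example 7
x_bar, s, n = 99.01, 3.91, 10
beta = 0.95

# Solve Eq. (33) for k1 by bisection:
#   Phi(1/sqrt(n) + k1) - Phi(1/sqrt(n) - k1) = beta
phi = NormalDist().cdf
c = 1 / math.sqrt(n)
lo, hi = 0.0, 10.0
for _ in range(100):
    k1 = (lo + hi) / 2
    if phi(c + k1) - phi(c - k1) < beta:
        lo = k1
    else:
        hi = k1

chi2 = 3.33                          # chi^2_{0.95,9}, value taken from the text
k = k1 * math.sqrt((n - 1) / chi2)   # Eq. (34)
interval = (x_bar - k * s, x_bar + k * s)
print(round(k1, 2), round(k, 2), [round(x, 1) for x in interval])
```

This reproduces k1 ≈ 2.05, k ≈ 3.38, and the interval [85.8, 112.2] of the example (small differences in the last digit come from rounding of the tabulated values).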
1.01.2.8.2 Case 2: β-expectation tolerance interval
The interval [l, u] is called a β-expectation tolerance interval if
E( pr{X ∈ [l, u]} ) = β   (35)
Unlike the β-content tolerance interval, the condition in Eq. (35) only demands that, on average, the probability that the random variable takes values between l and u is β.
As in the previous case, we limit ourselves to obtaining intervals of the form [x̄ − ks, x̄ + ks]. When the distribution of the random variable is normal and we have a sample of size n, the solution was obtained for the first time by Wilks [21] and is
k = t(1−β)/2,ν √( (n + 1) / n )   (36)
where t(1−β)/2,ν is the upper (1 − β)/2 point of the t distribution with ν = n − 1 d.f.
Example 7 (continuation): With the same data, the 95% expectation tolerance interval would be [99.01 − 2.37 × 3.91, 99.01 + 2.37 × 3.91] = [89.74, 108.28], as now k is directly computed with the critical value t0.025,9 = 2.262.
This interval is shorter than the β-content tolerance interval because it only assures the expected value (the mean) of the probabilities that the individual values belong to the interval. In fact, the interval [89.74, 108.28] contains 95% of the values of X only 64% of the times, a conclusion drawn by applying Eq. (32) with k = 2.37. Also, note that when the sample size tends to infinity, the value of k in Eq. (36) tends towards z(1−β)/2, which gives the theoretical interval that, in our example, would be [91.35, 106.67], obtained by substituting k by z0.025 = 1.96.
1.01.2.8.3 Case 3: Distribution-free intervals
It is also possible to obtain tolerance intervals independent of the distribution (provided it is continuous) of the variable X. These intervals are based on the ranks of the observations, but they demand very large sample sizes, which makes them quite useless in practice. For example, for the β-content tolerance interval [l, u] to be [x(1), x(n)] (i.e., for the endpoints to be the smallest and the greatest values in the sample), it is necessary that the sample size n approximately fulfills the equation log(n) + (n − 1) log(γ) = log(1 − β) − log(1 − γ) [22]. If we need, as in Example 7, β = γ = 0.95, the value of n has to be 89. Nevertheless, Willinks [23] used the Monte Carlo method to compute shorter "distribution-free" uncertainty intervals proposed in Draft Supplement [2], but it still requires sample sizes that are rather large in the scope of chemical analysis. A complete theoretical development on tolerance intervals (including their estimation by means of Bayesian methods) is in the book by Guttman [24].
Tolerance intervals are of interest to show that a method is fit for purpose because, when establishing that the interval [x̄ − ks, x̄ + ks] will contain, on average, 100β% of the values provided by the method (or 100β% of the values with γ confidence level), we are including both precision and trueness. To assess that the method is "fit for purpose" it suffices that the tolerance interval [x̄ − ks, x̄ + ks] is included in the specifications that the method should fulfill. Note that a method with high precision (small value of s) but with a significant bias can still fulfill the specifications, in the sense that a high proportion of its values are within the specifications. In addition, in the estimation of s, the repeatability, the intermediate precision, or the reproducibility can be introduced to reflect the scope of application of the method. The use of a tolerance interval solves the problem of introducing the bias as a component of the uncertainty.
With the aim of developing analytical fit-for-purpose methods, the Société Française des Sciences et Techniques Pharmaceutiques (SFSTP) proposed [25–28] the use of β-expectation tolerance intervals in the validation of quantitative methods. In four case studies, it has shown the validity of β-expectation tolerance intervals as an adequate way to reconcile the objectives of the analytical method in routine analysis with those of the validation step, and it proposes them [29] as a criterion to select the calibration curve. It has also analyzed [30] their adequacy to the guides that establish the performance criteria that should be validated and their usefulness [31] in the problem of the transfer of an analytical method. González and Herrador [32] have proposed their computation for the estimation of the uncertainty of the analytical assay. In all these cases, β-expectation tolerance intervals based on the normality of the data are used, that is, using Eq. (36). To avoid dependence on the underlying distribution and the use of the classic distribution-free
methods, Rebafka et al. [33] proposed the use of a bootstrap technique to calculate β-expectation tolerance intervals, whereas Fernholz and Gillespie [34] studied the estimation of β-content tolerance intervals by using the bootstrap.
To summarize this whole section on tolerance and confidence intervals, it is worth pointing out some comparative aspects, because there is a tendency to confuse the two concepts, which have nothing in common but the word interval. The difference between them is clear: the confidence interval is the set that is supposed to contain (with 100(1 − α)% confidence) the true value of the unknown parameter; the tolerance interval is the set that contains a proportion β of the values taken by the random variable, with a given confidence γ.
In particular, confidence intervals must be used in the process of evaluating the trueness and precision of a method when there is no need to fulfill external requirements but just to compare with other methods or to quantify the uncertainty and bias of the results obtained with it.
A usual error is to mistakenly consider a confidence interval as a tolerance interval, when the difference between them is important. For instance, with the data of Example 7, notice that to compute the confidence interval the standard deviation of the mean is estimated as s/√n = 1.24, whereas the standard deviation of the individual results of the method is estimated as s = 3.91, which is very different.
Also, it is important to remember that when the sample size n tends to infinity, the length of a confidence interval tends toward zero, independently of the chosen confidence level. For example, with the confidence intervals for the mean, in the limit we will have x̄ = μ; thus, the estimator and the true parameter will be equal for sure (1 − α = 1). On the contrary, the length of a β-content tolerance interval does not tend towards zero when the sample size increases, but to the interval that contains for sure (γ = 1) the 100β% of the values.
There are other aspects of the determination of the uncertainty that are of practical interest, for example, the problem that arises from the fact that any uncertainty interval, particularly an expanded uncertainty interval, should be restricted to the range of feasible values of the measurand. Cowen and Ellison [35] analyzed how to modify the interval when the data are close to a natural limit of the feasible range, such as 0 or 100% mass or mole fraction.
1.01.3 Hypothesis Tests
This section is devoted to the introduction of a statistical methodology to decide whether an affirmation is false, for example, the affirmation "this method of analysis applied to this reference sample provides the certified value". If, on the basis of the experimental results, it is decided that it is false, we will conclude that the method has bias. The affirmation is customarily called a hypothesis, and the procedure of decision making is called hypothesis testing. A statistical hypothesis is an assertion about the probability distribution that a random variable follows. Sometimes one has to decide on a parameter, for example, whether the mean of a normal distribution is a specific value. On other occasions it may be required to decide on other characteristics of the distribution, for example, whether the experimental data are compatible with the hypothesis that they come from a normal or a uniform distribution.
1.01.3.1 Elements of a Hypothesis Test
As the results obtained with analytical methods are modeled by a probability distribution, it is evident that both the validation of a method and its routine use involve making decisions that are naturally formulated as problems of hypothesis testing. In order to describe the elements of a hypothesis test, we will use a concrete case. As in the case of intervals, all the examples can be followed with the live script in the supplementary material entitled Tests_section1023_live.mlx.
Example 8: For an experimental procedure, we need solutions with pH values less than 2. The preparation of these solutions provides pH values that follow a normal distribution with σ = 0.55. The pH values obtained from 10 measurements were 2.09, 1.53, 1.70, 1.65, 2.00, 1.68, 1.52, 1.71, 1.62, and 1.58. The question to be answered is whether the pH of the resulting solution is adequate to proceed with the experiment.
We express this formally as
H0: μ = 2.00 (inadequate solution)
H1: μ < 2.00 (valid solution)   (37)
The statement "μ = 2.00" in Eq. (37) is called the null hypothesis, denoted H0, and the statement "μ < 2.00" is called the alternative hypothesis, H1. As the alternative hypothesis specifies values of μ that are less than 2.00, it is called a one-sided alternative. In some situations, we may wish to formulate a two-sided alternative hypothesis to specify values of μ that could be either greater or less than 2.00, as in
H0: μ = 2.00
H1: μ ≠ 2.00   (38)
The hypotheses are not affirmations about the sample but about the distribution from which those values come; that is to say, μ is the unknown value of the pH of the solution, which will be the same as the value provided by the procedure if the bias is zero (see the model of Eq. (1)). In general, to test a hypothesis, the analyst must consider the experimental goal and define the null hypothesis for the test accordingly, as in Eq. (37). Hypothesis-testing procedures rely on using the information in a random sample; if this information is inconsistent with the null hypothesis, we conclude that the hypothesis is false. If there is not enough evidence to prove falseness, the test defaults to the decision of not rejecting the null hypothesis, though this does not actually prove that it is correct. It is therefore critical to choose the null hypothesis carefully in each problem.
In practice, to test a hypothesis, we must take a random sample, compute an appropriate test statistic from the sample data, and then use the information contained in this statistic to make a decision. However, as the decision is based on a random sample, it is subject to error. Two kinds of potential errors may be made when testing hypotheses. If the null hypothesis is rejected when it is true, then a type I error has been made. A type II error occurs when the researcher accepts the null hypothesis when it is false. The situation is described in Table 3.
In Example 8, if the experimental data lead to rejection of the null hypothesis H0 when it is true, our (wrong) conclusion is that the pH of the solution is less than 2. A type I error has been made, and the analyst will use the solution in the procedure when in fact it is not chemically valid. If, on the contrary, the experimental data lead to acceptance of the null hypothesis when it is false, the analyst will not use the solution when in fact the pH is less than 2, and a type II error has been made. Note that both types of error have to be considered because their consequences are very different. In the case of a type I error, an unsuitable solution is accepted, the procedure will be inadequate, and the analytical result will be wrong, with the subsequent damages that it may cause (e.g., the loss of a client or a mistaken environmental diagnosis). On the contrary, a type II error implies that a valid solution is not used, with the corresponding extra cost of the analysis. It is clear that the analyst has to specify the assumable risk of making these errors, and this is done in terms of the probability that they will occur.
The probabilities of occurrence of type I and type II errors are denoted by specific symbols, defined in Eq. (39). The probability α of the test is called the significance level, and the power of the test is 1 − β, which measures the probability of correctly rejecting the null hypothesis.
α = pr{type I error} = pr{reject H0 | H0 is true}
β = pr{type II error} = pr{accept H0 | H0 is false}   (39)
In Eq. (39), the symbol "|" indicates that the probability is calculated under that condition. In the example we are following, α will be calculated with the normal distribution of mean 2 and standard deviation 0.55.
Statistically expressed, with the n = 10 results in Example 8 (sample mean x̄ = 1.708), one wants to decide about the value of the mean of a normal distribution with known variance and a one-sided alternative hypothesis (a one-tail test).
With these premises, the related statistic is written in Table 4 (second row) and gives Zcalc = (x̄ − μ0)/(σ/√n) = (1.708 − 2.0)/(0.55/√10) = −1.679.
In addition, the analyst must assume the risk α, say 0.05. This means that the decision rule that is going to be applied to the experimental results will accept an inadequate (chemical) solution 5% of the times. Therefore, the critical or rejection region is written in Table 4, second row, as CR = {Zcalc < −1.645}, meaning that the null hypothesis will be rejected for the samples of size 10 that provide values of the statistic less than −1.645. In the example, the actual value Zcalc = −1.679 belongs to the critical region; thus, the decision is to reject the null hypothesis at the 5% significance level.
Given the present facilities of computation, instead of the CR, the available statistical software calculates the so-called P-value, which is the probability of obtaining the current value of the statistic under the null hypothesis H0. In our case, P-value = pr{Z ≤ −1.679} = 0.0466. When the P-value is less than the significance level α, the null hypothesis is rejected, because this is the same as saying that the value of the statistic belongs to the critical region.
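The whole test of Example 8 can be sketched with the standard library (an illustrative script, not the chapter's supplementary live script):

```python
import math
from statistics import NormalDist

# One-sided z test for Example 8 (H0: mu = 2.00 vs. H1: mu < 2.00, known sigma)
data = [2.09, 1.53, 1.70, 1.65, 2.00, 1.68, 1.52, 1.71, 1.62, 1.58]
mu0, sigma, alpha = 2.00, 0.55, 0.05

n = len(data)
x_bar = sum(data) / n
z_calc = (x_bar - mu0) / (sigma / math.sqrt(n))   # test statistic

p_value = NormalDist().cdf(z_calc)                # lower-tail P-value
reject = p_value < alpha                          # equivalently z_calc < -1.645
print(round(z_calc, 3), round(p_value, 4), reject)
```

This reproduces Zcalc = −1.679 and P-value = 0.0466, hence the rejection of H0 at the 5% level.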
The next question that immediately arises is about the power of the applied decision rule (statistic and critical region). To calculate β, defined in Eq. (39), it is necessary to specify exactly the meaning of the alternative hypothesis; in our case, what is meant by a pH smaller than 2. From a mathematical point of view, the answer is clear: any number less than 2, for example, 1.9999, which clearly does not make sense from the point of view of the analyst. In this context, sometimes owing to previous knowledge, in other cases because of regulatory stipulations or simply by the detail of the standardized work procedure, the analyst can decide the value of pH that is considered to be less than 2.00, for example, a pH less than 1.60. This is the same as assuming that "pH equal to 2" is any smaller value whose distance to 2 is less than 0.40. In these conditions,
β = pr{ N(0,1) < zα − |d| / (σ/√n) }   (40)
Table 3   Decisions in hypothesis testing.

                          The unknown truth
Researcher's decision     H0 is true        H0 is false
Accept H0                 No error          Type II error
Reject H0                 Type I error      No error
Another Random Scribd Document
with Unrelated Content

"What have I done?" Lanfierre asked in the monotone of shock.
Fownes took the wheel. It was off a 1995 Studebaker.
"I'm not sure what's going to come of this," he said to Lanfierre with
an astonishing amount of objectivity, "but the entire dome air supply
is now coming through my bedroom."
The wind screamed.
"Is there something I can turn?" Lanfierre asked.
"Not any more there isn't."

They started down the stairs carefully, but the wind caught them
and they quickly reached the bottom in a wet heap.
Recruiting Lieutenant MacBride from behind his sofa, the men
carefully edged out of the house and forced the front door shut.
The wind died. The fog dispersed. They stood dripping in the
Optimum Dome Conditions of the bright avenue.
"I never figured on this," Lanfierre said, shaking his head.
With the front door closed the wind quickly built up inside the house.
They could see the furnishing whirl past the windows. The house did
a wild, elated jig.
"What kind of a place is this?" MacBride said, his courage beginning
to return. He took out his notebook but it was a soggy mess. He
tossed it away.
"Sure, he was different," Lanfierre murmured. "I knew that much."
When the roof blew off they weren't really surprised. With a certain
amount of equanimity they watched it lift off almost gracefully,
standing on end for a moment before toppling to the ground. It was
strangely slow motion, as was the black twirling cloud that now rose
out of the master bedroom, spewing shorts and socks and cases
every which way.
"Now what?" MacBride said, thoroughly exasperated, as this strange
black cloud began to accelerate, whirling about like some malevolent
top....
Humphrey Fownes took out the dust jacket he'd found in the library.
He held it up and carefully compared the spinning cloud in his
bedroom with the illustration. The cloud rose and spun, assuming
the identical shape of the illustration.
"It's a twister," he said softly. "A Kansas twister!"

"What," MacBride asked, his bravado slipping away again, "what ...
is a twister?"
The twister roared and moved out of the bedroom, out over the rear
of the house toward the side of the dome. "It says here," Fownes
shouted over the roaring, "that Dorothy traveled from Kansas to Oz
in a twister and that ... and that Oz is a wonderful and mysterious
land beyond the confines of everyday living."
MacBride's eyes and mouth were great zeros.
"Is there something I can turn?" Lanfierre asked.
Huge chunks of glass began to fall around them.
"Fownes!" MacBride shouted. "This is a direct order! Make it go
back!"
But Fownes had already begun to run on toward the next house,
dodging mountainous puffs of glass as he went. "Mrs. Deshazaway!"
he shouted. "Yoo-hoo, Mrs. Deshazaway!"
The dome weevils were going berserk trying to keep up with the
precipitation. They whirred back and forth at frightful speed, then,
emptied of molten glass, rushed to the Trough which they quickly
emptied and then rushed about empty-handed. "Yoo-hoo!" he yelled,
running. The artificial sun vanished behind the mushrooming twister.
Optimum temperature collapsed. "Mrs. Deshazaway! Agnes, will you
marry me? Yoo-hoo!"
Lanfierre and Lieutenant MacBride leaned against their car and
waited, dazed.
There was quite a large fall of glass.

*** END OF THE PROJECT GUTENBERG EBOOK A FALL OF GLASS
***
Updated editions will replace the previous one—the old editions will
be renamed.
Creating the works from print editions not protected by U.S.
copyright law means that no one owns a United States copyright in
these works, so the Foundation (and you!) can copy and distribute it
in the United States without permission and without paying
copyright royalties. Special rules, set forth in the General Terms of
Use part of this license, apply to copying and distributing Project
Gutenberg™ electronic works to protect the PROJECT GUTENBERG™
concept and trademark. Project Gutenberg is a registered trademark,
and may not be used if you charge for an eBook, except by following
the terms of the trademark license, including paying royalties for use
of the Project Gutenberg trademark. If you do not charge anything
for copies of this eBook, complying with the trademark license is
very easy. You may use this eBook for nearly any purpose such as
creation of derivative works, reports, performances and research.
Project Gutenberg eBooks may be modified and printed and given
away—you may do practically ANYTHING in the United States with
eBooks not protected by U.S. copyright law. Redistribution is subject
to the trademark license, especially commercial redistribution.
START: FULL LICENSE

THE FULL PROJECT GUTENBERG LICENSE

PLEASE READ THIS BEFORE YOU DISTRIBUTE OR USE THIS WORK
To protect the Project Gutenberg™ mission of promoting the free
distribution of electronic works, by using or distributing this work (or
any other work associated in any way with the phrase “Project
Gutenberg”), you agree to comply with all the terms of the Full
Project Gutenberg™ License available with this file or online at
www.gutenberg.org/license.
Section 1. General Terms of Use and
Redistributing Project Gutenberg™
electronic works
1.A. By reading or using any part of this Project Gutenberg™
electronic work, you indicate that you have read, understand, agree
to and accept all the terms of this license and intellectual property
(trademark/copyright) agreement. If you do not agree to abide by all
the terms of this agreement, you must cease using and return or
destroy all copies of Project Gutenberg™ electronic works in your
possession. If you paid a fee for obtaining a copy of or access to a
Project Gutenberg™ electronic work and you do not agree to be
bound by the terms of this agreement, you may obtain a refund
from the person or entity to whom you paid the fee as set forth in
paragraph 1.E.8.
1.B. “Project Gutenberg” is a registered trademark. It may only be
used on or associated in any way with an electronic work by people
who agree to be bound by the terms of this agreement. There are a
few things that you can do with most Project Gutenberg™ electronic
works even without complying with the full terms of this agreement.
See paragraph 1.C below. There are a lot of things you can do with
Project Gutenberg™ electronic works if you follow the terms of this
agreement and help preserve free future access to Project
Gutenberg™ electronic works. See paragraph 1.E below.

1.C. The Project Gutenberg Literary Archive Foundation (“the
Foundation” or PGLAF), owns a compilation copyright in the
collection of Project Gutenberg™ electronic works. Nearly all the
individual works in the collection are in the public domain in the
United States. If an individual work is unprotected by copyright law
in the United States and you are located in the United States, we do
not claim a right to prevent you from copying, distributing,
performing, displaying or creating derivative works based on the
work as long as all references to Project Gutenberg are removed. Of
course, we hope that you will support the Project Gutenberg™
mission of promoting free access to electronic works by freely
sharing Project Gutenberg™ works in compliance with the terms of
this agreement for keeping the Project Gutenberg™ name associated
with the work. You can easily comply with the terms of this
agreement by keeping this work in the same format with its attached
full Project Gutenberg™ License when you share it without charge
with others.
1.D. The copyright laws of the place where you are located also
govern what you can do with this work. Copyright laws in most
countries are in a constant state of change. If you are outside the
United States, check the laws of your country in addition to the
terms of this agreement before downloading, copying, displaying,
performing, distributing or creating derivative works based on this
work or any other Project Gutenberg™ work. The Foundation makes
no representations concerning the copyright status of any work in
any country other than the United States.
1.E. Unless you have removed all references to Project Gutenberg:
1.E.1. The following sentence, with active links to, or other
immediate access to, the full Project Gutenberg™ License must
appear prominently whenever any copy of a Project Gutenberg™
work (any work on which the phrase “Project Gutenberg” appears,
or with which the phrase “Project Gutenberg” is associated) is
accessed, displayed, performed, viewed, copied or distributed:

This eBook is for the use of anyone anywhere in the United
States and most other parts of the world at no cost and with
almost no restrictions whatsoever. You may copy it, give it away
or re-use it under the terms of the Project Gutenberg License
included with this eBook or online at www.gutenberg.org. If you
are not located in the United States, you will have to check the
laws of the country where you are located before using this
eBook.
1.E.2. If an individual Project Gutenberg™ electronic work is derived
from texts not protected by U.S. copyright law (does not contain a
notice indicating that it is posted with permission of the copyright
holder), the work can be copied and distributed to anyone in the
United States without paying any fees or charges. If you are
redistributing or providing access to a work with the phrase “Project
Gutenberg” associated with or appearing on the work, you must
comply either with the requirements of paragraphs 1.E.1 through
1.E.7 or obtain permission for the use of the work and the Project
Gutenberg™ trademark as set forth in paragraphs 1.E.8 or 1.E.9.
1.E.3. If an individual Project Gutenberg™ electronic work is posted
with the permission of the copyright holder, your use and distribution
must comply with both paragraphs 1.E.1 through 1.E.7 and any
additional terms imposed by the copyright holder. Additional terms
will be linked to the Project Gutenberg™ License for all works posted
with the permission of the copyright holder found at the beginning
of this work.
1.E.4. Do not unlink or detach or remove the full Project
Gutenberg™ License terms from this work, or any files containing a
part of this work or any other work associated with Project
Gutenberg™.
1.E.5. Do not copy, display, perform, distribute or redistribute this
electronic work, or any part of this electronic work, without
prominently displaying the sentence set forth in paragraph 1.E.1
with active links or immediate access to the full terms of the Project
Gutenberg™ License.
1.E.6. You may convert to and distribute this work in any binary,
compressed, marked up, nonproprietary or proprietary form,
including any word processing or hypertext form. However, if you
provide access to or distribute copies of a Project Gutenberg™ work
in a format other than “Plain Vanilla ASCII” or other format used in
the official version posted on the official Project Gutenberg™ website
(www.gutenberg.org), you must, at no additional cost, fee or
expense to the user, provide a copy, a means of exporting a copy, or
a means of obtaining a copy upon request, of the work in its original
“Plain Vanilla ASCII” or other form. Any alternate format must
include the full Project Gutenberg™ License as specified in
paragraph 1.E.1.
1.E.7. Do not charge a fee for access to, viewing, displaying,
performing, copying or distributing any Project Gutenberg™ works
unless you comply with paragraph 1.E.8 or 1.E.9.
1.E.8. You may charge a reasonable fee for copies of or providing
access to or distributing Project Gutenberg™ electronic works
provided that:
• You pay a royalty fee of 20% of the gross profits you derive
from the use of Project Gutenberg™ works calculated using the
method you already use to calculate your applicable taxes. The
fee is owed to the owner of the Project Gutenberg™ trademark,
but he has agreed to donate royalties under this paragraph to
the Project Gutenberg Literary Archive Foundation. Royalty
payments must be paid within 60 days following each date on
which you prepare (or are legally required to prepare) your
periodic tax returns. Royalty payments should be clearly marked
as such and sent to the Project Gutenberg Literary Archive
Foundation at the address specified in Section 4, “Information
about donations to the Project Gutenberg Literary Archive
Foundation.”
• You provide a full refund of any money paid by a user who
notifies you in writing (or by e-mail) within 30 days of receipt
that s/he does not agree to the terms of the full Project
Gutenberg™ License. You must require such a user to return or
destroy all copies of the works possessed in a physical medium
and discontinue all use of and all access to other copies of
Project Gutenberg™ works.
• You provide, in accordance with paragraph 1.F.3, a full refund of
any money paid for a work or a replacement copy, if a defect in
the electronic work is discovered and reported to you within 90
days of receipt of the work.
• You comply with all other terms of this agreement for free
distribution of Project Gutenberg™ works.
1.E.9. If you wish to charge a fee or distribute a Project Gutenberg™
electronic work or group of works on different terms than are set
forth in this agreement, you must obtain permission in writing from
the Project Gutenberg Literary Archive Foundation, the manager of
the Project Gutenberg™ trademark. Contact the Foundation as set
forth in Section 3 below.
1.F.
1.F.1. Project Gutenberg volunteers and employees expend
considerable effort to identify, do copyright research on, transcribe
and proofread works not protected by U.S. copyright law in creating
the Project Gutenberg™ collection. Despite these efforts, Project
Gutenberg™ electronic works, and the medium on which they may
be stored, may contain “Defects,” such as, but not limited to,
incomplete, inaccurate or corrupt data, transcription errors, a
copyright or other intellectual property infringement, a defective or
damaged disk or other medium, a computer virus, or computer
codes that damage or cannot be read by your equipment.
1.F.2. LIMITED WARRANTY, DISCLAIMER OF DAMAGES - Except for
the “Right of Replacement or Refund” described in paragraph 1.F.3,
the Project Gutenberg Literary Archive Foundation, the owner of the
Project Gutenberg™ trademark, and any other party distributing a
Project Gutenberg™ electronic work under this agreement, disclaim
all liability to you for damages, costs and expenses, including legal
fees. YOU AGREE THAT YOU HAVE NO REMEDIES FOR
NEGLIGENCE, STRICT LIABILITY, BREACH OF WARRANTY OR
BREACH OF CONTRACT EXCEPT THOSE PROVIDED IN PARAGRAPH
1.F.3. YOU AGREE THAT THE FOUNDATION, THE TRADEMARK
OWNER, AND ANY DISTRIBUTOR UNDER THIS AGREEMENT WILL
NOT BE LIABLE TO YOU FOR ACTUAL, DIRECT, INDIRECT,
CONSEQUENTIAL, PUNITIVE OR INCIDENTAL DAMAGES EVEN IF
YOU GIVE NOTICE OF THE POSSIBILITY OF SUCH DAMAGE.
1.F.3. LIMITED RIGHT OF REPLACEMENT OR REFUND - If you
discover a defect in this electronic work within 90 days of receiving
it, you can receive a refund of the money (if any) you paid for it by
sending a written explanation to the person you received the work
from. If you received the work on a physical medium, you must
return the medium with your written explanation. The person or
entity that provided you with the defective work may elect to provide
a replacement copy in lieu of a refund. If you received the work
electronically, the person or entity providing it to you may choose to
give you a second opportunity to receive the work electronically in
lieu of a refund. If the second copy is also defective, you may
demand a refund in writing without further opportunities to fix the
problem.
1.F.4. Except for the limited right of replacement or refund set forth
in paragraph 1.F.3, this work is provided to you ‘AS-IS’, WITH NO
OTHER WARRANTIES OF ANY KIND, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR ANY PURPOSE.
1.F.5. Some states do not allow disclaimers of certain implied
warranties or the exclusion or limitation of certain types of damages.
If any disclaimer or limitation set forth in this agreement violates the
law of the state applicable to this agreement, the agreement shall be
interpreted to make the maximum disclaimer or limitation permitted
by the applicable state law. The invalidity or unenforceability of any
provision of this agreement shall not void the remaining provisions.
1.F.6. INDEMNITY - You agree to indemnify and hold the Foundation,
the trademark owner, any agent or employee of the Foundation,
anyone providing copies of Project Gutenberg™ electronic works in
accordance with this agreement, and any volunteers associated with
the production, promotion and distribution of Project Gutenberg™
electronic works, harmless from all liability, costs and expenses,
including legal fees, that arise directly or indirectly from any of the
following which you do or cause to occur: (a) distribution of this or
any Project Gutenberg™ work, (b) alteration, modification, or
additions or deletions to any Project Gutenberg™ work, and (c) any
Defect you cause.
Section 2. Information about the Mission
of Project Gutenberg™
Project Gutenberg™ is synonymous with the free distribution of
electronic works in formats readable by the widest variety of
computers including obsolete, old, middle-aged and new computers.
It exists because of the efforts of hundreds of volunteers and
donations from people in all walks of life.
Volunteers and financial support to provide volunteers with the
assistance they need are critical to reaching Project Gutenberg™’s
goals and ensuring that the Project Gutenberg™ collection will
remain freely available for generations to come. In 2001, the Project
Gutenberg Literary Archive Foundation was created to provide a
secure and permanent future for Project Gutenberg™ and future
generations. To learn more about the Project Gutenberg Literary
Archive Foundation and how your efforts and donations can help,
see Sections 3 and 4 and the Foundation information page at
www.gutenberg.org.
Section 3. Information about the Project
Gutenberg Literary Archive Foundation
The Project Gutenberg Literary Archive Foundation is a non-profit
501(c)(3) educational corporation organized under the laws of the
state of Mississippi and granted tax exempt status by the Internal
Revenue Service. The Foundation’s EIN or federal tax identification
number is 64-6221541. Contributions to the Project Gutenberg
Literary Archive Foundation are tax deductible to the full extent
permitted by U.S. federal laws and your state’s laws.
The Foundation’s business office is located at 809 North 1500 West,
Salt Lake City, UT 84116, (801) 596-1887. Email contact links and up
to date contact information can be found at the Foundation’s website
and official page at www.gutenberg.org/contact
Section 4. Information about Donations to
the Project Gutenberg Literary Archive
Foundation
Project Gutenberg™ depends upon and cannot survive without
widespread public support and donations to carry out its mission of
increasing the number of public domain and licensed works that can
be freely distributed in machine-readable form accessible by the
widest array of equipment including outdated equipment. Many
small donations ($1 to $5,000) are particularly important to
maintaining tax exempt status with the IRS.
The Foundation is committed to complying with the laws regulating
charities and charitable donations in all 50 states of the United
States. Compliance requirements are not uniform and it takes a
considerable effort, much paperwork and many fees to meet and
keep up with these requirements. We do not solicit donations in
locations where we have not received written confirmation of
compliance. To SEND DONATIONS or determine the status of
compliance for any particular state visit www.gutenberg.org/donate.
While we cannot and do not solicit contributions from states where
we have not met the solicitation requirements, we know of no
prohibition against accepting unsolicited donations from donors in
such states who approach us with offers to donate.
International donations are gratefully accepted, but we cannot make
any statements concerning tax treatment of donations received from
outside the United States. U.S. laws alone swamp our small staff.
Please check the Project Gutenberg web pages for current donation
methods and addresses. Donations are accepted in a number of
other ways including checks, online payments and credit card
donations. To donate, please visit: www.gutenberg.org/donate.
Section 5. General Information About
Project Gutenberg™ electronic works
Professor Michael S. Hart was the originator of the Project
Gutenberg™ concept of a library of electronic works that could be
freely shared with anyone. For forty years, he produced and
distributed Project Gutenberg™ eBooks with only a loose network of
volunteer support.

Project Gutenberg™ eBooks are often created from several printed
editions, all of which are confirmed as not protected by copyright in
the U.S. unless a copyright notice is included. Thus, we do not
necessarily keep eBooks in compliance with any particular paper
edition.
Most people start at our website which has the main PG search
facility: www.gutenberg.org.
This website includes information about Project Gutenberg™,
including how to make donations to the Project Gutenberg Literary
Archive Foundation, how to help produce our new eBooks, and how
to subscribe to our email newsletter to hear about new eBooks.