Chemometrics in Spectroscopy is a collection of the column articles that the authors published in Spectroscopy over more than two decades. Each article generally appears as a chapter, and chapters dealing with the same or similar topics are grouped into sections rather than following the original order of publication in the magazine. Although each article or series of articles discusses only one specific topic, collectively the articles form a comprehensive reference that is valuable for readers wanting to learn chemometrics, especially its applications in spectroscopy.
The book is divided into 128 chapters, split over 24 sections, and runs to a vast 1040 pages. Some might ask whether it is really necessary to wade through such a weighty tome to learn chemometrics. The answer could be “no,” on the grounds that each of the four multivariate methods delineated in the book, namely multiple linear regression (MLR), principal component regression (PCR), partial least squares (PLS) regression, and classical least squares (CLS), could be described in perhaps two pages or fewer in matrix form. Furthermore, user-friendly software packages are readily available and generally easy to use, and in many cases it takes only a few mouse clicks to build a calibration model. In the end, if the predictions from the calibration model differ markedly from the actual data, one can invoke the famous “garbage in, garbage out” principle and declare the data garbage, a practice that is not uncommon nowadays.
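To make the point about compactness concrete, here is a minimal NumPy sketch (not taken from the book) of MLR and PCR in matrix form. It assumes a spectra matrix X (samples × wavelengths) and a reference vector y, both mean-centred; the function names mlr_fit and pcr_fit and the synthetic rank-2 data are hypothetical. PLS is omitted only to keep the example short.

```python
import numpy as np

def mlr_fit(X, y):
    """MLR regression vector b for y = X b.

    For full-column-rank X this equals (X^T X)^-1 X^T y; lstsq also returns the
    minimum-norm least-squares solution when X is rank-deficient.
    """
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

def pcr_fit(X, y, n_components):
    """PCR: regress y on the scores of the first n_components principal components of X."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)  # rows of Vt are the loadings
    P = Vt[:n_components].T                           # loadings: wavelengths x components
    T = X @ P                                         # scores: samples x components
    q, *_ = np.linalg.lstsq(T, y, rcond=None)
    return P @ q                                      # regression vector over wavelengths

# Hypothetical noise-free data: 6 samples, 4 "wavelengths", two absorbing components.
rng = np.random.default_rng(0)
C = rng.uniform(0.1, 1.0, size=(6, 2))     # component concentrations
K = rng.uniform(0.0, 1.0, size=(2, 4))     # pure-component "spectra"
X = C @ K
y = C[:, 0]
Xc, yc = X - X.mean(axis=0), y - y.mean()  # mean-centre before fitting

b_pcr = pcr_fit(Xc, yc, n_components=2)
print(np.allclose(Xc @ b_pcr, yc), np.allclose(Xc @ mlr_fit(Xc, yc), yc))  # → True True
```

Because these synthetic spectra have rank 2, MLR on all four wavelengths is ill-posed in the classical sense; lstsq quietly returns a minimum-norm answer, which is one reason the choice of method matters more than the few lines of algebra suggest.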
However, if one believes that good data should not be discarded as garbage without good reason, especially when months or even years of effort may have been invested in collecting them, then this book is worth reading, particularly the sections about CLS. Readers will find that even very experienced chemometricians failed to develop the right model on the first attempt, simply because the wrong unit of measure had been chosen for concentration. CLS is the simplest of all the methods presented in the book and could easily be described by two or three equations, yet the authors spend 14 chapters discussing the method and its applications at considerable length. They could simply have told readers to use volume percent or a comparable unit for concentration rather than weight percent. Instead, they spend many pages describing their troubleshooting process, along with the identification and verification of the root cause. Readers will perhaps benefit more from this lengthy description of the problem-solving process than from learning the simple answer in one sentence.
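The “two or three equations” of CLS can be sketched as follows, assuming Beer's law with additive pure-component spectra, A = C K, where A holds the spectra (samples × wavelengths), C the concentrations (samples × components), and K the pure-component spectra. Calibration estimates K = (C^T C)^-1 C^T A, and prediction for an unknown spectrum a solves c = a K^T (K K^T)^-1. This is an illustrative sketch, not the authors' scripts; the function names and data are hypothetical. The concentrations must be in a unit for which the spectra add linearly, which is where the choice between weight percent and volume percent described above comes in.

```python
import numpy as np

def cls_calibrate(C, A):
    """Estimate pure-component spectra K from A = C K, i.e. K = (C^T C)^-1 C^T A."""
    K, *_ = np.linalg.lstsq(C, A, rcond=None)
    return K

def cls_predict(K, a):
    """Predict concentrations for spectra a (one per row): c = a K^T (K K^T)^-1."""
    # Equivalent least-squares problem: solve K^T c^T = a^T.
    c_T, *_ = np.linalg.lstsq(K.T, a.T, rcond=None)
    return c_T.T

# Hypothetical two-component mixtures, concentrations as volume fractions.
C_cal = np.array([[0.2, 0.8], [0.5, 0.5], [0.7, 0.3], [0.9, 0.1]])
K_true = np.array([[0.1, 0.6, 0.3, 0.0],
                   [0.4, 0.1, 0.2, 0.5]])
A_cal = C_cal @ K_true                  # noise-free Beer's-law spectra

K_est = cls_calibrate(C_cal, A_cal)
a_unknown = np.array([[0.3, 0.7]]) @ K_true
print(np.round(cls_predict(K_est, a_unknown), 6))  # → [[0.3 0.7]]
```

If the true additive unit were volume fraction but C_cal were entered as weight fractions of components with different densities, the model A = C K would no longer hold exactly, and the errors would show up as systematic prediction bias rather than random noise.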
One cannot avoid mathematics when learning chemometrics. Mathematics is needed to understand the equations of the multivariate methods and also many of the statistical methods for evaluating raw data and model predictions. A beginner does not, however, need much mathematical background to learn chemometrics from this book. Matrix algebra is introduced in Section 1 and further expanded in Section 2, together with the description of MLR. Analytical geometry, which is needed to understand regression, is introduced in Section 4. The rather detailed description of the mathematical methods in simple language may seem somewhat tedious to an experienced reader, but it is certainly beneficial to someone who is less comfortable with equations. PCR and PLS are introduced similarly, with detailed calculations on the simplest possible data set. The concept of principal components is perhaps the most difficult to grasp for people who are new to multivariate analysis or lack a strong background in matrix algebra. To help readers understand principal components, the authors present an approach that derives the algorithm using only elementary algebra. This approach is long, complicated, and rather tedious, but it could enable readers who are new to the topic to understand every step intuitively.
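As a small, independent illustration that principal components can be reached with elementary algebra (this is not the authors' derivation, which the review does not reproduce), the sketch below finds the first principal component of two-variable data by solving the quadratic characteristic equation of the 2 × 2 covariance matrix directly, then cross-checks the result against NumPy's symmetric eigen-solver. The function name first_pc_2d and the data are hypothetical.

```python
import math
import numpy as np

def first_pc_2d(x, y):
    """First principal component of two variables using only elementary algebra."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Sample variances and covariance (n - 1 denominator, as in np.cov).
    sxx = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - my) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    # Larger root of  lam^2 - (sxx + syy) lam + (sxx * syy - sxy^2) = 0.
    lam = (sxx + syy) / 2 + math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    # Direction from (sxx - lam) v1 + sxy v2 = 0; the uncorrelated case is handled separately.
    if sxy == 0:
        v = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    else:
        v = (sxy, lam - sxx)
    norm = math.hypot(*v)
    return lam, (v[0] / norm, v[1] / norm)

x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1]
lam, v = first_pc_2d(x, y)

# Cross-check: eigh returns eigenvalues in ascending order, so the last one is the largest.
w, V = np.linalg.eigh(np.cov(x, y))
print(math.isclose(lam, w[-1]), bool(abs(np.dot(v, V[:, -1])) > 0.999999))  # → True True
```

The absolute value in the check is needed because an eigenvector is only defined up to sign.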
Statistics is a very large area within chemometrics, with many methods and books available. The authors have very strong backgrounds in statistics and have certainly included the statistical concepts and methods that chemometricians need. Design of experiments (DoE), or experimental design, including analysis of variance (ANOVA), is introduced in Section 3 in easy-to-understand language. DoE is a highly relevant topic, but the introduction is rather high-level for such a significant area. Section 7, “Collaborative Laboratory Studies,” contains a detailed description of statistical methods for comparing different analytical results and methods, though these methods can also be found in standard analytical chemistry books. Section 10, “Goodness of Fit Statistics,” describes the statistical tools needed for analyzing linear regression. Section 20, “Statistics,” contains articles introducing the three foundations of the subject, which are useful for people without much statistical background. Useful discussions of statistics can also be found in Section 12 (“Connecting Chemometrics to Statistics”) and Section 15 (“Clinical Data Reporting”).
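The review does not reproduce the statistics in Section 10, so the following is only a generic sketch of two common goodness-of-fit measures for a straight-line calibration: the coefficient of determination (R²) and the standard error of estimate (SEE). The function name and data are hypothetical, and the book's own definitions and notation may differ.

```python
import math

def linear_fit_stats(x, y):
    """Least-squares line y = a + b x, with R^2 and the standard error of estimate."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx                       # slope
    a = my - b * mx                     # intercept
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((yi - my) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    # Two parameters (slope and intercept) are estimated, leaving n - 2 degrees of freedom.
    see = math.sqrt(ss_res / (n - 2))
    return a, b, r2, see

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b, r2, see = linear_fit_stats(x, y)
print(f"slope={b:.3f} intercept={a:.3f} R2={r2:.4f} SEE={see:.3f}")
```

R² alone can look excellent even when residuals are systematic, which is why it is usually reported alongside an error statistic in the units of the reference values, such as SEE.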
Multivariate methods such as PLS can be, and have been, used on data from various scientific disciplines, provided the data can be arranged in formats suitable as inputs and outputs of the methods. The multivariate methods perform the same calculations regardless of whether the data come from chemistry or economics. However, the methods for preprocessing the raw data so that the multivariate methods yield the best prediction model can differ greatly depending on the data source. The data and examples used in this book relate mainly to near-infrared (NIR) and mid-infrared (mid-IR) spectra. The book contains no discussion of infrared spectrometers or spectroscopic measurements, perhaps because the authors assumed that readers of Spectroscopy already had the necessary knowledge or experience. Readers who are unfamiliar with IR spectroscopy will need to obtain some basic knowledge from other sources to get the most benefit from this book.
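As one example of source-specific preprocessing, the standard normal variate (SNV) transform is commonly applied to NIR diffuse-reflectance spectra: each spectrum is centred on its own mean and scaled by its own standard deviation, which removes additive offsets and multiplicative gain differences such as those caused by light scattering. The review does not say which preprocessing methods the book covers, so treat this as a generic sketch; the function name and data are hypothetical.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) by its own mean and std."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

# Hypothetical spectrum and a copy distorted by a multiplicative gain and an additive offset.
base = np.array([0.20, 0.35, 0.80, 0.45, 0.25])
spectra = np.vstack([base, 1.5 * base + 0.1])
out = snv(spectra)
print(np.allclose(out[0], out[1]))  # → True
```

Because SNV works row by row, it needs no reference spectrum, which is one reason it is popular; the same transform would be meaningless for, say, economic time series, illustrating why preprocessing depends on the data source.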
The mathematics described in this book is very basic, as in Section 1, “Elementary Matrix Algebra.” By contrast, the chemometrics-related discussions of spectroscopic data are rather advanced, reflecting the knowledge and experience that the authors have gained through many years of practicing and writing in this field. The book contains in-depth discussions of derivatives, linearity, noise, and outliers, all of which are critical for building good chemometric models.
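Spectral derivatives are often computed with a Savitzky–Golay filter, which fits a low-order polynomial in a moving window and takes the polynomial's slope at the window centre. The sketch below implements that idea directly for the first derivative; it is a generic illustration, not the book's procedure, and the function name savgol_derivative is hypothetical. In practice, scipy.signal.savgol_filter with deriv=1 and an appropriate delta performs the same computation and also handles the spectrum edges.

```python
import numpy as np

def savgol_derivative(y, window=5, polyorder=2, delta=1.0):
    """First derivative via Savitzky-Golay: slope of a local least-squares polynomial."""
    if window % 2 == 0 or window <= polyorder:
        raise ValueError("window must be odd and larger than polyorder")
    half = window // 2
    offsets = np.arange(-half, half + 1)
    # Vandermonde matrix of window positions; column k holds offsets**k.
    V = np.vander(offsets, polyorder + 1, increasing=True)
    # Row 1 of the pseudo-inverse yields the linear coefficient (slope at the centre point).
    coeffs = np.linalg.pinv(V)[1] / delta
    y = np.asarray(y, dtype=float)
    out = np.full_like(y, np.nan)       # edge points are left as NaN in this simple sketch
    for i in range(half, len(y) - half):
        out[i] = coeffs @ y[i - half:i + half + 1]
    return out

x = np.linspace(0.0, 1.0, 11)
d = savgol_derivative(x ** 2, window=5, polyorder=2, delta=x[1] - x[0])
print(np.allclose(d[2:-2], 2 * x[2:-2]))  # → True
```

A quadratic fit reproduces the derivative of x² exactly; on real, noisy spectra the window width trades noise suppression against distortion of narrow bands.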
One very important area, perhaps as important as building the original calibration model, is calibration transfer. IR spectroscopic methods should be able to replace some traditional wet chemistry methods, such as chromatographic methods that consume solvents and generate significant volumes of waste; such replacement is truly needed in the green chemistry era. Often it was the maintenance of a calibration model, rather than the initial cost of building it, that prevented the acceptance of spectroscopic methods in commercial manufacturing environments. Transferring calibration models is not impossible from a technical standpoint, but the required knowledge of chemometrics is generally not widespread. Calibration transfer is well covered in three sections, comprising ten chapters. All aspects of calibration transfer, such as reference standards, instrument performance, and modeling algorithms, are described, and a review of published methods on calibration transfer is also included.
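One widely published transfer approach is direct standardization (DS): transfer standards are measured on both the master and the slave instrument, and a matrix F is estimated so that S_slave F ≈ S_master; new slave spectra are then mapped through F before the master calibration is applied. The review does not say which algorithms the book favours, so this is an illustrative sketch only; the function names, the simulated instrument difference, and the data are hypothetical.

```python
import numpy as np

def ds_fit(S_slave, S_master):
    """Direct standardization: least-squares F such that S_slave @ F ~= S_master."""
    return np.linalg.pinv(S_slave) @ S_master

def ds_apply(F, spectra_slave):
    """Map slave-instrument spectra into the master instrument's response space."""
    return spectra_slave @ F

# Hypothetical transfer standards: the slave shows a gain change and a one-channel shift.
rng = np.random.default_rng(1)
S_master = rng.uniform(0.1, 1.0, size=(8, 5))
S_slave = 0.9 * np.roll(S_master, 1, axis=1)

F = ds_fit(S_slave, S_master)
new_master = rng.uniform(0.1, 1.0, size=(1, 5))
new_slave = 0.9 * np.roll(new_master, 1, axis=1)
print(np.allclose(ds_apply(F, new_slave), new_master))  # → True
```

This toy case has more standards than channels, so F is well determined. Real spectra usually have far more wavelengths than transfer standards, which makes plain DS underdetermined; piecewise direct standardization (PDS), which fits a small local window per wavelength, is a common remedy.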
Because the book is compiled from the authors’ own columns in Spectroscopy, it contains many of the authors’ comments, which were not always shared by other experts in the field. For example, the initial article on linearity in Section 6, which used a simple, noise-free, synthetic set of nonlinear spectra to compare MLR, PCR, and PLS, included discussions with several other experts. These discussions reveal that, even with such a simple set of spectra of a single component, expert chemometricians could take different approaches and interpret the results differently. Building a multivariate calibration model involves a fixed sequence of steps, but various approaches can be taken at each step, and the number of possible combinations across all steps can be large, resulting in many different models. Reading these discussions may inspire more effort toward building better calibration models, rather than concluding that the raw data are garbage!
Some improvements could perhaps still be made in future editions or on the book’s companion website, a publisher-hosted resource where the authors can add material in electronic format. Figures that were printed in black and white in the hard copy are already available in color on the companion website. Because the book was not originally written as a textbook, it does not provide example problems for readers to practice with. It does, however, provide data and scripts in MATLAB or Mathcad format. In future editions or on the companion website, the authors might add Python or R versions of the data and scripts to benefit readers who do not have access to these software packages.
For such a large book encompassing so many diverse topics, it would help readers navigate if the titles of some chapters (although not many) were updated to reflect their contents more closely and made available on the book’s companion website. For example, Section 8, “Analysis of Noise,” contains 15 chapters. Many types of noise and their effects on spectra are discussed, together with sophisticated derivations of equations. However, the chapters themselves are simply named Part 1, Part 2, and so on, up to Part 15. It would be very helpful to readers if the titles indicated which type of noise is discussed in each chapter.
Some minor topics or discussions are likely to be of particular interest to specific groups of readers but, unfortunately, cannot be located through either the table of contents or the index. Aside from Section 15, “Clinical Data Reporting,” there are three additional chapters that are particularly useful to users in the pharmaceutical industry, but readers would not know of their existence unless they happened upon them. Some readers may be comfortable reading the book sequentially in its current arrangement, but others will need to read the chapters in a different order, or only a selection of them. It might be useful if the authors provided study guides tailored to different groups of readers and added them to the book’s companion website.
Neural networks (NNs) are one area that is mentioned, but not discussed, in the book. NNs are useful when dealing with nonlinear data. Similarly, multivariate curve resolution (MCR) has found increasing and successful use in recent years but is not included in the book. MCR has the potential to remove the reliance on a primary analytical method for certain types of quantitative analysis. Both techniques are worth adding to future editions of the book, and the coverage of spectral classification could also be expanded.
This book certainly contains sufficient material for beginners to learn chemometrics and for advanced users to strengthen their skills in dealing with more challenging calibration problems, or even to create innovative applications in vibrational spectroscopy. Written originally in journal column format, the book contains many of the authors’ comments and very detailed explanations (in the authors’ words, “to treat any topic at whatever length is necessary”). Whether readers like the book may depend on whether they enjoy the column-format writing style and whether they can quickly locate topics of interest. Prospective readers might want to look at some sample chapters, which should be possible through the publisher or past issues of Spectroscopy, to find out whether they are comfortable with the writing style. An enhanced table of contents, or some type of study guide, would help readers find the content they want faster.
Husheng Yang is a senior scientific investigator and chemometrician at GlaxoSmithKline, in Collegeville, Pennsylvania. The reviewer is not acting as a representative or agent of GSK. Direct correspondence to: husheng.x.yang@gsk.com