This column is the continuation of our previous column (1) that describes and explains some algorithms and data transforms beyond those most commonly used. We present and discuss algorithms that are rarely, if ever, seen or used in practice, despite having been proposed and described in the literature. These comprise algorithms used in conjunction with continuous spectra, as well as those used with discrete spectra. In our previous column, we examined calibration methods based on the use of classical least squares, Karl Norris’ derivative-ratio technique, and David Haaland’s post-calibration augmented-components technology.
This algorithm was developed by the statistics community as a means of overcoming the effects of a badly-conditioned [X’X] matrix, such as we sometimes encounter in near-infrared (NIR) analysis. What does that mean? We have, in fact, discussed that situation relatively recently (1,2). It refers to the behavior of the normal equations when there is correlation between two or more different data variables. In the higher-dimensional space in which the normal equations exist, the data are not spread out evenly in the way we would like (all over the multidimensional space), but collapse into a line or plane, the correlation among the data variables being transformed into planes or straight lines in the higher-dimensional spaces (2). A more extensive (and rigorous) discussion of the ridge method can be found in reference (3).
The ridge transformation is intended as a means of minimizing the effect of the aforementioned correlations on the calibration results, thereby stabilizing the model and reducing the variability of the calibrations produced.
As far as we’re aware, ridge regression has been applied only to multiple linear regression (MLR) calibrations, but not to principal component regression (PCR) or partial least squares (PLS). Applying the “ridge” is a means of modifying the [X’X] matrix. We’ve presented the [X’X] matrix numerous times in the course of our “Chemometrics in Spectroscopy” columns, usually in the context of describing MLR or its variants, and as part of a larger matrix equation.
The [X’X] matrix (since X in this example represents the absorbance A) is, in our usage for three wavelengths:
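(Equation 1 is not reproduced here; in a plausible reconstruction, with the sums running over the calibration samples, it has the form)

$$[A'A] = \begin{bmatrix} \sum A_1^2 & \sum A_1 A_2 & \sum A_1 A_3 \\ \sum A_2 A_1 & \sum A_2^2 & \sum A_2 A_3 \\ \sum A_3 A_1 & \sum A_3 A_2 & \sum A_3^2 \end{bmatrix} \qquad (1)$$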
where A (the absorbance values) comprise the X. The ridge operation replaces each of the diagonal elements of equation 1 with a constant (k) added to the original value. In matrix notation, the ridge operation is:
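(Again, a plausible reconstruction of the equation, which is not reproduced here)

$$[A'A] + k[1] \qquad (2)$$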
where [1] represents the unit matrix of a size commensurate with [A’A], and k is the constant to be added to the diagonal elements of the matrix. The matrix on the right-hand side (RHS) of equation 1 is thus converted to:
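(A plausible reconstruction of equation 3, following directly from equations 1 and 2)

$$[A'A] + k[1] = \begin{bmatrix} \sum A_1^2 + k & \sum A_1 A_2 & \sum A_1 A_3 \\ \sum A_2 A_1 & \sum A_2^2 + k & \sum A_2 A_3 \\ \sum A_3 A_1 & \sum A_3 A_2 & \sum A_3^2 + k \end{bmatrix} \qquad (3)$$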
Increasing the magnitudes of only the diagonal elements of the matrix reduces the influence of the off-diagonal elements on the final results, thereby reducing the variability of the final calibration coefficients that we have previously discussed (2).
The regression calculations are then continued the same way as usual, but using the revised matrix of equation 3 in place of the original matrix. Chapter 17 in (3) contains an extensive discussion of ridge regression, including some example results of applying it to a set of regression data (although not spectroscopic data).
Of interest is the answer to the question “What effect do changes in the value of k have on the calibration?” We can address that question by performing a small computer exercise in which we take a data set and perform an MLR calibration (essentially setting k in equation 3 to zero), then inspect the way the results change as k is varied. Table I presents how the calibration coefficients and other statistics vary for the given data set over various values of k (k = 0 is the default value; when k = 0, the calculation reduces to ordinary MLR). Unsurprisingly, larger values of k cause larger departures of the calibration coefficients from their MLR values.
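As a minimal numerical sketch of the calculation (using synthetic data and illustrative variable names, not the wheat data of Table I, and assuming a model with no intercept term), the ridge coefficients for a range of k values could be computed as follows:

```python
import numpy as np

# Synthetic absorbance data at three wavelengths, with deliberate intercorrelation
rng = np.random.default_rng(0)
n_samples, n_wavelengths = 50, 3
A = rng.normal(size=(n_samples, n_wavelengths))
A[:, 2] = A[:, 1] + 0.01 * rng.normal(size=n_samples)          # nearly collinear columns
c = A @ np.array([1.0, 2.0, -1.5]) + 0.05 * rng.normal(size=n_samples)  # "constituent" values

for k in [0.0, 0.0001, 0.001, 0.01, 0.1]:
    AtA = A.T @ A                                # the [A'A] matrix of equation 1
    ridged = AtA + k * np.eye(n_wavelengths)     # add k to the diagonal (equations 2 and 3)
    b = np.linalg.solve(ridged, A.T @ c)         # coefficients from the modified matrix
    see = np.sqrt(np.sum((c - A @ b) ** 2) / (n_samples - n_wavelengths))
    print(f"k = {k:<7} coefficients = {np.round(b, 3)}  SEE = {see:.4f}")
```

With k = 0 this reduces to ordinary MLR; as k grows, the coefficients shrink away from their MLR values, mirroring the behavior seen in Table I.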
The results from this mini-experiment, obtained using ground wheat samples with known amounts of protein, are presented in Table I in two parts, Table Ia and Table Ib. Since the sample preparation and some calibration results obtained using this set of data have been previously published (4), there is no need to describe the samples or sample preparation again.
Tables Ia and Ib present the results from using various values of k in equations 2 and 3 above. Setting k = 0 reduces the equation to ordinary MLR, and those results are seen in Table Ia, which also presents the results generated for values of k > 0.
We expect the calibration properties to depart from the MLR (k = 0) values and, as expected, values of k > 0 de-emphasize the contributions of the off-diagonal terms of the matrix. This is reflected in the decrease in magnitude of the calibration coefficients, as well as in the t-values for those coefficients. Similarly, the calibration performance statistics (SEE, and so on) also degrade monotonically over the same values of k. Table Ia gives the results for coarse, decadic changes in k, intended to determine the limiting values of k that can produce useful results.
Note that, in Table Ia, the calibration statistics vary relatively smoothly and uniformly from k = 0 to k = 0.001; beyond that value the results become erratic as k keeps increasing. Also, the t-values for the calibration coefficients, as well as the F statistic for the calibration, all decrease and become statistically non-significant at high values of k, indicating that the calibration no longer has any predictive value. These characteristics indicate the limits of usefulness of the ridge approach. Table Ib presents similar results for a smaller range of values of k, with smaller intervals between the values at which the effect of k is evaluated; it is essentially an expansion of the corresponding part of Table Ia.
Another “mystery” in the early days of NIR analysis was why calibrations couldn’t be reproduced, or why the same calibration coefficients couldn’t be obtained for similar data sets. One proposed approach to achieving transferable calibrations in those days was to use isonumeric calibration coefficients. The idea of using calibration performance (such as standard error of estimate [SEE] or standard error of prediction [SEP]) as a criterion for selecting wavelengths was replaced by the concept of selecting wavelengths that gave the same calibration coefficients for closely spaced wavelengths. Experience showed that, after all, performance didn’t change much over moderate changes in wavelength selection.
“What are isonumeric calibration coefficients?” you ask. The answer is that this is a fancy way of saying coefficients that do not change when a calibration is performed using slightly different wavelengths than another calibration model.
We have discussed above that, in the early days of NIR analysis, one of the difficulties encountered was the lack of ability to reproduce a calibration model even when all the conditions of the measurement were the same as before. We also noted that there was a large random contribution to that variability. A factor that was not discussed, however, was a systematic source of variability: the change in the coefficients of a model when different wavelengths were used.
Even in the simplest possible case, applying Beer’s Law to a single-wavelength model, different calibration coefficients would be calculated for spectral data at different wavelengths; this can be seen for synthetic data in Figure 3 on page 7 of (5). In the simple case shown, it is clear, and not surprising, that the magnitude of the coefficient varies with wavelength, and that the variation is smallest at the peak of the underlying spectral absorbance band. Furthermore, the coefficient doesn’t change for small wavelength changes around that spectral peak; that is, db/dλ = 0 there (where b represents the coefficient and λ the wavelength).
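To make the idea concrete, here is a small illustrative sketch (entirely synthetic data, not the data of reference (5)) showing that the single-wavelength Beer’s-law coefficient varies across a band and changes least, with db/dλ ≈ 0, at the band peak:

```python
import numpy as np

# Synthetic Gaussian absorbance band peaking at 1200 nm (all values invented)
wavelengths = np.linspace(1100, 1300, 201)                            # nm
absorptivity = np.exp(-((wavelengths - 1200) ** 2) / (2 * 20 ** 2))
concentrations = np.linspace(5, 15, 30)                               # arbitrary units
A = np.outer(concentrations, absorptivity)                            # noiseless Beer's-law spectra

# Single-wavelength least-squares coefficient at each wavelength:
# c = b * A(lambda)  =>  b(lambda) = sum(c * A) / sum(A ** 2)
b = (A * concentrations[:, None]).sum(axis=0) / (A ** 2).sum(axis=0)

db_dlambda = np.gradient(b, wavelengths)
print("wavelength of minimum |db/dlambda|:",
      wavelengths[np.argmin(np.abs(db_dlambda))], "nm")   # lands at the 1200-nm band peak
```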
For the case of multi-wavelength models, the relation of the coefficients to the underlying spectral characteristics is not so clear-cut, but the same principles hold, even for the case of a two-wavelength model, as shown in Figure 5 on page 14 of (5). The wavelength where db/dλ = 0 no longer coincides with the spectral peaks of the underlying constituent spectra, but there is still a place where that condition occurs. The discussions on page 15 and page 101 of (5) demonstrate that wavelengths exist where db/dλ = 0 (that is, places occur within the spectrum where isonumeric wavelengths can be found; a range exists over which the same value of the regression coefficient is calculated). A demonstration of the concept using actual hard red spring (HRS) wheat calibration data has also been published (6), in the paper that also coined the term.
An extraneous variable is one that can be introduced into a calibration model but is not any of the measured absorbance values in the spectrum. Unfortunately, we know of no software programs marketed to the near-infrared community that include that capability, although there are general-purpose statistical analysis software packages that are agnostic as to the source of the data, and that could be set up to include an extraneous variable in their model as easily as including spectroscopic data.
An example where an extraneous variable might be desirable is when the user suspects that an external phenomenon, such as temperature, is affecting the results, but cannot pin down the effect to any particular wavelength or wavelengths. Indeed, the extraneous effect may not be affecting the spectroscopic data at all, yet may still cause errors through its effect on other properties of the sample, such as the density. Conventional use of NIR would ignore the underlying behavior and just “hope for the best” by including the spectroscopic data at all spectroscopic wavelengths; indeed, this is the current state of the technology. However, “science by hope” is an example of poor science, as we said above.
From a more practical point of view, including an extraneous variable that is an important predictor of the analyte may enable the development of calibration models requiring fewer wavelengths (for MLR models) or factors (for PCR and PLS models). In that case, the models developed may be more stable (less drift with time) and more resistant to the effects of other changes in the environment, especially if similar environmental effects are present in the data used to create the calibration model. The extraneous variable can be included in the data as though it represented another wavelength, although it then becomes the responsibility of the user to measure the temperature (or whatever physical variable the extraneous variable represents) and include its value in the prediction calculations.
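A minimal sketch of how this could be set up, assuming hypothetical data and a general-purpose regression routine (the variable names and values below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
A = rng.normal(size=(n, 3))                  # absorbances at three wavelengths
temperature = rng.uniform(20, 30, size=n)    # the extraneous (non-spectroscopic) variable
c = A @ [1.0, 0.5, -0.8] + 0.1 * (temperature - 25) + 0.02 * rng.normal(size=n)

# Append the extraneous variable as though it were another wavelength
X = np.column_stack([np.ones(n), A, temperature])    # intercept + spectra + temperature
b, *_ = np.linalg.lstsq(X, c, rcond=None)
print("intercept, wavelength coefficients, temperature coefficient:", np.round(b, 3))

# At prediction time the user must supply the measured temperature along with the spectrum.
```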
Indicator variables comprise a special kind of extraneous variable. They can be used in situations where a non-numeric type of data is available and we want to incorporate that data in the calibration model. More extensive explanations are available (see reference [5], pp. 78–80 and 127–129, as well as earlier editions of the Handbook [7], pp. 297–306), but, for now, we will just take an overview of the subject.
Figure 1 depicts a hypothetical situation where we have a set of calibration samples in which some of the samples were ground to a different average particle size than others (perhaps different grinders were unknowingly used). One result we might expect to see is that there would be a bias between the analyses of the samples from the two grinders. A plot of the results might look like Figure 1. The bias between the subsets of samples creates a large error contribution to the error budget, both for the individual samples and for the statistical summary of the calibration performance. This is a typical situation in which the addition of an indicator variable might be helpful.
Tables IIa and IIb present the layout showing how an indicator variable fits into a calibration scheme. While the indicator variable is not actual absorbance data, it is treated as though it were an independent variable, such as an absorbance value at an actual wavelength, and can be handled similarly.
As shown, the indicator variable only takes on values of zero (0) or one (1). The samples representing the “main” or “reference” set are assigned the value 0 for all indicator variables (when more than one is used) corresponding to those samples’ data. The rest of the data readings are assigned the value 1 for their corresponding indicator variable. When an MLR calibration is performed, the coefficient for the indicator variable represents the bias between the reference set of samples and the “other” samples. It is unknown at this time how this concept can be implemented for PCA or PLS algorithms.
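A minimal sketch of the layout of Tables IIa and IIb, again with invented numbers: a 0/1 column is appended to the absorbance data, and its MLR coefficient estimates the bias between the two grinder subsets of Figure 1:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
A = rng.normal(size=(n, 3))                   # absorbances at three wavelengths
grinder = np.array([0] * 15 + [1] * 15)       # indicator: 0 = reference grinder, 1 = other grinder
c = A @ [0.9, 1.2, -0.4] + 0.5 * grinder + 0.02 * rng.normal(size=n)   # 0.5 = built-in bias

X = np.column_stack([np.ones(n), A, grinder])  # intercept + spectra + indicator variable
b, *_ = np.linalg.lstsq(X, c, rcond=None)
print("estimated bias between the grinder subsets:", round(b[-1], 3))  # recovers approximately 0.5
```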
“Conventional wisdom” tells us that a spectrum measured on a Fourier transform (FT) spectrometer cannot be precisely converted to the spectrum that would be measured on a dispersive (such as a diffraction grating-based) spectrometer. Well, guess what? It turns out that “conventional wisdom” is wrong.
There are two main reasons given for making the claim that spectra in the two different domains cannot be made to be compatible:
Fourier transform infrared (FT-IR) spectra are inherently measured and presented linear in wavenumber units, while dispersive spectra (such as those obtained using diffraction gratings to disperse the electromagnetic energy) are inherently measured and presented linear in wavelength units. The two types of units are fundamentally mismatched, being inversely, and therefore non-linearly, related to each other. Nevertheless, converting one scale to the other is relatively straightforward:
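(Equation 4 is not reproduced here; a plausible reconstruction, assuming λ expressed in nanometers and ν in cm⁻¹, is shown below; if λ were expressed in micrometers the constant would be 10⁴)

$$\lambda = \frac{10^{7}}{\nu} \qquad (4)$$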
where λ and ν represent the wavelength of the NIR radiation in nanometers and the wavenumber, respectively. However, due to the ν in the denominator of equation 4, the converted scale is non-linearly related to the original scale. Thus, if an FT-IR spectrometer is used to measure a spectrum and the wavenumber scale for that spectrum is then converted point-for-point to wavelengths, the plotted spectrum will appear compressed at one end and stretched out at the other. Also, the intervals of the wavelength scale will not be uniform across the spectrum. Because of these effects, the spectrum obtained by simplistically converting the wavenumber scale of an FT-IR spectrum to wavelengths will not match a spectrum actually measured on a dispersive instrument.
For similar reasons, the spectral resolution capabilities of the two technologies are similarly related. Technologies that use diffraction gratings to create dispersive spectra result in spectra with uniform wavelength resolution across the spectrum, while FT-IR technologies result in spectra with uniform wavenumber resolution across the spectrum. Converting spectral resolution from one set of units to the other is somewhat more complicated, as represented by equation 5:
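(Equation 5 is likewise not reproduced here; a plausible reconstruction, with the same unit assumptions as for equation 4, is)

$$\Delta\nu = \frac{10^{7}\,\Delta\lambda}{\lambda^{2}} \qquad (5)$$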
where again, Δν and Δλ represent the spectral resolutions of the NIR spectra in the two complementary technologies: wavenumbers and wavelength, respectively.
A problem arises from the presence of the λ² term in the denominator of equation 5. Even with constant wavenumber resolution in the Fourier spectrum, simply converting the scale to constant wavelength would result in erroneous values for the spectrum, because each data point, when converted to wavelengths, would correspond to the spectral value for a different, and incorrect, spectral resolution in wavelength units, in addition to the non-linearity effects described above. A process called apodization (described in books about FT-IR spectroscopy [8,9]) could be used to adjust the spectral wavelength resolution to any desired amount. That change would then apply to the entire spectrum being computed, but only one wavelength, corresponding to the particular apodization function used, would have the correct value.
However, while an FT is a complicated mathematical operation (at least as complicated as the math of the calibration algorithms we routinely use), it is completely deterministic, and the result of applying it to our spectroscopic data is uniquely determined by the spectral data and the apodization function used. Many books and book chapters have been written about the FT and its application to spectroscopy. Bell (8) presents a rigorous mathematical analysis of the relation of the FT to spectroscopy, including a rigorous derivation proving that the FT of an interferogram (the signal provided by the interferometer of an FT spectrometer) is indeed the spectrum encoded in the interferogram. A somewhat less rigorous, but clearer and more readable, presentation was written by Ramirez (10).
The bottom line, however, is that because the FT is deterministic, any given mathematical function, which in the context of Fourier mathematics includes interferograms and their FTs (that is, their corresponding spectra), can be interpolated. Thus, there is no requirement for the result of the Fourier transform to be computed only at wavelengths corresponding to ν⁻¹. By carefully specifying the apodization function to apply, it is perfectly legitimate and acceptable to compute the correct value of a spectrum corresponding to any value of λ, even if that value falls between two values of 10,000/ν, just as a straight line or a parabola could be used to interpolate between data points in appropriate circumstances. However, if this is done, care must be taken to recall that, even though a full FT is computed, only the single data point corresponding to the apodization function applied to the interferogram has the correct value.
The bandwidth (spectral resolution) of the spectrum at any wavelength can be similarly controlled by suitable choice of the apodization function. Thus, we see that the justification for believing that FT-IR spectra cannot be converted to an equivalent spectrum produced by a diffraction grating-based or other dispersive instrument is false; there is no basis for that belief.
By far the most prominent modern proponent of FT-IR and its applications is Peter Griffiths, who with his coauthors has written several books (9,11,12), a more applications-oriented book (13), and innumerable scientific articles on the subject. Of particular interest here is an article that Griffiths, along with Husheng Yang, his then-student, and several other colleagues, wrote in which they described how they could use the deterministic properties of the FT to overcome the difficulties encountered when converting an FT spectrum to one that would be measured on the same sample using a dispersive spectrometer (14). Basically, their algorithm consists of first applying the apodization function to the interferogram that would result in the first wavelength of the computed spectrum having the correct value for that first wavelength. The rest of the computed spectral values would be incorrect, and therefore would be discarded; only the value of that first spectral data point would be retained. Then the whole process is repeated, using a new apodization function, one that would provide the correct value of resolution for the second spectral data point. Only the second data point of the Fourier transform of that apodized data would be retained in the spectrum being built up. This process would then continue until all the wavelengths in the output spectrum were computed.
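A hedged sketch of the general idea (not the exact implementation of reference (14); the triangular apodization, the use of the magnitude spectrum, and the function name are all assumptions made for illustration):

```python
import numpy as np

def convert_to_constant_wavelength(interferogram, delta_x_cm,
                                   target_wavelengths_nm, target_resolution_nm):
    """Point-by-point conversion to a constant-wavelength-resolution spectrum (illustrative sketch)."""
    n = interferogram.size
    x = np.arange(n) * delta_x_cm                       # optical path difference, cm
    wavenumbers = np.fft.rfftfreq(n, d=delta_x_cm)      # cm^-1 axis of the transform
    spectrum_out = np.empty(target_wavelengths_nm.size)

    for i, lam in enumerate(target_wavelengths_nm):
        nu = 1.0e7 / lam                                   # target point, cm^-1
        delta_nu = 1.0e7 * target_resolution_nm / lam**2   # equation 5: matching wavenumber resolution
        L = 1.0 / delta_nu                                 # approximate retardation limit for that resolution
        apodization = np.clip(1.0 - x / L, 0.0, None)      # simple triangular apodization (an assumption)
        spec = np.abs(np.fft.rfft(interferogram * apodization))
        spectrum_out[i] = np.interp(nu, wavenumbers, spec)  # keep only the one correct point

    return spectrum_out
```

Each pass through the loop computes a full transform but keeps only a single point, which is why the brute-force procedure is so computationally expensive.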
It is a highly inefficient and computer-intensive process. If there are n measured data points in the original interferogram, there will be n spectral data points to compute. By applying a fast Fourier transform (FFT) to the data, the amount of computation required for each one is proportional to n·log(n), which can be accomplished in a reasonable amount of time for computing a single spectrum, even if the interferogram contained as many as 2¹⁶ (= 65,536) measured data points.
Since we want to convert an entire spectrum to a constant-wavelength-resolution equivalent of a spectrum from a dispersive instrument, however, that process has to be repeated n times (in this case, 2¹⁶ times), resulting in a total computation time proportional to n²·log(n). Discussion with one of the authors of (14) revealed that in those days, when the cited work was performed, computers were less advanced than the ones we enjoy today, and performing the conversion of an FT spectrum to the equivalent of a modern dispersive spectrum with constant spectral resolution required approximately a week of computation time! Husheng also estimated that doing it on a modern computer, say a PC running Windows and MATLAB, could be accomplished about one order of magnitude faster (15), and perhaps faster still if a computer and software capable of parallel processing are used.
(1) Mark, H.; Workman, J. Data Transforms in Chemometric Calibrations, Part 4A: Continuous-Wavelength Spectra and Discrete-Wavelength Models. Spectroscopy 2024, 39 (2), 12–17. DOI: 10.56530/spectroscopy.yr5374m6
(2) Mark, H.; Workman, J. Data Transforms in Chemometric Calibrations: Application to Discrete-Wavelength Models, Part 1: The Effect of Intercorrelation of the Spectral Data. Spectroscopy 2022, 37 (2), 16–18, 54. DOI: 10.56530/spectroscopy.wp6284b9
(3) Draper, N.; Smith, H. Applied Regression Analysis - Third Edition. John Wiley & Sons, 1998.
(4) Mark, H. Comparative Study of Calibration Methods for Near-Infrared Reflectance Analysis Using a Nested Experimental Design. Anal. Chem. 1986, 58, 2814–2819. DOI: 10.1021/ac00126a051
(5) Mark, H. Principles and Practice of Spectroscopic Calibration. John Wiley & Sons, 1991.
(6) Mark, H.; Workman, J. A New Approach to Generating Transferable Calibrations for Quantitative Near-Infrared Spectroscopy. Spectroscopy 1988, 3 (11), 28–36.
(7) Burns, D. A.; Ciurczak, E. W. Handbook of Near-Infrared Analysis, 3rd edition. CRC Press, 2008. pp. 297–306; 808.
(8) Bell, R. J. Introductory Fourier Transform Spectroscopy. Academic Press, 1972.
(9) Griffiths, P.; deHaseth, J. A. Fourier Transform Infrared Spectroscopy, John Wiley & Sons, 1986.
(10) Ramirez, R. W. The FFT Fundamentals and Concepts. Prentice-Hall, 1985.
(11) Griffiths, P. R. Transform Techniques in Chemistry. Plenum Press, 1978.
(12) Griffiths, P.; Chalmers, J. Handbook of Vibrational Spectroscopy, Volume 3. Wiley, 2002.
(13) Griffiths, P. Chemical Infrared Fourier Transform Spectroscopy, 1st edition. John Wiley & Sons, 1975.
(14) Yang, H.; Isaksson, T.; Jackson, R. S.; Griffiths, P. Effect of Resolution on the Wavenumber Determination of a Putative Standard to be Used for Near Infrared Diffuse Reflection Spectra Measured on Fourier Transform Near Infrared Spectrometers. J. Near Infrared Spectrosc. 2003, 11 (4), 229–240. DOI: 10.1255/jnirs.371
(15) Yang, H.; Personal Communication (2023).