Scientists from East China Jiaotong University in Nanchang, Jiangxi, China, recently tested different sample selection methods using near-infrared (NIR) spectral information entropy as a similarity criterion. Their findings were published in the Journal of Chemometrics (1).
Near-infrared (NIR) spectroscopy has been applied to a wide variety of tasks in the past few years. The technique has been used to predict the harvest times of Cabernet Sauvignon grapes, detect Covid-19, and analyze emission lines from a supernova (in tandem with mid-infrared [MIR] spectroscopy) (2–4). When using NIR, constructing models and keeping them updated are both essential. In machine learning, model construction usually involves dividing a sample set into a calibration set and a validation set. The representativeness of the calibration set and the reasonable distribution of the validation set both affect the accuracy of the established model. Additionally, when maintaining and updating models, selecting the most informative update samples can not only improve prediction accuracy but also reduce the amount of sample preparation required.
In this study, the researchers proposed spectral information entropy (SIE) as a similarity criterion for dividing sample sets and for selecting update samples. Two established methods were used for comparison to verify the superiority of the proposed method: the Kennard–Stone (KS) method, which splits data into training and test sets based on a distance metric between data points, spectra, or labels, and the sample set partitioning based on joint x–y distance (SPXY) method (5).
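As a rough illustration of the distance-based splitting that KS performs, the following is a minimal sketch of the classic Kennard–Stone selection using Euclidean distance between spectra. It is not the code used in the study: the function name `kennard_stone`, its parameters, and the choice of Euclidean distance are assumptions made for this example. It does not implement SIE, whose exact definition is not given here.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Pick n_select calibration samples from X (rows = spectra).

    Hypothetical helper for illustration; uses Euclidean distance.
    Returns (selected_indices, remaining_indices).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")

    # Pairwise Euclidean distances between all samples.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))

    # Start with the two samples that are farthest apart.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(n) if k not in selected]

    # Repeatedly add the sample whose nearest selected neighbour is farthest away.
    while len(selected) < n_select:
        min_dist = dist[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(min_dist))]
        selected.append(pick)
        remaining.remove(pick)

    return selected, remaining

# Toy "spectra" with two variables each.
calibration_idx, validation_idx = kennard_stone(
    [[0, 0], [10, 0], [5, 0], [1, 0]], n_select=3
)
```

The selected indices would form the calibration set and the remaining indices the validation set. SPXY follows the same selection loop but, as its name indicates, uses a joint distance computed from both the spectra (x) and the reference values (y).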
The model built after dividing the sample set with SIE showed better prediction performance than the models built from sample sets divided with KS and SPXY. When predicting the soluble solids content (SSC) and hardness of peaches, the prediction determination coefficient (R2P) was improved by over 15%, while the root mean square error (RMSE) of prediction was reduced by 50%. Regarding model updating, the researchers found that selecting a small number of update samples using SIE can improve the correlation coefficient (RP) by more than 80%, with the updated models achieving higher prediction accuracies than those updated using the KS and SPXY methods. These results indicate that SIE can make the NIR analysis technique more reliable.
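For readers unfamiliar with the two figures of merit quoted above, the sketch below shows one common way to compute a determination coefficient and an RMSE on a prediction set. The function name `r2_and_rmse` and the toy values are assumptions for illustration, and the study may define R2P differently (for example, as a squared correlation coefficient).

```python
import numpy as np

def r2_and_rmse(y_true, y_pred):
    """Return (R^2, RMSE) for reference values y_true and predictions y_pred.

    Hypothetical helper; R^2 is computed here as 1 - SS_res / SS_tot.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_true - y_pred

    ss_res = float((residual ** 2).sum())                   # unexplained variation
    ss_tot = float(((y_true - y_true.mean()) ** 2).sum())   # total variation
    r2 = 1.0 - ss_res / ss_tot
    rmse = float(np.sqrt((residual ** 2).mean()))
    return r2, rmse

# Made-up reference and predicted values, not data from the study.
r2, rmse = r2_and_rmse([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```

A higher determination coefficient and a lower RMSE on the prediction set both indicate a better model, which is the direction of the changes reported for SIE.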
(1) Liu, Y.; He, C.; Jiang, X. Sample Selection Method Using Near-Infrared Spectral Information Entropy as Similarity Criterion for Constructing and Updating Peach Firmness and Soluble Solids Content Prediction Models. J. Chemom. 2023, 38 (2), e3528. DOI: https://doi.org/10.1002/cem.3528
(2) Luo, Y.; Zhao, J.; Zhu, H.; Li, X.; Dong, J.; Sun, J. Prediction of the Harvest Time of Cabernet Sauvignon Grapes Using Near-Infrared Spectroscopy. Spectroscopy 2024. https://www.spectroscopyonline.com/view/prediction-of-the-harvest-time-of-cabernet-sauvignon-grapes-using-near-infrared-spectroscopy (accessed 2024-3-25)
(3) Acevedo, A. Detecting Covid-19 Using Visible or Near-Infrared Spectroscopy and Machine Learning. Spectroscopy 2023. https://www.spectroscopyonline.com/view/detecting-covid-19-using-visible-or-near-infrared-spectroscopy-and-machine-learning (accessed 2024-3-25)
(4) Wetzel, W. Observing Supernova 1987A with Near-infrared and Mid-infrared Spectroscopy. Spectroscopy 2024. https://www.spectroscopyonline.com/view/observing-supernova-1987a-with-near-infrared-and-mid-infrared-spectroscopy (accessed 2024-3-25)
(5) The Kennard-Stone Algorithm. NIRPY Research 2022. https://nirpyresearch.com/kennard-stone-algorithm/ (accessed 2024-3-25)