Sample Selection Methods Tested Against Near-Infrared Spectral Information Entropy

March 25, 2024

Scientists from East China Jiaotong University in Nanchang, Jiangxi, China, recently tested sample selection methods that use near-infrared (NIR) spectral information entropy as a similarity criterion. Their findings were published in the Journal of Chemometrics (1).

Near-infrared (NIR) spectroscopy has been applied to a wide variety of tasks in the past few years. The technique has been used to predict the harvest times of Cabernet Sauvignon grapes, detect Covid-19, and analyze emission lines from a supernova (in tandem with mid-infrared [MIR] spectroscopy) (2–4). When using NIR, constructing a model and then maintaining and updating it are both essential. In machine learning, model construction usually involves dividing a sample set into a calibration set and a validation set. The representativeness of the calibration set and the reasonable distribution of the validation set both affect the accuracy of the established model. Additionally, when maintaining and updating models, selecting the most informative update samples can not only improve prediction accuracy but also reduce the amount of sample preparation required.

For this study, the researchers proposed spectral information entropy (SIE) as a similarity criterion for dividing sample sets and used the same criterion to select update samples. Two established methods served as comparisons for verifying the proposed approach: the Kennard–Stone (KS) method, which splits data into training and test sets based on a distance metric between data points, spectra, or labels, and the sample set partitioning based on joint x–y distance (SPXY) method (5).
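To make the distance-based idea behind the KS comparison method concrete, the following is a minimal Python sketch of a Kennard–Stone style selection. It is an illustration only, not the researchers' code and not the SIE method, whose formula is not given in this article. The function name `kennard_stone`, the choice of Euclidean distance, and the toy data are all assumptions made for the example.

```python
import numpy as np


def kennard_stone(X, n_select):
    """Return indices of n_select rows of X picked by a Kennard-Stone style rule.

    Hypothetical helper for illustration; assumes Euclidean distance.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")

    # Pairwise Euclidean distances between all samples (rows of X).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))

    # Start with the two samples that are farthest apart.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]

    # For every sample, distance to its nearest already-selected sample.
    min_dist = np.minimum(dist[i], dist[j])

    while len(selected) < n_select:
        min_dist[selected] = -1.0  # exclude samples that were already picked
        nxt = int(np.argmax(min_dist))  # sample farthest from the selected set
        selected.append(nxt)
        min_dist = np.minimum(min_dist, dist[nxt])

    return selected


# Toy data standing in for spectra: four samples, one variable each.
X = np.array([[0.0], [1.0], [2.0], [10.0]])
calibration_idx = kennard_stone(X, 3)
validation_idx = [k for k in range(len(X)) if k not in calibration_idx]
```

In this sketch the selected samples would form the calibration set and the remainder the validation set. The full pairwise distance matrix is convenient for small examples but would need a more memory-efficient approach for large spectral data sets.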

The model built after dividing the sample set with SIE showed better prediction performance than models built from sample sets divided with KS and SPXY. When predicting soluble solids content (SSC) and hardness, the coefficient of determination of prediction (R²P) improved by over 15%, while the root mean square error (RMSE) of prediction was reduced by 50%. Regarding model updating, the researchers found that selecting a small number of update samples using SIE can improve the correlation coefficient (RP) by more than 80%, with the updated models achieving higher prediction accuracy than those updated using the KS and SPXY methods. These results indicate that SIE can make NIR analysis more reliable.
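For readers unfamiliar with the reported figures of merit, the short Python sketch below computes a coefficient of determination, an RMSE of prediction, and a correlation coefficient from reference and predicted values. The article does not state the exact formulas used in the paper, so the conventional definitions are assumed here; the function name `prediction_metrics` and the sample numbers are hypothetical.

```python
import numpy as np


def prediction_metrics(y_true, y_pred):
    """Compute common prediction metrics under conventional definitions (assumed)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_true - y_pred

    # Root mean square error of prediction.
    rmsep = float(np.sqrt(np.mean(residual ** 2)))

    # Coefficient of determination: 1 - SS_res / SS_tot.
    ss_res = float(np.sum(residual ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2p = 1.0 - ss_res / ss_tot

    # Pearson correlation coefficient between reference and predicted values.
    rp = float(np.corrcoef(y_true, y_pred)[0, 1])

    return {"RMSEP": rmsep, "R2P": r2p, "RP": rp}


# Made-up reference and predicted values, for illustration only.
metrics = prediction_metrics([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
```

Note that a constant offset in the predictions, as in this toy case, leaves the correlation coefficient at 1 while lowering the coefficient of determination, which is one reason the two metrics are often reported together.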

References

(1) Liu, Y.; He, C.; Jiang, X. Sample Selection Method Using Near-Infrared Spectral Information Entropy as Similarity Criterion for Constructing and Updating Peach Firmness and Soluble Solids Content Prediction Models. J. Chemom. 2023, 38 (2), e3528. DOI: https://doi.org/10.1002/cem.3528

(2) Luo, Y.; Zhao, J.; Zhu, H.; Li, X.; Dong, J.; Sun, J. Prediction of the Harvest Time of Cabernet Sauvignon Grapes Using Near-Infrared Spectroscopy. Spectroscopy 2024. https://www.spectroscopyonline.com/view/prediction-of-the-harvest-time-of-cabernet-sauvignon-grapes-using-near-infrared-spectroscopy (accessed 2024-3-25)

(3) Acevedo, A. Detecting Covid-19 Using Visible or Near-Infrared Spectroscopy and Machine Learning. Spectroscopy 2023. https://www.spectroscopyonline.com/view/detecting-covid-19-using-visible-or-near-infrared-spectroscopy-and-machine-learning (accessed 2024-3-25)

(4) Wetzel, W. Observing Supernova 1987A with Near-infrared and Mid-infrared Spectroscopy. Spectroscopy 2024. https://www.spectroscopyonline.com/view/observing-supernova-1987a-with-near-infrared-and-mid-infrared-spectroscopy (accessed 2024-3-25)

(5) The Kennard-Stone Algorithm. NIRPY Research 2022. https://nirpyresearch.com/kennard-stone-algorithm/ (accessed 2024-3-25)
