The Marriage of Near-Infrared Spectroscopy with AI: The Small-Sample Breakthrough

January 27, 2025

Scientists demonstrate a self-supervised learning framework that dramatically improves near-infrared spectroscopy classification results, even with minimal labeled data.

Near-infrared (NIR) spectroscopy, a cornerstone of non-destructive analysis, has long been valued for its simplicity, speed, and efficiency. Yet its effectiveness often hinges on large databases, complex preprocessing, and skilled feature selection, all of which demand significant domain expertise. Researchers at Fujian Agriculture and Forestry University have introduced an approach to overcome these limitations: a convolutional neural network (CNN)-based self-supervised learning (SSL) framework designed to perform well even with small datasets. Published in Analytical Methods, the study promises to reshape spectral analysis by automating feature extraction and reducing reliance on labor-intensive data labeling (1–3).

Overcoming Challenges in NIR Spectroscopy

NIR spectroscopy commonly operates within the 780–2526 nm wavelength range, exploiting absorption patterns of hydrogen-containing groups like O–H, C–H, and N–H. Despite its versatility in analyzing organic molecules, challenges such as broad, overlapping peaks and low signal-to-noise ratios often complicate direct data interpretation. Traditional machine learning (ML) methods, which rely heavily on preprocessing, feature selection, and model construction, risk signal distortion and information loss (1–3).
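Fourier transform instruments typically report spectra in wavenumbers rather than wavelengths, so it can help to see the NIR range quoted above in both units. The short sketch below converts the 780–2526 nm limits to cm^-1; the function name is illustrative and does not come from the study.

```python
def nm_to_wavenumber(wavelength_nm: float) -> float:
    """Convert a wavelength in nanometres to a wavenumber in cm^-1."""
    if wavelength_nm <= 0:
        raise ValueError("wavelength must be positive")
    # 1 cm = 1e7 nm, so wavenumber (cm^-1) = 1e7 / wavelength (nm)
    return 1e7 / wavelength_nm


# Limits of the NIR range quoted in the text
nir_long_edge = nm_to_wavenumber(2526.0)   # about 3959 cm^-1
nir_short_edge = nm_to_wavenumber(780.0)   # about 12821 cm^-1
print(f"NIR range: {nir_long_edge:.0f} to {nir_short_edge:.0f} cm^-1")
```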

Deep learning has emerged as a promising alternative, but its reliance on large labeled datasets has limited its adoption in NIR spectroscopy, where labeling is costly and time-consuming. Addressing this gap, the Fujian team (Rongyue Zhao, Wangsen Li, Jinchai Xu, Linjie Chen, Xuan Wei, and Xiangzeng Kong, affiliated with the School of Future Technology and the College of Mechanical and Electrical Engineering) developed a novel SSL framework to extract critical spectral features with minimal human intervention (1).

The Self-Supervised Learning Framework

The proposed SSL model comprises two stages: pre-training and fine-tuning. During pre-training, the model utilizes pseudo-labeled data to learn intrinsic spectral features, setting initial parameters without requiring human-labeled samples. Fine-tuning then optimizes these parameters using a smaller set of labeled data. By leveraging this two-stage process, the model reduces the need for preprocessing while enhancing classification accuracy (1).
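The article does not say how the pseudo-labels are generated, so the following is only a minimal sketch of one common self-supervised recipe, not the authors' method: each unlabeled spectrum is passed through a few known transformations, and the index of the transformation becomes the pseudo-label. The specific transformations, the noise scale, and the function name are all assumptions made for illustration.

```python
import numpy as np


def make_pretext_batch(spectra, rng):
    """Build a pseudo-labeled pretext set from unlabeled spectra.

    Each spectrum is copied once per transformation; the pseudo-label is the
    index of the transformation applied, so no human annotation is needed.
    The four transformations are hypothetical examples.
    """
    spectra = np.asarray(spectra, dtype=float)
    n_points = spectra.shape[1]
    transforms = [
        lambda x: x,                                        # 0: unchanged
        lambda x: x[:, ::-1],                               # 1: reversed axis
        lambda x: x + rng.normal(0.0, 0.05, size=x.shape),  # 2: added noise
        lambda x: np.roll(x, n_points // 10, axis=1),       # 3: shifted
    ]
    inputs = np.concatenate([t(spectra) for t in transforms], axis=0)
    pseudo_labels = np.repeat(np.arange(len(transforms)), len(spectra))
    return inputs, pseudo_labels


rng = np.random.default_rng(0)
unlabeled = rng.random((6, 100))      # 6 synthetic spectra, 100 points each
X_pretext, y_pretext = make_pretext_batch(unlabeled, rng)
print(X_pretext.shape, y_pretext.shape)
```

Under this assumed setup, pre-training would fit a network to predict `y_pretext` from `X_pretext`, and fine-tuning would then swap the output layer for the real classes and continue training on the small human-labeled set.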

To validate the framework, the researchers applied it to their proprietary dataset of three tea tree varieties and three publicly available datasets—mango, tablet, and coal samples. Across all datasets, the model delivered remarkable results (1):

Tea Dataset: Achieved a classification accuracy of 99.12%.
Mango Dataset: Reached an accuracy of 97.83% for four mango varieties, utilizing NIR data collected from a Fourier transform near-infrared spectrometer.
Tablet Dataset: Attained 98.14% accuracy in categorizing pharmaceutical samples by active substance concentration.
Coal Dataset: Recorded an accuracy of 99.89%, demonstrating robustness across varied coal types and acquisition conditions.

Performance Insights

The framework’s transformative potential is evident in comparative experiments. When tested with only 5% of labeled data, the SSL model outperformed traditional ML methods by a substantial margin. Even as labeled data availability increased, the SSL approach maintained superior accuracy, displaying its efficiency and adaptability (1).
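To make the 5% setting concrete, the sketch below shows one way a small labeled subset could be drawn while keeping every class represented. The article does not describe the sampling procedure used in the study, so the stratified approach, the function name, and the synthetic labels here are assumptions.

```python
import numpy as np


def stratified_label_subset(labels, fraction, rng):
    """Return sorted indices of a per-class random subset of the samples.

    Keeps roughly `fraction` of each class, and at least one sample per
    class, so that no class disappears from the labeled set.
    """
    labels = np.asarray(labels)
    chosen = []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        n_keep = max(1, int(round(fraction * len(idx))))
        chosen.append(rng.choice(idx, size=n_keep, replace=False))
    return np.sort(np.concatenate(chosen))


rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 100)            # synthetic: 3 classes x 100
labeled_idx = stratified_label_subset(labels, 0.05, rng)
print(len(labeled_idx), np.bincount(labels[labeled_idx]))
```

The remaining samples would be treated as unlabeled for pre-training or held out for testing, depending on the experimental design.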

Additionally, ablation studies confirmed the critical role of the pre-training phase, which enhanced model performance by up to 10.41%. The researchers attribute this success to the model’s ability to extract both local and global spectral features during pre-training, ensuring consistent generalization across datasets (1).

Implications for Spectral Analysis

This study highlights SSL’s potential to address long-standing challenges in spectral analysis. By automating feature extraction and minimizing data-labeling requirements, the CNN-based SSL framework reduces dependency on domain expertise while improving model reliability. The implications extend beyond NIR spectroscopy, offering a blueprint for advancing small-sample analyses in diverse fields, from agriculture to pharmaceutical products and environmental monitoring (1).

“Our results demonstrate that SSL can significantly enhance spectral analysis, even under the constraints of limited data availability,” the authors concluded. “This framework not only advances the capabilities of NIR spectroscopy but also opens doors for broader applications of SSL in analytical science” (1).

The study’s authors emphasize that this breakthrough sets the stage for more automated and scalable approaches to spectroscopy (1). By combining deep learning and self-supervised methodologies, the researchers have redefined what’s possible using NIR spectroscopy for classification, marking a pivotal step toward smarter, more efficient NIR analytical techniques (1).

References

(1) Zhao, R.; Li, W.; Xu, J.; Chen, L.; Wei, X.; Kong, X. A CNN-Based Self-Supervised Learning Framework for Small-Sample Near-Infrared Spectroscopy Classification. Anal. Methods 2025, 13 Jan. DOI: 10.1039/D4AY01970A.

(2) Yang, J.; Xu, J.; Zhang, X.; Wu, C.; Lin, T.; Ying, Y. Deep Learning for Vibrational Spectral Analysis: Recent Progress and a Practical Guide. Anal. Chim. Acta 2019, 1081, 6–17. DOI: 10.1016/j.aca.2019.06.012.

(3) Yang, L.; Sun, Q. Recognition of the Hardness of Licorice Seeds Using a Semi-Supervised Learning Method and Near-Infrared Spectral Data. Chemom. Intell. Lab. Syst. 2012, 114, 109–115. DOI: 10.1016/j.chemolab.2012.03.010.
