Distinguishing Horsetails Using NIR and Predictive Modeling

Spectroscopy sat down with Knut Baumann of the University of Technology Braunschweig to discuss his latest research examining the classification of two closely related horsetail species, Equisetum arvense (field horsetail) and Equisetum palustre (marsh horsetail), using near-infrared spectroscopy (NIR).

Horsetails are plants that belong to the genus Equisetum, which is the only genus in the family Equisetaceae. There are 15 species of this plant found worldwide (1). What makes horsetails unique is that they are living fossils that date back to the Carboniferous period, which was approximately 325 million years ago (1). As a result, ancient horsetails have been linked to the formation of coal deposits around the world (1). However, determining the species of horsetails is often difficult because of morphological similarities between some of them (2).

Knut Baumann is with the Institute of Medicinal and Pharmaceutical Chemistry at the University of Technology Braunschweig, in Braunschweig, Germany. He recently explored how to differentiate between various horsetail species using near-infrared (NIR) spectroscopy (2). Spectroscopy recently sat down with Baumann to discuss his latest research, and what it means for species classification and medicinal research.

Knut Baumann of the University of Technology Braunschweig. Photo Credit: © Knut Baumann

Could you elaborate on the challenges of distinguishing Equisetum arvense from Equisetum palustre in the field?

There are two main difficulties in distinguishing between the two species E. arvense and E. palustre. First, both species share most natural habitats. Second, they are morphologically very similar: similar size, stem shape, and color. The lowest branch limb serves as a distinguishing feature. In E. palustre it is shorter than the neighboring stem sheath; in E. arvense, it is at least the same length. Experienced botanists are needed to reliably determine this difference in the field.

What advantages do near-infrared spectroscopy (NIR) methods offer compared to traditional identification techniques for horsetail species?

So far, the identification of E. palustre has required either experienced, specialized botanists or elaborate analytical techniques such as high-performance liquid chromatography–tandem mass spectrometry (HPLC-MS/MS) or DNA barcoding to confirm species identity. These methods require specialized staff and appropriate equipment. They are also time-consuming, consume resources such as chemicals, and destroy the samples.

Analyses using NIR spectroscopy, on the other hand, do not require specialist staff with extensive training. NIR analysis takes considerably less time compared to the above-mentioned analyses and is non-destructive to the samples. In addition, the portable device used in this study was quite inexpensive.

How do the performances of the portable NIR device and the benchtop NIR device compare in terms of accuracy and practicality?

Both devices provide similar accuracy (85–90%), although the accuracies achieved with the benchtop device are slightly higher (by 1–5%) than those obtained with the spectra of the portable device. This difference in accuracy is presumably caused by the higher resolution of the benchtop device. If handling is included in the comparison, the portable device performs better than the benchtop device. Fewer instrument parameters need to be set, and it is much easier to use.

Could you discuss the role of unsupervised machine learning (ML) techniques like principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) in visualizing the spectral data?

PCA and t-SNE (perplexity was set to 10 here) were used for exploratory data analysis (EDA). Both techniques solely use the spectral data and not the class labels (that is, they are unsupervised techniques). The obtained two-dimensional (2D) projections of the originally high-dimensional data summarize the main characteristics of the data set. They provide a visual impression of how well the (preprocessed) spectra are suited to distinguish the two species. Therefore, PCA and t-SNE are mainly used for an initial assessment of whether the spectroscopic data is suitable for further classification.
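As an illustration of the unsupervised projection step described above, the following is a minimal sketch of a two-component PCA computed with NumPy. It is not the study's code: the spectra are synthetic, the function name `pca_scores` and the simulated band shift are assumptions made for the example, and t-SNE is not shown. The class labels are used only to build the synthetic data and are never passed to the PCA.

```python
import numpy as np

def pca_scores(spectra, n_components=2):
    """Project the rows of `spectra` onto the first principal components."""
    X = np.asarray(spectra, dtype=float)
    X_centered = X - X.mean(axis=0)  # mean-center each wavelength channel
    # SVD of the centered matrix; the rows of Vt are the principal axes.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    scores = X_centered @ Vt[:n_components].T
    explained = (S ** 2) / np.sum(S ** 2)  # fraction of variance per component
    return scores, explained[:n_components]

# Synthetic stand-in for preprocessed spectra: 40 samples x 100 wavelengths.
rng = np.random.default_rng(0)
labels = np.array([0] * 20 + [1] * 20)   # hypothetical species labels
spectra = rng.normal(size=(40, 100))
spectra[labels == 1, 40:60] += 1.5       # assumed class difference in one band

scores, explained = pca_scores(spectra)
print(scores.shape, explained)
```

Plotting the two columns of `scores` and coloring the points by species afterwards gives the kind of visual impression described in the answer: well-separated groups suggest the spectra are suitable for classification.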

What insights did the supervised ML models provide regarding the spectral differences between the two species?

We used three state-of-the-art algorithms in the study. They were support vector machine (SVM; kernel learning), k-nearest neighbor classification (kNN; distance-based), and random forests (RF; ensemble learning). Each method provides a confidence measure that can be converted to a class membership probability. The latter gives an impression of the uncertainty of test data predictions. In addition to that, RF also provides a measure of importance for each input variable (in this study, it was spectral wavelength). Yet, this feature was not used in this study.
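To illustrate how a confidence measure can be turned into a class membership probability, here is a minimal sketch for kNN, in which the fraction of neighbor votes for each class is read as that class's probability. This is one common convention and an assumption here; the interview does not state which conversion the study used. The function name and the two-dimensional toy data are hypothetical.

```python
import math
from collections import Counter

def knn_class_probabilities(train_X, train_y, query, k=5):
    """Return {class: fraction of the k nearest neighbors with that class}."""
    # Sort training samples by Euclidean distance to the query point.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return {c: votes.get(c, 0) / k for c in sorted(set(train_y))}

# Toy two-feature data standing in for (reduced) spectra.
train_X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_y = ["arvense", "arvense", "arvense", "palustre", "palustre", "palustre"]

probs = knn_class_probabilities(train_X, train_y, query=(0.5, 0.5), k=5)
print(probs)
```

A probability close to 0.5 for a test sample signals an uncertain prediction, whereas values near 0 or 1 indicate that the neighbors agree.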

What does the repeated cross-validation approach reveal about the robustness of your classification models?

The repeated cross-validation (CV) was used for model assessment only. We used a 100 × 4 repeated stratified CV (100 repetitions of four-fold CV) to achieve a reasonable estimate of the actual predictive performance, using 400 models overall. The more models that are computed, the lower the variability of the performance measures is. Because the differences between the obtained accuracies were small here, a low variability of the respective estimates is essential. With a single four-fold cross-validation run (or only a few repetitions), the resulting estimate of the accuracy depends heavily on the actual cross-validation partitions. Repeated CV with many repetitions smooths this out. The variability of the accuracy (or any other performance measure) obtained from each single CV run could be used as an indicator of the robustness of the models (large variation would indicate little stability). This was not studied here because the exploratory data analysis did not indicate the presence of outliers.
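The 100 × 4 scheme can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the study's implementation: the data are synthetic one-feature values, the nearest-class-mean classifier is a stand-in for SVM, kNN, and RF, and all function names are hypothetical.

```python
import random
import statistics

def stratified_folds(labels, n_folds, rng):
    """Split sample indices into folds that preserve the class proportions."""
    folds = [[] for _ in range(n_folds)]
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % n_folds].append(idx)
    return folds

def nearest_mean_predict(train_x, train_y, value):
    """Toy classifier: predict the class whose training mean is closest."""
    means = {}
    for c in sorted(set(train_y)):
        vals = [x for x, y in zip(train_x, train_y) if y == c]
        means[c] = sum(vals) / len(vals)
    return min(means, key=lambda c: abs(means[c] - value))

def repeated_stratified_cv(x, y, n_repeats=100, n_folds=4, seed=0):
    """Return one accuracy per repetition and the number of models fitted."""
    rng = random.Random(seed)
    run_accuracies, n_models = [], 0
    for _ in range(n_repeats):
        correct = 0
        for test_idx in stratified_folds(y, n_folds, rng):
            held_out = set(test_idx)
            train_x = [x[i] for i in range(len(x)) if i not in held_out]
            train_y = [y[i] for i in range(len(y)) if i not in held_out]
            n_models += 1  # one model per fold per repetition
            correct += sum(
                nearest_mean_predict(train_x, train_y, x[i]) == y[i]
                for i in test_idx
            )
        run_accuracies.append(correct / len(y))
    return run_accuracies, n_models

# Synthetic data: 20 samples per class with overlapping distributions.
data_rng = random.Random(1)
y = [0] * 20 + [1] * 20
x = [data_rng.gauss(1.5 * label, 1.0) for label in y]

accs, n_models = repeated_stratified_cv(x, y)
print(n_models, statistics.mean(accs), statistics.stdev(accs))
```

The mean over the repetitions is the smoothed accuracy estimate, while the standard deviation across the single CV runs corresponds to the variability that, as noted above, could serve as an indicator of model robustness.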

Do you know which NIR spectral regions were most important for classification of the horsetail species, and do you know why?

The main reason for analyzing the two species is the higher alkaloid content of E. palustre. However, the trace components palustrine and palustridiene cannot be detected in the spectra because of their low content. Instead, the spectrum characterizes the plant material as a whole. By applying variable selection, only the most relevant wavelengths, that is, those well suited for differentiating between the two species, were used for model building. These variables are clustered in the ranges around 1700 nm (first overtone of C-H stretching) and 2300 nm (C-H stretching combination bands).

Beyond species classification, what potential applications do you see for this NIR-based workflow in other areas of plant or medicinal research?

NIR spectroscopy is an increasingly popular method with many possible applications. It is fast, inexpensive, easy to use and non-destructive, so there is great interest in the exploration of other potential areas of application. With medicinal plants, it could be used not only to identify species, but also to determine the time of harvest and origin of samples or to monitor ageing processes or active ingredient levels.

References

  1. Iowa State University, Equisetum: Biology and Management. Iowa State. Available at: https://crops.extension.iastate.edu/encyclopedia/equisetum-biology-and-management#:~:text=Horsetails%20are%20members%20of%20the,(325%20million%20years%20ago). (accessed 2025-01-27).
  2. Beier, K.; Dutschmann, T.-M.; Beuerle, T.; et al. Classification of Horsetails Using Predictive Modelling on NIR Spectra. J. Chemom. 2024, e3634. DOI: 10.1002/cem.3634