Scientists from the University of Granada (Spain) recently compared how effective hyperspectral imaging (HSI) and machine learning (ML) methods are in classifying ink found in historical documents. Their findings were published in Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy (1).
Identifying materials used in tangible cultural heritage is vital for selecting appropriate restoration and preservation strategies. Analyzing inks in manuscripts and historical documents can enrich understanding of their artistic and historical context, supporting efforts to date documents, determine authorship, detect falsifications or undocumented restorations, and identify causes of deterioration. Ink analysis, therefore, is key for codicologists and historians looking to explore the content and material composition of manuscripts.
To obtain compositional information while preserving objects’ integrity and value, non-invasive analytical techniques are predominantly used, the most common being X-ray fluorescence (XRF), X-ray diffraction (XRD), Fourier transform infrared (FTIR) spectroscopy, and Raman spectroscopy. Recently, however, hyperspectral imaging (HSI) has gained prominence in this field. Combining spectroscopy and spatial imaging, this technique captures the spectral reflectance at each pixel of an image across many wavelengths, producing a three-dimensional data set known as a hypercube (two spatial coordinates plus a spectrum for every pixel). According to the researchers, HSI’s primary advantage over other methods is its ability to provide spatial information, which makes it possible to map how materials are distributed within a document, something that is critical for historical studies and conservation evaluation (2). Additionally, its non-contact and rapid data acquisition make it suitable for on-site analysis of historical artifacts at locations like museums or libraries.
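The hypercube layout described above can be illustrated with a short NumPy sketch. The dimensions and reflectance values below are synthetic assumptions chosen for illustration, not data or code from the study.

```python
import numpy as np

# Hypothetical cube size: 4 x 5 pixels and 10 spectral bands (illustrative only).
rows, cols, bands = 4, 5, 10
rng = np.random.default_rng(0)
cube = rng.random((rows, cols, bands))  # synthetic reflectance values in [0, 1)

# Spectrum of a single pixel: one reflectance value per band.
pixel_spectrum = cube[2, 3, :]

# Image at a single wavelength: one reflectance value per pixel.
band_image = cube[:, :, 7]

# Flatten the two spatial axes so that each row is one pixel's spectrum.
# This (n_pixels, n_bands) table is the usual input shape for a classifier.
pixels = cube.reshape(-1, bands)
```

Reshaping back with `labels.reshape(rows, cols)` after per-pixel classification would give a map of the predicted material at every location, which is the spatial information the article refers to.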
While HSI has its advantages, the researchers state that no previous studies had investigated the automatic classification of historical inks using machine learning (ML) and HSI data. For this study, six supervised ML models were trained and validated to automatically classify three types of inks: (1) pure metallo-gallate inks (MGP); (2) carbon-containing inks (CC), which include pure carbon-based inks like ivory black or bone black, as well as mixtures of carbon-based and metallo-gallate or sepia inks; and (3) non-carbon-containing inks (NCC), which can be pure sepia or a mixture of MGP and sepia. The six models comprised five traditional algorithms (Support Vector Machines [SVM], K-Nearest Neighbors [KNN], Linear Discriminant Analysis [LDA], Random Forest [RF], and Partial Least Squares Discriminant Analysis [PLS-DA]) and one deep learning (DL)-based model. In addition, principal component analysis (PCA) was applied before classification, both to visualize the separability of the classes and to reduce dimensionality, and classification accuracy and running time were compared with and without PCA.
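As a rough illustration of the "PCA, then classify" workflow, the sketch below reduces synthetic spectra to two principal components and labels them with a simple K-Nearest Neighbors vote, one of the five traditional algorithms named above. Everything here is an assumption made for illustration: the spectra are random numbers, the band count, class means, and split sizes are hypothetical, and the helper functions are not the researchers' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_bands = 30       # hypothetical number of spectral bands
n_per_class = 40   # hypothetical number of labeled pixels per ink class
class_names = ["MGP", "CC", "NCC"]

# Synthetic spectra: each class gets its own mean reflectance level plus noise.
means = [0.2, 0.5, 0.8]
X = np.vstack([m + 0.05 * rng.standard_normal((n_per_class, n_bands)) for m in means])
y = np.repeat(np.arange(len(class_names)), n_per_class)

# Shuffle, then hold out 30 of the 120 samples for testing.
idx = rng.permutation(len(y))
train, test = idx[:90], idx[90:]

def pca_fit(X, n_components):
    """Return the training mean and the leading principal directions."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def pca_transform(X, mean, components):
    """Project centered spectra onto the principal directions."""
    return (X - mean) @ components.T

def knn_predict(X_train, y_train, X_test, k=3):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    votes = y_train[nearest]
    return np.array([np.bincount(v, minlength=len(class_names)).argmax() for v in votes])

# Fit PCA on the training spectra only, then classify in the reduced space.
mean, comps = pca_fit(X[train], n_components=2)
Z_train = pca_transform(X[train], mean, comps)
Z_test = pca_transform(X[test], mean, comps)
y_pred = knn_predict(Z_train, y[train], Z_test)
accuracy = float((y_pred == y[test]).mean())
```

Fitting PCA on the training split alone keeps information from the test spectra out of the model. Running the same classifier on `X` directly, without the projection, is the kind of with-and-without comparison the article describes.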
Using mock-up samples and historical documents, all models achieved micro-averaged accuracy above 90%. The best results came from the DL model, with micro- and macro-averaged accuracy and recall above 99%. Among traditional models, SVM was the best option, with all metrics above 95% and micro- and macro-averaged accuracy and recall above 97%. That said, neither of these two models achieved perfect results. As such, the choice between a traditional and a DL model can mostly be based on the available computational resources and on how much a slight gain in accuracy matters.
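The difference between micro- and macro-averaging can be shown with a small worked example. The confusion matrix below is made up for illustration and does not come from the study, and the paper's exact metric definitions may differ from this sketch.

```python
import numpy as np

# Hypothetical confusion matrix: rows are true classes, columns are predictions.
# The class sizes are deliberately unbalanced (100, 50, and 10 samples).
cm = np.array([
    [90,  5, 5],
    [ 2, 45, 3],
    [ 1,  1, 8],
])

true_positives = np.diag(cm)
samples_per_class = cm.sum(axis=1)

# Recall of each class: correct predictions divided by true samples of that class.
per_class_recall = true_positives / samples_per_class

# Macro average: every class counts equally, regardless of its size.
macro_recall = per_class_recall.mean()

# Micro average: pool all samples first, so large classes dominate.
micro_recall = true_positives.sum() / cm.sum()
```

Here the smallest class has the lowest recall, so the macro average (about 0.867) falls below the micro average (about 0.894). In single-label multiclass problems, micro-averaged recall equals overall accuracy, which is why reporting both averages helps reveal weak performance on a rare class.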
Future research will focus on more detailed classification, in which subclasses within the CC and NCC groups can be separated. Applying unmixing techniques could provide more interpretable analyses of individual components and their concentrations in mixtures than DL or ML approaches. Their effectiveness, however, will depend on the choice of mixing model, the accuracy of the extracted endmembers (spectra of pure components), and the availability of a comprehensive reference library.
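To make the idea of unmixing concrete, here is a minimal sketch that assumes a linear mixing model, in which a measured spectrum is a weighted sum of endmember spectra. The endmembers and abundances are synthetic, the names are hypothetical, and the unconstrained least-squares solve is a simplification: practical unmixing usually also enforces non-negative abundances that sum to one. The article does not say which mixing model the researchers would use.

```python
import numpy as np

n_bands = 20  # hypothetical number of spectral bands

# Two synthetic endmember spectra (columns), standing in for pure components.
endmember_a = np.linspace(0.1, 0.9, n_bands)
endmember_b = np.linspace(0.8, 0.3, n_bands)
E = np.column_stack([endmember_a, endmember_b])  # shape (n_bands, 2)

# Simulate a noise-free mixed pixel: 70% of component A, 30% of component B.
true_abundances = np.array([0.7, 0.3])
mixed_spectrum = E @ true_abundances

# Estimate the abundances by ordinary least squares (no constraints applied).
estimated, *_ = np.linalg.lstsq(E, mixed_spectrum, rcond=None)
```

Because the estimate depends directly on the columns of `E`, errors in the endmember spectra or a missing component in the reference library carry straight through to the recovered abundances, which is the dependence the article points out.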
(1) López-Baldomero, A. B.; Buzzelli, M.; Moronta-Montero, F.; Martínez-Domingo, M. Á.; Valero, E. M. Ink Classification in Historical Documents Using Hyperspectral Imaging and Machine Learning Methods. Spectrochim. Acta – A: Mol. Biomol. Spectrosc. 2025, 335, 125916. DOI: 10.1016/j.saa.2025.125916
(2) Catelli, E.; Randeberg, L. L.; Alsberg, B. K.; Gebremariam, K. F.; Bracci, S. An Explorative Chemometric Approach Applied to Hyperspectral Images for the Study of Illuminated Manuscripts. Spectrochim. Acta – A: Mol. Biomol. Spectrosc. 2017, 177, 69–78. DOI: 10.1016/j.saa.2017.01.015