Deep Learning-Based Prediction Model Developed for Nonclassical Proteins

May 23, 2024

Scientists from Henan University in Kaifeng, China created a new prediction model for classifying nonclassical secreted proteins based on deep learning, publishing their findings in the Journal of Chemometrics (1). The scientists involved in the study include Xinhong Zhang and Yiru He from the Henan University School of Software, Binjie Wang and Fan Zhang from the university’s Radiological Department, and Chaoyang Liu of the Henan University School of Computer and Information Engineering.

Deep learning is a subset of machine learning that uses multi-layered neural networks, called deep neural networks (DNNs), to simulate the human brain’s complex decision-making power (2). Whereas traditional machine learning typically requires unstructured data to be pre-processed and organized, deep learning eliminates some of that pre-processing. A DNN has at least three layers, and most have many more; the additional layers can help refine and optimize predictions and decisions for greater accuracy. DNNs are trained on large amounts of data to identify and classify phenomena, recognize patterns and relationships, evaluate possibilities, and make predictions and decisions. Deep learning drives many applications and services that improve automation, performing analytical and physical tasks without human intervention and enabling everyday products and services.

In this study, the scientists proposed an end-to-end nonclassical secreted protein prediction model based on deep learning, named DeepNCSPP. The model takes protein sequence information and sequence statistics information as input and predicts whether a given protein is a nonclassical secreted protein. Protein sequence information is extracted using a bidirectional long short-term memory (BiLSTM) network, while sequence statistics information is extracted using convolutional neural networks.
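To make the two kinds of input more concrete, here is a minimal Python sketch of how a protein sequence might be turned into an integer-encoded sequence and a simple statistics vector. This is an illustration under stated assumptions, not the DeepNCSPP pipeline: the article does not say which encoding or which sequence statistics the authors use, so the integer encoding, the amino acid composition feature, and the names `encode_sequence` and `amino_acid_composition` are all hypothetical.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Index 0 is reserved for padding and unknown residues (an assumption of this sketch).
AA_TO_INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def encode_sequence(seq, max_len):
    """Map residues to integer indices, then truncate or zero-pad to max_len.

    A fixed-length integer vector like this is one common way to feed a
    sequence into a recurrent layer; it is assumed here, not taken from the paper.
    """
    indices = [AA_TO_INDEX.get(aa, 0) for aa in seq.upper()[:max_len]]
    return indices + [0] * (max_len - len(indices))


def amino_acid_composition(seq):
    """Return the fraction of each standard amino acid in the sequence.

    Composition is used only as an example of a "sequence statistic";
    the statistics actually used by DeepNCSPP are not given in the article.
    """
    counts = Counter(aa for aa in seq.upper() if aa in AA_TO_INDEX)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical toy sequence, not a real protein.
example = "MKTAYIAK"
print(encode_sequence(example, max_len=12))
print(amino_acid_composition(example))
```

In a two-branch model of the kind described, a vector like the first output would go to the recurrent branch and a vector like the second to the convolutional branch.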

Accurate protein structure predictions, which have been enabled by advancements in machine learning algorithms, provide entry points for probing structural mechanisms and for integrating and querying different types of biochemical and biophysical results (3). Many nonclassical protein prediction methods in use today rely on manual feature selection, in which sample features are constructed from the physicochemical properties of proteins and the position-specific scoring matrix (PSSM). However, this approach can require researchers to perform tedious searches to obtain the physicochemical properties of proteins.

On the independent test data set, DeepNCSPP achieved an accuracy of 88.24%, a Matthews correlation coefficient (MCC) of 77.01%, and an F1-score of 87.50%. Together with 10-fold cross-validation, these results indicate that DeepNCSPP performs competitively with state-of-the-art methods and can serve as a reliable nonclassical secreted protein prediction model. A web server has been created for researchers’ convenience at https://www.deepncspp.top/.
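For readers less familiar with these metrics, the following Python sketch shows how accuracy, F1-score, and MCC are computed from the four cells of a binary confusion matrix. The article does not report the study's confusion matrix, so the counts below are hypothetical. They were chosen only because they are one set of counts whose metrics round to the reported percentages, and should not be read as the study's actual data. The function name `classification_metrics` is also an assumption of this sketch.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, F1-score, and MCC from binary confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total

    # F1 is the harmonic mean of precision and recall,
    # which simplifies to 2*TP / (2*TP + FP + FN).
    f1_denominator = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denominator if f1_denominator else 0.0

    # MCC ranges from -1 to +1; by convention it is set to 0
    # when any marginal total is zero and the ratio is undefined.
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0

    return accuracy, f1, mcc


# Hypothetical counts (NOT from the paper): 34 test proteins in total.
acc, f1, mcc = classification_metrics(tp=14, fp=1, tn=16, fn=3)
print(f"accuracy={acc:.2%} F1={f1:.2%} MCC={mcc:.4f}")
# prints: accuracy=88.24% F1=87.50% MCC=0.7701
```

MCC is natively a correlation between -1 and +1, so the reported 77.01% corresponds to a value of about 0.77. Unlike accuracy, it uses all four confusion-matrix cells, which makes it a more informative summary when the classes are imbalanced.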

Artificial intelligence (AI) and machine learning are becoming more common in analytical science. For example, Spectroscopy previously reported that researchers from Tsinghua University and Beihang University in Beijing developed a deep-learning-based data processing framework that significantly improves the accuracy of dual-comb absorption spectroscopy (DCAS) in gas quantification analysis (4). By using a U-Net model for etalon removal and a modified U-Net combined with traditional methods for baseline extraction, their framework achieves high-fidelity absorbance spectra, even in challenging conditions with complex baselines and etalon effects. Combining U-Net with adaptive iteratively reweighted penalized least squares (airPLS) yields a hybrid approach for processing complex spectral data.

References

(1) Zhang, F.; Liu, C.; Wang, B.; et al. A Prediction Model of Nonclassical Secreted Protein Based on Deep Learning. J. Chemom. 2024, e3553. DOI: 10.1002/cem.3553

(2) What is Deep Learning? IBM 2024. https://www.ibm.com/topics/deep-learning (accessed 2024-05-22)

(3) Protein Structure Prediction. https://www.sciencedirect.com/topics/biochemistry-genetics-and-molecular-biology/protein-structure-prediction (accessed 2024-05-22)

(4) Workman Jr, J. Deep Learning Advances Gas Quantification Analysis in Near-Infrared Dual-Comb Spectroscopy. Spectroscopy 2024. https://www.spectroscopyonline.com/view/deep-learning-advances-gas-quantification-analysis-in-near-infrared-dual-comb-spectroscopy (accessed 2024-05-23)
