How well does a data-driven prediction method distinguish dihydrouridine from tRNA and mRNA?

Authors: Basith S, Manavalan B
Publication date: 01/01/2023

Amyotrophic lateral sclerosis disease-related mutations disrupt the dimerization of superoxide dismutase 1 - A comparative molecular dynamics simulation study

Authors: Basith S, Manavalan B, Lee G
Publication date: 01/01/2022

More than 150 genes are involved in amyotrophic lateral sclerosis (ALS), with superoxide dismutase 1 (SOD1) being one of the most studied. Mutations in the SOD1 gene, which encodes the enzyme SOD1, are the second most prevalent and studied cause of familial ALS. SOD1 is a ubiquitous, homodimeric metalloenzyme that forms a critical component of the cellular defense against reactive oxygen species. Several mutations in the SOD1 enzyme cause misfolding, dimerization instability, and increased aggregate formation in ALS. However, there is a lack of information on the dimerization of SOD1 monomers and on the mechanism by which pathogenic mutations disrupt dimerization. Here, we present microsecond-scale molecular dynamics (MD) simulations to unravel how interface-based mutations compromise SOD1 dimerization and to provide a mechanistic understanding of the corresponding process using WT and three interface-based mutant systems (A4V, T54R, and I113T). Structural stability analysis showed that the mutant systems displayed disparate variations in the catalytic sites, which may directly alter the stability and activity of the SOD1 enzyme. Dynamic network analysis and principal component analysis showed that the mutations weakened the correlated motions along the dimer interface and altered the protein conformational behavior, thus weakening the stability of dimer formation. Moreover, the simulation results identified crucial residues such as G51, D52, G114, I151, and Q153 in establishing the dimerization interaction network; these interactions were weakened or absent in the presence of interfacial mutants. Surface potential analysis on mutant systems also displayed changes in the dimerization potential, indicating unfavorable dimer formation. Furthermore, network analysis identified the hotspot residues necessary for SOD1 signal transduction, which were surprisingly found in the catalytic sites rather than the anticipated dimerization interface.

Ajou Open Repository

Unveiling local and global conformational changes and allosteric communications in SOD1 systems using molecular dynamics simulation and network analyses

Authors: Basith S, Manavalan B, Lee G
Publication date: 01/01/2024

Background: Amyotrophic lateral sclerosis (ALS) is a serious neurodegenerative disorder affecting nerve cells in the brain and spinal cord that is caused by mutations in the superoxide dismutase 1 (SOD1) enzyme. ALS-related mutations cause misfolding, dimerisation instability, and increased formation of aggregates. However, the underlying allosteric mechanisms remain obscure at the atomistic level. This gap in knowledge limits the development of novel SOD1 inhibitors and the understanding of how disease-associated mutations in distal sites affect enzyme activity. Methods: We combined microsecond-scale unbiased molecular dynamics (MD) simulations with network analysis to elucidate the local and global conformational changes and allosteric communications in SOD1 Apo (unmetallated form), Holo, Apo_CallA (mutant and unmetallated form), and Holo_CallA (mutant form) systems. To identify hotspot residues involved in SOD1 signalling and allosteric communications, we performed network centrality, community network, and path analyses. Results: Structural analyses showed that unmetallated SOD1 systems and cysteine mutations displayed large structural variations in the catalytic sites, affecting structural stability. Inter- and intra H-bond analyses identified several residues crucial for maintaining interfacial stability, structural stability, and enzyme catalysis. Dynamic motion analysis demonstrated more balanced atomic displacement and highly correlated motions in the Holo system. The rationale for the structural disparity observed in the disulfide bond formation and R143 configuration in Apo and Holo systems was elucidated using distance and dihedral probability distribution analyses. Conclusion: Our study highlights the efficiency of combining extensive MD simulations with network analyses to unravel the features of protein allostery.
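The distance probability distribution analysis mentioned in the abstract can be illustrated with a normalised histogram. This is only a minimal sketch: the function name `probability_distribution`, the bin width, and the sample distances are illustrative assumptions, not values or code from the study.

```python
def probability_distribution(values, bin_width, start=0.0):
    """Normalised histogram: maps each bin's lower edge to the fraction of samples in that bin."""
    counts = {}
    for v in values:
        # Index of the bin [start + i*bin_width, start + (i+1)*bin_width) containing v.
        index = int((v - start) // bin_width)
        counts[index] = counts.get(index, 0) + 1
    total = len(values)
    return {start + i * bin_width: c / total for i, c in sorted(counts.items())}


# Hypothetical per-frame distances (in angstroms) between two atoms; not data from the paper.
distances = [2.0, 2.1, 2.1, 2.2, 5.9, 6.1, 6.0, 2.0]
dist = probability_distribution(distances, bin_width=1.0)
print(dist)
```

In a sketch like this, two well-separated peaks would suggest that the atom pair samples two distinct states across the trajectory, which is the kind of contrast a comparison between Apo and Holo systems would look for.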

Ajou Open Repository

AntiT2DMP-Pred: Leveraging feature fusion and optimization for superior machine learning prediction of type 2 diabetes mellitus

Authors: Basith S, Manavalan B, Lee G
Publication date: 01/01/2025

Pancreatic α-amylase breaks down starch into isomaltose and maltose, which are further hydrolyzed by α-glucosidase in the intestine into monosaccharides, rapidly raising blood sugar levels and contributing to type 2 diabetes mellitus (T2DM). Synthetic inhibitors of carbohydrate-digesting enzymes are used to manage T2DM but may harm organ function over time. Bioactive peptides offer a safer alternative, avoiding such adverse effects. Computational methods for predicting antidiabetic peptides (ADPs) can significantly reduce the time and cost of experimental testing. While machine learning (ML) has been applied to identify ADPs, advancements in data analysis and algorithms continue to drive progress in the field. Building on these advances, we developed AntiT2DMP-Pred, the first ML-based tool specifically designed for predicting type 2 antidiabetic peptides (T2ADPs). This tool employs a feature fusion strategy, combining ten highly discriminative feature descriptors chosen from a pool of 32 descriptors and eight ML algorithms, tested across a range of baseline models. AntiT2DMP-Pred demonstrated excellent performance, surpassing both baseline and feature-optimized models, with an accuracy (ACC) and Matthews’ correlation coefficient (MCC) of 0.976 and 0.953 on the training dataset, and an ACC and MCC of 0.957 and 0.851 on the independent dataset. The web server (https://balalab-skku.org/AntiT2DMP-Pred) is freely accessible, enabling researchers worldwide to utilize it in their experimental workflows and contribute to the discovery and understanding of T2ADPs, ultimately supporting peptide-based therapeutic development for diabetes management.
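The two metrics reported above, ACC and MCC, can both be computed from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the function names and the counts are hypothetical and are not taken from the paper.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical confusion-matrix counts, chosen only for illustration.
tp, tn, fp, fn = 90, 85, 15, 10
print(round(accuracy(tp, tn, fp, fn), 3))
print(round(mcc(tp, tn, fp, fn), 3))
```

Because MCC depends on all four cells while ACC only counts correct predictions, the two can diverge on the same dataset, which is why reporting both is informative.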

Ajou Open Repository

STALLION: A stacking-based ensemble learning framework for prokaryotic lysine acetylation site prediction

Authors: Basith S, Manavalan B, Lee G
Publication date: 01/01/2022

Protein post-translational modification (PTM) is an important regulatory mechanism that plays a key role in both normal and disease states. Acetylation on lysine residues is one of the most potent PTMs owing to its critical role in cellular metabolism and regulatory processes. Identifying protein lysine acetylation (Kace) sites is a challenging task in bioinformatics. To date, several machine learning-based methods for the in silico identification of Kace sites have been developed. Of those, a few are prokaryotic species-specific. Despite their attractive advantages and performances, these methods have certain limitations. Therefore, this study proposes a novel predictor, STALLION (STacking-based Predictor for ProkAryotic Lysine AcetyLatION), containing six prokaryotic species-specific models to identify Kace sites accurately. To extract crucial patterns around Kace sites, we employed 11 different encodings representing three different characteristics. Subsequently, a systematic and rigorous feature selection approach was employed to identify the optimal feature set independently for five tree-based ensemble algorithms and to build their respective baseline models for each species. Finally, the predicted values from the baseline models were used to train an appropriate classifier with the stacking strategy to develop STALLION. Comparative benchmarking experiments showed that STALLION significantly outperformed existing predictors on independent tests. To expedite direct accessibility to the STALLION models, a user-friendly online predictor was implemented, which is available at: http://thegleelab.org/STALLION
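Site predictors of this kind usually start by cutting a fixed-length sequence window around each candidate residue before any encoding is applied. The abstract does not state the window size or padding scheme, so the half-width `flank`, the padding character, and the function name `lysine_windows` below are assumptions made purely for illustration.

```python
def lysine_windows(sequence, flank=10, pad="X"):
    """Return (position, window) pairs centred on each lysine (K).

    Positions are 1-based. Windows near the termini are padded with `pad`
    so that every window has length 2 * flank + 1.
    """
    seq = sequence.upper()
    padded = pad * flank + seq + pad * flank
    windows = []
    for i, residue in enumerate(seq):
        if residue == "K":
            # Residue i sits at index i + flank in the padded string,
            # so this slice places it exactly in the middle of the window.
            windows.append((i + 1, padded[i:i + 2 * flank + 1]))
    return windows


# Toy sequence, not a real protein.
print(lysine_windows("MKAK", flank=2))
```

Each window would then be passed to the feature encodings; which encodings STALLION uses is not specified in the abstract.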

Ajou Open Repository

SEP-AlgPro: An efficient allergen prediction tool utilizing traditional machine learning and deep learning techniques with protein language model features

Authors: Basith S, Manavalan B, Lee G, Pham NT
Publication date: 01/01/2024

Allergy is a hypersensitive condition in which individuals develop objective symptoms when exposed to harmless substances at a dose that would cause no harm to a “normal” person. Most current computational methods for allergen identification rely on homology or on conventional machine learning with a limited set of feature descriptors or validation on specific datasets, making them inefficient and inaccurate. Here, we propose SEP-AlgPro for the accurate identification of allergen proteins from sequence information. We analyzed 10 conventional protein-based features and 14 different features derived from protein language models to gauge their effectiveness in differentiating allergens from non-allergens using 15 different classifiers. The final optimized model, however, employs only the top 10 feature descriptors with the top seven machine learning classifiers. Results show that the features derived from protein language models exhibit superior discriminative capabilities compared to traditional feature sets. This enabled us to select the most discriminatory baseline models, whose predicted outputs were aggregated and used as input to a deep neural network for the final allergen prediction. Extensive case studies showed that SEP-AlgPro outperforms state-of-the-art predictors in accurately identifying allergens. A user-friendly web server was developed and made freely available at https://balalab-skku.org/SEP-AlgPro/, making it a powerful tool for identifying potential allergens.

Ajou Open Repository

Comparative analysis of machine learning-based approaches for identifying therapeutic peptides targeting SARS-CoV-2

Authors: Basith S, Manavalan B, Lee G
Publication date: 01/01/2022

Coronavirus disease 2019 (COVID-19) has impacted public health as well as societal and economic well-being. In the last two decades, various prediction algorithms and tools have been developed for predicting antiviral peptides (AVPs). The current COVID-19 pandemic has underscored the need to develop more efficient and accurate machine learning (ML)-based prediction algorithms for the rapid identification of therapeutic peptides against severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2). Several peptide-based ML approaches, including anti-coronavirus peptides (ACVPs), IL-6-inducing epitopes and other epitopes targeting SARS-CoV-2, have been implemented in COVID-19 therapeutics. Owing to the growing interest in the COVID-19 field, it is crucial to systematically compare the existing ML algorithms based on their performances. Accordingly, we comprehensively evaluated the state-of-the-art IL-6 and AVP predictors against coronaviruses in terms of core algorithms, feature encoding schemes, performance evaluation metrics and software usability. A comprehensive performance assessment was then conducted to evaluate the robustness and scalability of the existing predictors using well-constructed independent validation datasets. Additionally, we discussed the advantages and disadvantages of the existing methods, providing useful insights into the development of novel computational tools for characterizing and identifying epitopes or ACVPs. The insights gained from this review are anticipated to provide critical guidance to the scientific community in the rapid design and development of accurate and efficient next-generation in silico tools against SARS-CoV-2.

Ajou Open Repository

THRONE: A New Approach for Accurate Prediction of Human RNA N7-Methylguanosine Sites

Authors: Pitti T, Basith S, Manavalan B, Lee G, Shoombuatong W
Publication date: 01/01/2022

N(7)-methylguanosine (m7G) is an essential, ubiquitous, and positively charged modification at the 5' cap of eukaryotic mRNA, modulating its export, translation, and splicing processes. Although several machine learning (ML)-based computational predictors for m7G have been developed, each relied on one specific computational framework. This study is the first to explore four different computational frameworks and identify the best approach. Based on that, we developed a novel predictor, THRONE (A three-layer ensemble predictor for identifying human RNA N7-methylguanosine sites), to accurately identify m7G sites from the human genome. THRONE employs a wide range of sequence-based features that are input to several ML classifiers and combines these models through ensemble learning. The three-step ensemble learning is as follows: 54 baseline models were constructed in the first layer, and their predicted probabilities of m7G were treated as a new feature vector for the next step. Subsequently, six meta-models were created using the new feature vector, and their predicted probabilities were again treated as novel features. Finally, a systematic evaluation using these novel features identified random forest as the best super learner for the final prediction. THRONE outperformed other existing methods in the prediction of m7G sites on both cross-validation analysis and independent evaluation. The proposed method is publicly accessible at http://thegleelab.org/THRONE/ and is expected to help the scientific community identify putative m7G sites and formulate novel testable biological hypotheses.
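The data flow of the three layers (54 baseline probabilities, then six meta-model probabilities, then one final score) can be sketched as follows. Everything except the counts 54 and 6 is an assumption: the models are untrained logistic scorers with random weights, all baseline models receive the same toy feature vector, and the final layer is a stand-in for the random forest reported in the abstract.

```python
import math
import random


def make_toy_model(n_inputs, rng):
    """Return an untrained logistic scorer with random weights (illustrative stand-in)."""
    weights = [rng.uniform(-1, 1) for _ in range(n_inputs)]

    def predict_proba(features):
        z = sum(w * x for w, x in zip(weights, features))
        return 1.0 / (1.0 + math.exp(-z))

    return predict_proba


def run_layer(models, features):
    """Apply every model to the same input; the outputs become the next layer's features."""
    return [model(features) for model in models]


rng = random.Random(0)
n_sequence_features = 8  # hypothetical feature count, not from the paper

layer1 = [make_toy_model(n_sequence_features, rng) for _ in range(54)]  # baseline models
layer2 = [make_toy_model(54, rng) for _ in range(6)]                    # meta-models
layer3 = make_toy_model(6, rng)  # stand-in for the final random forest

x = [rng.random() for _ in range(n_sequence_features)]  # toy input features
p1 = run_layer(layer1, x)   # 54 probabilities
p2 = run_layer(layer2, p1)  # 6 probabilities
p_final = layer3(p2)        # single final score
print(len(p1), len(p2))
```

The point of the sketch is only the shape of the pipeline: each layer consumes the previous layer's probabilities as its feature vector.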

Ajou Open Repository

mHPpred: Accurate identification of peptide hormones using multi-view feature learning

Authors: Basith S, Manavalan B, Lee G, Sangaraju VK
Publication date: 01/01/2024

Peptide hormones were first used in medicine in the early 20th century, with the pivotal event being the isolation and purification of insulin in 1921. These hormones are integral to a sophisticated system that emerged early in evolution to regulate growth, development, and homeostasis. They serve as targeted signaling molecules that transfer specific information between cells and organs, ensuring coordinated and precise physiological responses. While experimental methods for identifying peptide hormones present challenges such as low abundance, stability issues, and complexity, computational methods offer promising alternatives. Advances in machine learning and bioinformatics have facilitated the prediction of peptide hormones, further enhancing their therapeutic potential. In this study, we explored three different computational frameworks for peptide hormone identification and determined that the meta-approach was the most suitable. Firstly, we evaluated the discriminative power of 26 feature descriptors using a series of baseline models and identified seven feature descriptors with high predictive potential. Through a systematic approach, we then selected the top 20 performing baseline models and integrated their predicted probabilities to train a meta-model, leveraging the strengths of multiple prediction strategies. Our final light gradient boosting-based meta-model, mHPpred, significantly outperformed the existing method, HOPPred, on both benchmarking and independent datasets. Notably, mHPpred also demonstrated superior performance compared to the hybrid and integrative framework approaches employed in this study. This superiority demonstrates the effectiveness of our multi-view feature learning strategy in capturing discriminative features and providing a more accurate prediction model for peptide hormones. mHPpred is publicly accessible at: https://balalab-skku.org/mHPpred
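The step of selecting the top-performing baseline models and integrating their predicted probabilities into a meta-model input can be sketched as below. The model names, scores, and probabilities are hypothetical, and the ranking criterion (a single validation score per model) is an assumption, since the abstract only says the selection was systematic.

```python
def select_top_models(scores, k):
    """Return the names of the k best-scoring baseline models, highest score first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def meta_features(probabilities, selected):
    """Build one meta-feature vector from the selected models' predicted probabilities."""
    return [probabilities[name] for name in selected]


# Hypothetical validation scores for four baseline models (the study selects 20).
scores = {"model_a": 0.81, "model_b": 0.92, "model_c": 0.77, "model_d": 0.88}
selected = select_top_models(scores, k=2)

# Hypothetical predicted probabilities for a single peptide.
probs = {"model_a": 0.4, "model_b": 0.9, "model_c": 0.2, "model_d": 0.7}
vector = meta_features(probs, selected)
print(selected, vector)
```

A vector like this, computed for every training peptide, would form the training matrix for the meta-model.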

Ajou Open Repository

Integrative machine learning framework for the identification of cell-specific enhancers from the human genome

Authors: Basith S, Wei L, Manavalan B, Lee G, Hasan MM
Publication date: 01/01/2021

Enhancers are deoxyribonucleic acid (DNA) fragments which, when bound by transcription factors, enhance the transcription of related genes. Due to their sporadic distribution and similar fractions, identifying enhancers from the human genome is a daunting task. Compared to the traditional experimental approaches, computational methods with easy-to-use platforms could be efficiently applied to annotate enhancers' functions and physiological roles. To this end, several bioinformatics tools have been developed to identify enhancers. Despite their strong performance, existing methods have certain drawbacks and limitations, including the use of fixed-length sequences for model development and the neglect of cell specificity. A novel predictor that addresses these issues would be beneficial for genome-wide enhancer prediction. In this study, we constructed new datasets for eight different cell types. Utilizing these data, we proposed an integrative machine learning (ML)-based framework called Enhancer-IF for identifying cell-specific enhancers. Enhancer-IF comprehensively explores a wide range of heterogeneous features with five commonly used ML methods (random forest, extremely randomized tree, multilayer perceptron, support vector machine and extreme gradient boosting). Specifically, these five classifiers were trained with seven encodings, yielding 35 baseline models. The outputs of these baseline models were integrated and input again to the five classifiers to construct five meta-models. Finally, the integration of the five meta-models through ensemble learning improved the model robustness. Our proposed approach showed excellent prediction performance compared to the baseline models on both training and independent datasets in different cell types, thus highlighting the superiority of our approach in the identification of enhancers. We anticipate that Enhancer-IF will be a valuable tool for screening and identifying potential enhancers from human DNA sequences.
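The model counts in this abstract follow from a simple grid: five classifiers times seven encodings gives 35 baseline models, and the same five classifiers give five meta-models. The sketch below enumerates that grid. The classifier names come from the abstract; the encoding names are placeholders because the abstract does not list them, and plain averaging is only an assumed way of combining the five meta-models.

```python
from itertools import product

# Classifier names as listed in the abstract.
classifiers = [
    "random forest",
    "extremely randomized tree",
    "multilayer perceptron",
    "support vector machine",
    "extreme gradient boosting",
]
# Placeholder encoding names; the abstract gives only the count (seven).
encodings = [f"encoding_{i}" for i in range(1, 8)]

# One baseline model per (classifier, encoding) pair.
baseline_models = list(product(classifiers, encodings))

# One meta-model per classifier, trained on the baseline outputs.
meta_models = [f"{clf} meta-model" for clf in classifiers]


def ensemble_average(probabilities):
    """Combine meta-model probabilities by plain averaging (an assumed combination rule)."""
    return sum(probabilities) / len(probabilities)


print(len(baseline_models), len(meta_models))
```

With five hypothetical meta-model outputs such as `[0.2, 0.4, 0.6, 0.8, 1.0]`, `ensemble_average` would return their mean as the final score.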

Ajou Open Repository