Machine Learning for Prediction of Amino Acid Side Chain in Proteins

Mohammed Alamri, Tennessee State University


Side-chain prediction is a challenging and significant part of three-dimensional protein structure prediction. This area of research is important because of its many applications in protein design. In the past few years, many methodologies and techniques have been developed for side-chain prediction, such as DLPacker, FASPR, SCWRL4 and OPUS-Rota4. However, current methods still fall short in speed and accuracy. In this research, we addressed the problem from a different perspective. We employed machine learning approaches to pack the side chains of protein molecules given only the backbone. We analyzed 32,000 protein molecules to extract important geometrical features that can distinguish between different orientations of side-chain rotamers. We designed multiple machine learning models and compared their performance against each other. Four machine learning models were built: Random Forest, XGBoost, Decision Trees, and Logistic Regression. Further, we implemented a stacking model that combines all four of these models. The results of our experiments show that Random Forest and Stacking are the most effective models for this problem, as they have the highest total average accuracy, 70.3% and 69.9%, respectively. These results improve on the accuracy of existing state-of-the-art approaches. For instance, FASPR, the most accurate of the existing state-of-the-art approaches, reaches 69.1%.
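
As an illustration of the kind of geometric quantity that can distinguish rotamer orientations, the sketch below computes a side-chain dihedral (chi) angle from four atom positions and assigns it to the nearest staggered position. It is a minimal sketch under stated assumptions, not the feature set or class definition used in this work, which the abstract does not specify. The function names `dihedral` and `rotamer_bin`, the three bin centers (-60, 60, 180 degrees), and the example coordinates are all hypothetical.

```python
import math


def _sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    """Cross product of two 3D vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points.

    For a chi1 angle the points would be the N, CA, CB and gamma atoms
    (an assumption about how the function is used, not part of the source).
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


def rotamer_bin(chi, centers=(-60, 60, 180)):
    """Return the bin center closest to chi, measured on the circle."""
    def circular_distance(center):
        return abs((chi - center + 180) % 360 - 180)
    return min(centers, key=circular_distance)


# Hypothetical coordinates chosen so the dihedral is exactly 90 degrees.
chi = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(chi, rotamer_bin(chi))
```

A classifier such as the ones named above would then be trained on features like these, with the rotamer bin as the label. The circular distance in `rotamer_bin` matters because angles near -180 and +180 degrees describe the same orientation.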

Subject Area

Computer science|Bioinformatics

Recommended Citation

Mohammed Alamri, "Machine Learning for Prediction of Amino Acid Side Chain in Proteins" (2023). ETD Collection for Tennessee State University. Paper AAI30693261.