Computer Science and Engineering MS Thesis Defense by Ali Tuğrul Balcı

August 29, 2018

KOÇ UNIVERSITY

GRADUATE SCHOOL OF SCIENCES & ENGINEERING

COMPUTER SCIENCE AND ENGINEERING

MS THESIS DEFENSE BY ALİ TUĞRUL BALCI

 

Title: Fast Screening Algorithms for Protein Interface Structural Alignments

 

Speaker: Ali Tuğrul Balcı

 

Time: July 30th, 2018, 1:00 PM

 

Place: ENG 208

Koç University

Rumeli Feneri Yolu

Sarıyer, Istanbul

 

Thesis Committee Members:

Prof. Attila Gürsoy (advisor, Koç University)

Prof. Özlem Keskin (co-advisor, Koç University)

Assoc. Prof. Arzucan Özgür (Boğaziçi University)

Prof. Alper Erdoğan (Koç University)

Assoc. Prof. Barış Akgün (Koç University)

 

Abstract:

Protein-protein interactions (PPIs) form the basis of many biological processes in living organisms. The significance of PPIs in mediating biological activity necessitates the identification of novel interactions. Template-based structural alignment is one of the computational approaches for predicting protein-protein interactions from known protein interfaces. One challenge in template-based prediction is the computational cost of the one-to-all comparison of the query protein against a database of all known interfaces. In this thesis, two approaches have been developed: (a) QuickRet, a hashing-based algorithm, and (b) a deep learning-based algorithm. QuickRet, a fast screening algorithm, ranks interfaces according to their structural similarity to a query protein. It extracts features (angles and distances derived from four atoms) from the structures of interfaces and compares them with the features extracted from the query protein. QuickRet is tested with the PIFACE database, a clustered protein-protein interface database, and with predictions made by PRISM, a template interface-based PPI prediction algorithm. The results indicate that QuickRet is successful in filtering out structurally dissimilar interfaces for a given protein. With at least an 80% match, 99% of the database is eliminated (320 of 43,500 interface structures remain) and the average RMSD of the remaining structures is 2.4 Å. With at least a 90% match, 99.9% of the database is eliminated (50 of 43,500 structures remain) and the average RMSD drops to 2.28 Å. In addition, a deep learning-based method has been developed that predicts, for a given protein complex, whether the interface between the proteins of the complex is a true interface (based on known interfaces in the Protein Data Bank). The model, a 3-dimensional convolutional model, analyzes the given structure and outputs the probability that it is an interface. The accuracy of the model on several interface data sets, including PIFACE, PPI4DOCK, and DOCKGROUND, is approximately 80%. Both algorithms can be used to reduce the computational cost of template-based PPI predictions.
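
To illustrate the idea of hashing-based screening described in the abstract, the following is a minimal Python sketch, not the QuickRet implementation. The abstract states only that features are angles and distances derived from four atoms and that interfaces are filtered by a match percentage. Everything else here is an assumption made for illustration: the use of sliding windows of four consecutive atoms, the choice of end-to-end distance plus dihedral angle as the feature pair, the bin widths, and the function names (feature_keys, match_fraction, screen).

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b0, b1), _cross(b1, b2)
    length = math.sqrt(_dot(b1, b1)) or 1.0  # guard against a zero-length bond
    m1 = _cross(n1, (b1[0] / length, b1[1] / length, b1[2] / length))
    return math.degrees(math.atan2(_dot(m1, n2), _dot(n1, n2)))

def feature_keys(coords, dist_bin=0.5, angle_bin=15.0):
    """Hash keys for every window of four consecutive atoms (assumed scheme).

    Each key is a pair of quantized values: the distance between the first
    and last atom of the window, and the dihedral angle of the window.
    """
    keys = set()
    for i in range(len(coords) - 3):
        p0, p1, p2, p3 = coords[i:i + 4]
        d = math.sqrt(_dot(_sub(p3, p0), _sub(p3, p0)))
        a = dihedral(p0, p1, p2, p3)
        keys.add((math.floor(d / dist_bin), math.floor(a / angle_bin)))
    return keys

def match_fraction(template, query):
    """Fraction of the template's keys that also occur in the query."""
    t_keys = feature_keys(template)
    if not t_keys:
        return 0.0
    return len(t_keys & feature_keys(query)) / len(t_keys)

def screen(query, templates, threshold=0.8):
    """Keep templates whose match fraction reaches the threshold, best first."""
    q_keys = feature_keys(query)
    hits = []
    for name, coords in templates.items():
        t_keys = feature_keys(coords)
        frac = len(t_keys & q_keys) / len(t_keys) if t_keys else 0.0
        if frac >= threshold:
            hits.append((name, frac))
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

Because the keys are built from internal distances and angles, they do not change when a structure is translated, so a shifted copy of a template matches it fully while a geometrically different structure is filtered out. Only the templates that survive this cheap set comparison would then need a full structural alignment.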