
Low Dimensional Clustering to Evaluate Pairwise Protein Structure Alignment Algorithms


MSc Thesis Proposal by:

Shalini Bhattacharjee

Date: Tuesday, August 15th, 2017
Time: 11:00 am – 1:00 pm
Location: 3105, Lambton Tower

Abstract: The alignment of two protein structures is a fundamental problem in structural bioinformatics. Structural similarity suggests similar functional behavior, which can be exploited in various applications. The plethora of existing algorithms, including one of our own, is a testament to the importance of the problem. In this proposal, we present a novel approach to measuring the effectiveness of a sample of three such algorithms, DALI, TM-align and EDAlign-SSE, at detecting structural similarities among proteins. The underlying premise is that structural proximity should translate into spatial proximity. To verify this, we will carry out extensive experiments on different datasets, each consisting of proteins from two to six different families. The root mean square deviation (RMSD) will be used to measure the extent of structural similarity between two proteins. For each protein dataset, we will compute a distance matrix in which each entry is the RMSD of a pair of protein structures. For each distance matrix, we will use Principal Component Analysis (PCA) to obtain an embedding of a set of points (each representing a protein) that realizes these distances in a two-dimensional space. To compare the clustering of the families, we will cluster the points with the K-Means algorithm, without family labels. Finally, we will discuss the correlation between structural proximity and spatial proximity.
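The evaluation pipeline described in the abstract (distance matrix, two-dimensional embedding, clustering without labels) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the RMSD values are invented toy numbers rather than results, the embedding is classical multidimensional scaling (the distance-matrix counterpart of PCA, which may differ from the exact variant the proposal uses), and the function names are hypothetical. It uses only the Python standard library.

```python
import math

def double_center(dist):
    """Convert a symmetric distance matrix into a centered Gram matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(r) / n for r in sq]
    total_mean = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
             for j in range(n)] for i in range(n)]

def top_eigenpair(mat, iters=500):
    """Power iteration for the dominant eigenpair of a symmetric matrix."""
    n = len(mat)
    vec = [1.0 + 0.01 * i for i in range(n)]  # deterministic, non-uniform start
    norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    for _ in range(iters):
        nxt = [sum(mat[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    val = sum(vec[i] * sum(mat[i][j] * vec[j] for j in range(n))
              for i in range(n))
    return val, vec

def embed_2d(dist):
    """Place one 2D point per protein so that distances are roughly preserved."""
    gram = double_center(dist)
    n = len(gram)
    coords = [[0.0, 0.0] for _ in range(n)]
    for axis in range(2):
        val, vec = top_eigenpair(gram)
        scale = math.sqrt(max(val, 0.0))
        for i in range(n):
            coords[i][axis] = scale * vec[i]
        # Deflate: remove the component just found before seeking the next one.
        gram = [[gram[i][j] - val * vec[i] * vec[j] for j in range(n)]
                for i in range(n)]
    return coords

def kmeans(points, k, iters=100):
    """Plain K-Means with deterministic farthest-point initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        centers.append(list(max(
            points, key=lambda p: min(math.dist(p, c) for c in centers))))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(x) / len(members) for x in zip(*members)]
    return labels

# Toy RMSD matrix (invented values): proteins 0-2 form one family, 3-5 another.
toy_rmsd = [
    [0.0, 1.0, 1.2, 8.0, 8.3, 7.9],
    [1.0, 0.0, 0.9, 8.1, 8.2, 8.0],
    [1.2, 0.9, 0.0, 7.8, 8.4, 8.1],
    [8.0, 8.1, 7.8, 0.0, 1.1, 1.0],
    [8.3, 8.2, 8.4, 1.1, 0.0, 1.3],
    [7.9, 8.0, 8.1, 1.0, 1.3, 0.0],
]
coords = embed_2d(toy_rmsd)
labels = kmeans(coords, k=2)
print(labels)
```

If an alignment algorithm produces RMSD values that reflect family membership, the cluster labels recovered this way should agree with the known families; comparing the two is one way to read the "structural proximity implies spatial proximity" premise.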

In addition, we will propose a Multiple Structural Alignment (MSA) method. MSA is a fundamental tool for correlating the structural similarity of proteins with their functional similarity. As in heuristic algorithms for multiple sequence alignment, our algorithm uses the Progressive Multiple Alignment approach. We will build a guide tree representing the similarity between sequences; this tree will guide the alignment process. We will build an alignment for each internal node of the tree, where the alignment at any internal node includes all the sequences previously aligned. We will use the root mean square deviation (RMSD) as a measure of alignment quality, and report this measure for a large and varied set of alignments. We will compare the execution times of our algorithm with those of the well-known algorithm MUSTANG for all the tested alignments.
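The guide-tree idea can be sketched as follows. The abstract does not say how the tree is built, so this sketch assumes average-linkage (UPGMA-style) clustering of a pairwise distance matrix, a common choice in progressive multiple sequence alignment. The function name, protein names, and distance values are all hypothetical.

```python
def build_guide_tree(names, dist):
    """Average-linkage guide tree; returns (nested-tuple tree, merge order).

    Each merge corresponds to one internal node, i.e. one alignment that
    contains every protein listed for that merge.
    """
    # cluster id -> (subtree, indices of the leaves under it)
    clusters = {i: (names[i], [i]) for i in range(len(names))}
    next_id = len(names)
    merge_order = []
    while len(clusters) > 1:
        ids = sorted(clusters)
        best = None
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                mem_a, mem_b = clusters[a][1], clusters[b][1]
                avg = (sum(dist[i][j] for i in mem_a for j in mem_b)
                       / (len(mem_a) * len(mem_b)))
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        tree_a, mem_a = clusters.pop(a)
        tree_b, mem_b = clusters.pop(b)
        clusters[next_id] = ((tree_a, tree_b), mem_a + mem_b)
        merge_order.append([names[i] for i in mem_a + mem_b])
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree, merge_order

# Hypothetical pairwise distances between four proteins.
names = ["P1", "P2", "P3", "P4"]
dist = [
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 4.2, 5.1],
    [4.0, 4.2, 0.0, 2.0],
    [5.0, 5.1, 2.0, 0.0],
]
tree, merge_order = build_guide_tree(names, dist)
print(tree)
print(merge_order)
```

The merge order lists the alignments from the leaves up to the root: the closest pairs are aligned first, and the final entry covers all the proteins.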

Thesis Committee:
Internal Reader: Dr. Dan Wu
External Reader: Dr. Myron Hlynka
Advisor: Dr. Asish Mukhopadhyay
Co-Advisor: Dr. Yash P Aneja
