Computer Science



Machine Learning approaches for cancer analysis

  • Thu, 07/12/2018 - 1:00pm - 3:00pm

Machine learning approaches for cancer analysis

Doctoral Dissertation by

Abedalrhman Alkhateeb

Date: Thursday, July 12, 2018
Time: 10:00 am - 12:00 pm
Location: 3105, Lambton Tower

Abstract: The transcriptome is the complete set of transcripts expressed from the genes of an organism under given conditions at a particular time. It can be sequenced into reads that offer clues about gene and protein sequences, structures, and functions. Transcriptome sequencing used to be costly and slow, but next-generation sequencing technology has decreased the cost and increased the speed of genome, transcriptome, and exome sequencing. However, raw sequences come with artifacts, so the reads must be preprocessed before downstream analysis. In this dissertation, we show that preprocessing sequencing data is required for better performance throughout genomics processing.

In addition, we propose machine learning models that improve the preprocessing steps and the identification of meaningful biomarkers in cancer. In the first contribution of this work, we present Zseq, a linear-time method that identifies the most informative genomic sequences and reduces the number of biased sequences, sequence duplications, and ambiguous nucleotides. In the second contribution, we show that a proper combination of clustering method, distance function, and cluster validity index is suitable for identifying outlier transcripts, whose trends differ from those of the majority of transcripts. Here, the trend of a transcript is its abundance across the stages of prostate cancer. In modeling cancer progression, the stages are represented as time points, and the increase in transcript abundance across those time points is cubic-spline interpolated.
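The kind of read filtering described in the first contribution can be illustrated with a minimal sketch. This is not the Zseq algorithm, whose details are not given in this abstract. The function names `gc_fraction` and `filter_reads` and the GC-content thresholds are hypothetical, and GC content is used here only as a simple stand-in for "biased" sequences. The sketch makes one pass over the reads and relies on a hash set for duplicate detection, so its expected running time is linear in the total number of bases.

```python
from typing import Iterable, List


def gc_fraction(read: str) -> float:
    """Fraction of G/C bases in a read (0.0 for an empty read)."""
    if not read:
        return 0.0
    return sum(base in "GC" for base in read) / len(read)


def filter_reads(reads: Iterable[str],
                 gc_min: float = 0.2,
                 gc_max: float = 0.8) -> List[str]:
    """Keep reads that are unambiguous, not duplicated, and not GC-extreme.

    The thresholds gc_min and gc_max are illustrative assumptions,
    not values taken from the dissertation.
    """
    seen = set()
    kept = []
    for raw in reads:
        read = raw.strip().upper()
        if not read or "N" in read:
            continue  # drop empty reads and reads with ambiguous nucleotides
        if read in seen:
            continue  # drop exact duplicates of an earlier read
        seen.add(read)
        if not gc_min <= gc_fraction(read) <= gc_max:
            continue  # drop reads with extreme GC content
        kept.append(read)
    return kept


# Toy reads, invented for illustration only.
reads = ["ACGTACGT", "ACGTACGT", "ACGNACGT", "GGGGGGGG", "ATATGCAT"]
filtered = filter_reads(reads)
print(filtered)
```

In this toy input, the second read is a duplicate, the third contains an ambiguous base, and the fourth consists only of G, so two reads remain.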

Using hierarchical clustering of time-series profiles, we identified stage-specific mRNA species, termed outlier transcripts, that exhibit unique trending patterns compared with most other transcripts during disease progression. Unlike hierarchical clustering based on Euclidean distance, which finds patterns among the trending transcripts, this method identifies the outliers themselves. A wet-lab experiment on a biomarker (gene CAM2G) confirmed the result of the computational model. Genes related to these outlier transcripts were found to be strongly associated with prostate cancer.
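A rough sketch of how outlier profiles can fall out of hierarchical clustering is shown below. The abstract does not name the distance function or linkage that were selected, so the correlation-based distance and average linkage used here are assumptions, as are the function names and the toy abundance values. Profiles that end up in a singleton cluster are reported as outliers.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Sequence


def pearson_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - Pearson correlation: small when two profiles trend alike."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 1.0  # flat profile: treat as uncorrelated
    return 1.0 - cov / sqrt(var_a * var_b)


def average_linkage(c1: List[str], c2: List[str],
                    profiles: Dict[str, Sequence[float]]) -> float:
    """Mean pairwise distance between the members of two clusters."""
    pairs = [(i, j) for i in c1 for j in c2]
    total = sum(pearson_distance(profiles[i], profiles[j]) for i, j in pairs)
    return total / len(pairs)


def cluster_profiles(profiles: Dict[str, Sequence[float]],
                     k: int) -> List[List[str]]:
    """Agglomerative clustering: merge the two closest clusters until k remain."""
    clusters = [[name] for name in profiles]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], profiles),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because combinations() guarantees i < j
    return clusters


# Toy abundance profiles over four stages, invented for illustration only.
profiles = {
    "t1": [1, 2, 3, 4],
    "t2": [2, 4, 6, 8],
    "t3": [1, 3, 4, 6],
    "t4": [2, 3, 5, 6],
    "t5": [8, 5, 3, 1],  # trends against the others
}
clusters = cluster_profiles(profiles, k=2)
outliers = sorted(name for c in clusters if len(c) == 1 for name in c)
print(outliers)
```

A correlation-based distance compares the shape of a trend rather than its magnitude, which is one plausible reason to prefer it over Euclidean distance when looking for transcripts that trend differently.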

Breast cancer, on the other hand, is a widespread cancer in women and accounts for a large share of cancer cases and deaths worldwide. Identifying the subtype of breast cancer plays a crucial role in selecting the best treatment. In the third contribution, we propose an optimized hierarchical classification model that predicts the breast cancer subtype. Suitable filter feature selection methods and new hybrid feature selection methods are used to find discriminative genes.

Studying breast cancer survivability among patients who received various treatments may help explain the relationship between survivability and treatment therapy based on gene expression. In the fourth contribution, we built a classifier system that predicts whether a given breast cancer patient who underwent some form of treatment (hormone therapy, radiotherapy, or surgery) will survive beyond five years after the treatment. We applied our tree-based method to a gene expression dataset of 347 treated breast cancer patients and identified potential biomarker subsets with prediction accuracies ranging from 80.9% to 100%.

Studying gene expression across time intervals of breast cancer survival may provide insights into the recovery of patients, because gene expression values differ across the stages of disease progression. Discovering gene biomarkers can be a crucial step in predicting survivability and in managing breast cancer patients. In the fifth contribution, we propose a hierarchical clustering method that separates dissimilar groups of genes in time-series data as outliers. These isolated outliers, genes that trend differently from other genes, can serve as potential biomarkers of breast cancer survivability.
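To illustrate the idea behind a tree-based survival classifier such as the one in the fourth contribution, the sketch below fits a single decision stump, the simplest possible tree, to toy expression values. It is not the method used in the dissertation, which is not specified in this abstract. The function names, the two hypothetical genes, and all data values are invented, and the training accuracy on this toy data has no bearing on the accuracies reported above.

```python
from typing import Sequence, Tuple

Stump = Tuple[int, float, int, float]


def best_stump(X: Sequence[Sequence[float]], y: Sequence[int]) -> Stump:
    """Return (gene index, threshold, label predicted above threshold, training accuracy)."""
    best: Stump = (0, 0.0, 1, 0.0)
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            for above in (0, 1):
                preds = [above if row[f] > thr else 1 - above for row in X]
                acc = sum(p == t for p, t in zip(preds, y)) / len(y)
                if acc > best[3]:
                    best = (f, thr, above, acc)
    return best


def predict(stump: Stump, row: Sequence[float]) -> int:
    """Apply a fitted stump to one patient's expression values."""
    f, thr, above, _ = stump
    return above if row[f] > thr else 1 - above


# Toy data: rows are patients, columns are two hypothetical genes.
# Label 1 means the patient survived beyond five years after treatment.
X = [[2.0, 7.0], [2.5, 3.0], [3.0, 6.5], [7.0, 2.0], [7.5, 6.0], [8.0, 3.5]]
y = [0, 0, 0, 1, 1, 1]

stump = best_stump(X, y)
print(stump)
print(predict(stump, [6.8, 4.0]))
```

Here the first gene separates the two classes at a threshold of 5.0, so it would be the candidate biomarker in this toy setting. A real study would evaluate such a model on held-out patients rather than on the training data.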

Thesis Committee:
Internal Readers: Dr. Alioune Ngom and Dr. Jianguo Lu
External Reader: Dr. Lisa Porter (Biological Sciences)
External Examiner: Dr. Dongxiao Zhu (Wayne State)
Advisor: Dr. Luis Rueda
Chair: Dr. Mehdi S. Monfared (Math & Statistics)
