Computational Science and Engineering PhD Thesis Defense by Artür Manukyan



*******************************************************************

KOÇ UNIVERSITY

GRADUATE SCHOOL OF SCIENCES & ENGINEERING

COMPUTATIONAL SCIENCE AND ENGINEERING

PhD THESIS DEFENSE BY ARTÜR MANUKYAN

******************************************************************

 

Title: Statistical Learning with Proximity Catch Digraphs

 

Speaker: Artür Manukyan

 

Time: September 15, 2017, 11:00

 

Place: ENG 208

Koç University

Rumeli Feneri Yolu

Sariyer, Istanbul

Thesis Committee Members:

Assoc. Prof. Dr. Mine Çağlar (advisor)

Assoc. Prof. Dr. Selda Küçükçifçi

Prof. Dr. Atilla Gürsoy

Asst. Prof. Dr. Özgür Asar

Asst. Prof. Dr. Ümit Işlak

 

Abstract:

In the field of statistical learning, a significant portion of methods model data as graphs. Proximity graphs, in particular, offer solutions to many challenges in supervised and unsupervised statistical learning. Among these graphs, class cover catch digraphs (CCCDs) were first introduced to investigate the class cover problem (CCP), and were then employed in classification and clustering. However, this family of digraphs can be improved further to construct better classifiers and clustering algorithms. The purpose of this thesis is to tackle popular problems in statistical learning, such as robustness, prototype selection, and determining the number of clusters, with proximity catch digraphs (PCDs). PCDs are generalized versions of CCCDs and have proven useful in spatial data analysis. We investigate the performance of CCCDs and PCDs in both supervised and unsupervised statistical learning, and discuss how these digraph families address real-life challenges. We show that CCCD classifiers perform relatively well when one class is more frequent than the others, an example of the class imbalance problem. Later, by using the barycentric coordinate system and by extending Delaunay tessellations to partition R^d, we establish PCD-based classifiers and clustering methods that are both robust to the class imbalance problem and have computationally tractable prototype sets, making them both appealing and fast. In addition, our clustering algorithms are parameter-free adaptations of an unsupervised version of CCCDs, namely cluster catch digraphs (CCDs). We partition data sets by incorporating spatial data analysis tools based on Ripley’s K function, and we also define cluster ensembles based on PCDs to boost performance. Such methods are crucial for real-life practice, where domain knowledge is often unavailable.