Computer Sciences and Engineering MS Thesis Defense by Berkin Güler



KOÇ UNIVERSITY

GRADUATE SCHOOL OF SCIENCES & ENGINEERING

COMPUTER SCIENCES AND ENGINEERING

MS THESIS DEFENSE BY BERKİN GÜLER

Title: Load-Aware Compressed Incremental Checkpointing for Primary-Backup Replication

Speaker: Berkin Güler

Time: January 19, 2018, 10:00 AM

Place: ENG 208

Koç University

Rumeli Feneri Yolu

Sariyer, Istanbul

Thesis Committee Members:

Assoc. Prof. Öznur Özkasap (Advisor, Koç University)

Asst. Prof. Didem Unat (Koç University)

Asst. Prof. Ayşe Yılmazer (Istanbul Technical University)

Abstract:

Many distributed services, from key-value stores to cloud storage systems, require fault tolerance and reliability. To enable fast recovery and seamless failover, primary-backup replication protocols are widely used in settings including distributed databases, web services and the Internet of Things. In this thesis, we address the communication cost of the primary-backup replication protocol and propose using checkpointing to improve its efficiency and performance. We develop a software framework by extending Facebook's open-source RocksDB key-value store, deploy it as a geographically replicated system on the PlanetLab overlay network, and evaluate several checkpointing algorithms: non-periodic, periodic, incremental and compressed checkpointing. The experiments use the well-known YCSB benchmarking tool to generate realistic query workloads. Based on metrics including blocking time, checkpointing time, checkpoint size, failover time, compression ratio and throughput, our findings indicate that periodic incremental checkpointing performs best, providing higher system throughput and lower average blocking times than both traditional primary-backup replication and the other checkpointing algorithms. Building on these findings, we propose a load-aware compressed incremental checkpointing method (LACPB) for large-scale primary-backup replication. Large-scale comparative experiments indicate that LACPB maintains drastically higher system throughput, as well as reduced and stable client blocking times, even under dynamic workloads.
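The idea of compressed incremental checkpointing summarized in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: it assumes an in-memory dictionary with string values in place of RocksDB, JSON serialization with `zlib` compression, and hypothetical `Primary` and `Backup` classes, and it does not model the load-aware part of LACPB. The primary tracks which keys were written or deleted since the last checkpoint and ships only those keys, compressed, to the backup.

```python
import json
import zlib


class Primary:
    """Hypothetical toy primary replica that emits compressed incremental checkpoints.

    ASSUMPTION: an in-memory dict with string values stands in for RocksDB.
    """

    def __init__(self):
        self.store = {}     # current key-value state
        self.dirty = set()  # keys written or deleted since the last checkpoint

    def put(self, key, value):
        self.store[key] = value
        self.dirty.add(key)

    def delete(self, key):
        self.store.pop(key, None)
        self.dirty.add(key)

    def incremental_checkpoint(self):
        # Include only keys touched since the previous checkpoint;
        # a value of None marks a key that was deleted on the primary.
        delta = {k: self.store.get(k) for k in sorted(self.dirty)}
        self.dirty.clear()
        # Serialize the delta to JSON and compress it before shipping it.
        return zlib.compress(json.dumps(delta).encode("utf-8"))


class Backup:
    """Hypothetical toy backup replica that applies compressed deltas in order."""

    def __init__(self):
        self.store = {}

    def apply(self, blob):
        delta = json.loads(zlib.decompress(blob).decode("utf-8"))
        for key, value in delta.items():
            if value is None:
                self.store.pop(key, None)  # key was deleted on the primary
            else:
                self.store[key] = value


if __name__ == "__main__":
    primary, backup = Primary(), Backup()
    primary.put("user:1", "alice")
    primary.put("user:2", "bob")
    backup.apply(primary.incremental_checkpoint())  # ships both keys
    primary.delete("user:2")
    primary.put("user:3", "carol")
    backup.apply(primary.incremental_checkpoint())  # ships only user:2 and user:3
    assert backup.store == primary.store
```

Calling `incremental_checkpoint()` on a fixed timer gives a periodic variant; a load-aware scheme would instead adapt when checkpoints are taken, which this sketch leaves out.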