Computer Science & Engineering MS Thesis Defense by Erenay Dayanık

August 29, 2018

KOÇ UNIVERSITY

GRADUATE SCHOOL OF SCIENCES & ENGINEERING

COMPUTER SCIENCE & ENGINEERING

MS THESIS DEFENSE BY ERENAY DAYANIK

Title: Morphological Tagging and Lemmatization with Neural Components

Speaker: Erenay Dayanık

Time: June 4, 2018, 10:00

Place: Eng 208

Koç University

Rumeli Feneri Yolu

Sariyer, Istanbul

Thesis Committee Members:

Assoc. Prof. Dr. Deniz Yuret (Advisor, Koç University)

Assoc. Prof. Dr. Engin Erzin (Koç University)

Asst. Prof. Dr. Gülşen Cebiroğlu Eryiğit (Istanbul Technical University)

Abstract:

I describe and evaluate MorphNet, a language-independent, end-to-end model designed to combine morphological analysis and disambiguation. Traditionally, the analysis of morphologically complex languages has been performed in two stages: (i) a morphological analyzer based on finite-state transducers produces all possible morphological analyses of a word; (ii) a statistical disambiguation model picks the correct analysis for each word based on its context. MorphNet uses a sequence-to-sequence recurrent neural network to combine analysis and disambiguation in a single model. The model consists of three LSTM encoders, which create embeddings of various input features, and a two-layer LSTM decoder, which predicts the correct morphological analysis. When trained on text labeled with correct morphological analyses, MorphNet achieves state-of-the-art or comparable results in twenty-six languages.
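The traditional two-stage pipeline mentioned in the abstract can be illustrated with a minimal toy sketch. This is not MorphNet and not code from the thesis: the lexicon `TOY_ANALYSES`, the tag names, and the hand-written context rule are all hypothetical stand-ins, used only to show how an analyzer that lists candidates differs from a disambiguator that picks one of them in context.

```python
from typing import Dict, List

# Hypothetical toy lexicon standing in for a finite-state analyzer.
# The entries and tag names are illustrative assumptions, not thesis data.
TOY_ANALYSES: Dict[str, List[str]] = {
    "saw": ["see+Verb+Past", "saw+Noun+Sg"],
    "i": ["I+Pron+1sg"],
    "a": ["a+Det"],
    "the": ["the+Det"],
}


def analyze(word: str) -> List[str]:
    """Stage (i): return every candidate analysis of a word, ignoring context."""
    return TOY_ANALYSES.get(word.lower(), [word + "+Unknown"])


def disambiguate(words: List[str]) -> List[str]:
    """Stage (ii): pick one analysis per word using the previous word's analysis."""
    chosen: List[str] = []
    for word in words:
        candidates = analyze(word)
        previous = chosen[-1] if chosen else ""
        # Hand-written rule in place of a statistical model (an assumption):
        # after a determiner, prefer a noun reading if one exists.
        if previous.endswith("+Det"):
            preferred = [c for c in candidates if "+Noun" in c]
        else:
            preferred = []
        # Fall back to the first candidate when the rule does not apply.
        chosen.append((preferred or candidates)[0])
    return chosen


print(disambiguate(["I", "saw", "a", "saw"]))
# prints ['I+Pron+1sg', 'see+Verb+Past', 'a+Det', 'saw+Noun+Sg']
```

In this sketch the two occurrences of "saw" receive different analyses only because the second stage looks at context. According to the abstract, MorphNet replaces both stages with one sequence-to-sequence network that outputs the analysis directly.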