KOÇ UNIVERSITY
GRADUATE SCHOOL OF SCIENCES & ENGINEERING
COMPUTER SCIENCE & ENGINEERING
MS THESIS DEFENSE BY ERENAY DAYANIK
Title: Morphological Tagging and Lemmatization with Neural Components
Speaker: Erenay Dayanık
Time: June 4, 2018, 10:00
Place: Eng 208
Koç University
Rumeli Feneri Yolu
Sariyer, Istanbul
Thesis Committee Members:
Assoc. Prof. Dr. Deniz Yuret (Advisor, Koç University)
Assoc. Prof. Dr. Engin Erzin (Koç University)
Asst. Prof. Dr. Gülşen Cebiroğlu Eryiğit (Istanbul Technical University)
Abstract:
I describe and evaluate MorphNet, a language-independent, end-to-end model designed to combine morphological analysis and disambiguation. Traditionally, analysis of morphologically complex languages has been performed in two stages: (i) a morphological analyzer based on finite-state transducers produces all possible morphological analyses of a word, and (ii) a statistical disambiguation model picks the correct analysis for each word based on its context. MorphNet uses a sequence-to-sequence recurrent neural network to combine analysis and disambiguation. The model consists of three LSTM encoders that create embeddings of various input features and a two-layer LSTM decoder that predicts the correct morphological analysis. When trained on text labeled with correct morphological analyses, MorphNet achieves state-of-the-art or comparable results in twenty-six different languages.
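To make the traditional two-stage setup concrete, the following is a minimal toy sketch in Python. It is not MorphNet and not any real analyzer: the lookup table TOY_LEXICON stands in for a finite-state transducer, the hand-set BIGRAM_SCORE table stands in for a trained statistical disambiguator, and the tag names are invented for illustration. The greedy left-to-right choice is also an assumption made for brevity.

```python
# Toy illustration of the traditional two-stage pipeline (NOT MorphNet).
# All names, tags, and scores below are hypothetical stand-ins.

# Stage 1 stand-in: a lookup table playing the role of a finite-state analyzer.
TOY_LEXICON = {
    "the": ["the+Det"],
    "dogs": ["dog+Noun+Pl", "dog+Verb+Pres+3sg"],
    "bark": ["bark+Noun+Sg", "bark+Verb+Pres"],
}

# Stage 2 stand-in: hand-set scores for (previous POS, current POS) pairs.
BIGRAM_SCORE = {
    ("<s>", "Det"): 1.0,
    ("Det", "Noun"): 1.0,
    ("Noun", "Verb"): 1.0,
}


def analyze(word):
    """Stage 1: return every candidate analysis of a word."""
    return TOY_LEXICON.get(word.lower(), [word + "+Unknown"])


def pos_of(analysis):
    """Extract the part-of-speech tag (second '+'-separated field)."""
    return analysis.split("+")[1]


def disambiguate(words):
    """Stage 2: greedily pick, for each word, the candidate whose POS
    scores highest given the POS chosen for the previous word."""
    prev_pos = "<s>"
    chosen = []
    for word in words:
        candidates = analyze(word)
        best = max(
            candidates,
            key=lambda a: BIGRAM_SCORE.get((prev_pos, pos_of(a)), 0.0),
        )
        chosen.append(best)
        prev_pos = pos_of(best)
    return chosen


if __name__ == "__main__":
    print(disambiguate(["the", "dogs", "bark"]))
```

In this sketch the two stages are separate functions, so the disambiguator can only choose among candidates the analyzer lists. An end-to-end model such as the one described in the abstract instead predicts the analysis directly from the input.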