Computer Science Seminar Series
March 20, 2018 @ 12:00 PM - 1:00 PM
Mohammad Salameh, a postdoctoral researcher at Carnegie Mellon University in Qatar, will give a talk on “Fine-Grained Arabic Dialect Identification” as part of the Computer Science Seminar Series.
Abstract:
Dialect identification (DID) is the task of automatically identifying the dialect of a segment of speech or text of any length. Previous work on Arabic dialect identification has typically targeted five coarse-grained dialect classes plus Standard Arabic (a 6-way classification).
We present the first results on a fine-grained dialect classification task covering 25 specific cities from across the Arab World in addition to Standard Arabic. This is a very challenging task.
We build several classification systems and explore a large space of features. Our results show that we can identify the exact city of a speaker with an accuracy of 67.9% on a blind test set, a 9% error reduction over the state-of-the-art technique for Arabic dialect identification. We also report additional insights from a data analysis of the similarities and differences across Arabic dialects.