OPTDIAC: An Optimal Diacritization Scheme for Arabic Orthographic Representation

Kemal Oflazer

CMU-Q Point of Contact

Languages use different scripts to represent sounds orthographically in their writing systems. Typical Arabic orthography is mostly consonantal and is underspecified for short vowels and other phonemic markers, which are written as diacritics. This underspecification creates significant readability issues for both learners (L2) and native speakers (L1) of Arabic. It also poses significant challenges for natural language processing (NLP) tools, since it renders the text highly ambiguous. Recently, researchers have devised automatic diacritization tools that render Arabic text fully diacritized. However, full diacritization has been shown to degrade the performance of NLP tools. Psycholinguists and educators have also noted that full diacritization slows reading even among advanced, skilled Arabic speakers. We hypothesize that there is an intermediate level of diacritization that benefits both NLP and human readability. In this proposal, we explore the space of principled partial diacritizations in the context of both NLP and human readability, aiming to discover the optimal diacritization level. We will investigate the problem using advanced machine learning techniques for NLP applications. Simultaneously, we will collaborate with colleagues in education and language learning to measure the impact of our devised diacritization schemes on both L1 and L2 learners of Arabic.
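To make the notion of diacritization levels concrete, the following minimal Python sketch derives three renderings of the same word: fully diacritized, partially diacritized, and undiacritized. The function names and the "shadda only" partial scheme are hypothetical choices made purely for illustration; they are not the schemes devised by the project, which the abstract does not specify. The sketch assumes that diacritics are the Unicode marks U+064B through U+0652.

```python
# Arabic diacritic marks in Unicode (U+064B..U+0652).
FATHATAN, DAMMATAN, KASRATAN = "\u064B", "\u064C", "\u064D"
FATHA, DAMMA, KASRA = "\u064E", "\u064F", "\u0650"
SHADDA, SUKUN = "\u0651", "\u0652"

ALL_DIACRITICS = frozenset(
    [FATHATAN, DAMMATAN, KASRATAN, FATHA, DAMMA, KASRA, SHADDA, SUKUN]
)


def strip_diacritics(text, keep=frozenset()):
    """Remove Arabic diacritics from text, except the marks listed in keep."""
    return "".join(
        ch for ch in text if ch not in ALL_DIACRITICS or ch in keep
    )


def diacritization_levels(text):
    """Return full, partial, and undiacritized renderings of text.

    The 'shadda_only' level is a hypothetical partial scheme chosen for
    illustration; it is not taken from the project.
    """
    return {
        "full": text,
        "shadda_only": strip_diacritics(text, keep={SHADDA}),
        "none": strip_diacritics(text),
    }


# Fully diacritized form of the consonants k-t-b with a doubled middle letter,
# written with escapes so the mark order is explicit.
word = "\u0643\u064E\u062A\u0651\u064E\u0628\u064E"
for level, rendering in diacritization_levels(word).items():
    print(level, rendering)
```

Each intermediate level is simply a different choice of which marks to keep, so the space of partial schemes can be explored by varying the `keep` argument. Real schemes would likely condition on linguistic context rather than on mark type alone.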

Project

NPRP 6 - 1020 - 1 - 199

Year

2014

Status

Closed

Team

Mona Diab

George Washington University

Kemal Oflazer

Carnegie Mellon University - Qatar

Houda Bouamor

Carnegie Mellon University - Qatar

Zeinab Ibrahim

Carnegie Mellon University - Qatar