In this research paper, a new Latin and Arabic Script based orthography is proposed for Tunisian, Algerian and Moroccan Arabic. The Latin Script is a mixture of the Deutsche Morgenländische Gesellschaft Umschrift and the Buckwalter transliteration. The Arabic Script is simplified with reference to the Al-Toma guidelines. The resulting Latin and Arabic orthographies have proved to be fully interconvertible, as a Latin to Arabic script converter has been successfully created for them.
When creating Wikipedia and Wiktionary in Tunisian, Algerian and Moroccan Arabic, respectively since 2011, 2008 and 2008, it was seen that over 60% of young users use Latin Script for Tunisian, Algerian and Moroccan Arabic because it is the layout best supported by computers and because users from the Maghrebi diaspora are not sufficiently proficient in the Arabic Script.[1-8] Moreover, Tunisian, Algerian and Moroccan Arabic contain many borrowed phonemes that are used in loanwords, and these phonemes, mainly the vowels, are not supported in Arabic Script.[9,10,11,12,13] The use of additional diacritics for this is possible. However, this can cause confusion for users.
The only solution was the creation of an intuitive Script Converter that could be used to convert Latin Script into Arabic Script. The user edits in Latin Script and the written words are converted to Arabic Script by the website. This is technically possible thanks to the development of the MediaWiki Language Converter by ZhengZhu Feng. However, when examining the Latin Script for Tunisian, Algerian and Moroccan Arabic, it was clearly seen that it is based on a phonetic transcription of words, without any consideration of the morphology of Arabic.[10,13]
This gave an inconvertible Latin Script orthography for Tunisian Arabic because of the phenomenon of pronunciation simplification:
- If they are at the end of a word, [i:] and [ɪ] are pronounced as [ɪ], [u:] and [u] are pronounced as [u], and [a:], [ɛː], [a] and [æ] are pronounced as [æ]. This explains the lack of accuracy in grammar descriptions of Tunisian. For example, none of the works has tried to explain why the present of /mʃæ/ is /yimʃi/ and the present of /bdæ/ is /yibdæ/...
- Elision: If a word ends with a vowel and the next word begins with a short vowel, this short vowel and the space between the two words are not pronounced. Ignoring this simplification has made some rules of Tunisian difficult to state, as in the case of the determiner "ĭl-" meaning "the".[9,10,11,12]
- Epenthesis: If a word begins with two successive consonants, an [ɪ] is pronounced at its beginning.[9,10,11,12]
- Several further simplifications exist in some varieties of the dialect. For example, short vowels are pronounced as schwa in Northwestern and Southwestern Tunisian dialects, as these are the varieties of Tunisian Arabic pronounced using Algerian phonology. Another interesting example is the simplification of /θ/ to /t/ in the Sahil dialect when it comes at the beginning of a word. These simplifications should not be considered when transliterating Tunisian. Tunisian should be transliterated in a way that lets it be easily read using Tunisian, Algerian and Libyan phonology.[9,10,11,12]
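The first and third simplifications listed above can be illustrated with a minimal sketch. It assumes that a word is given as a list of phoneme symbols in the notation used above; the function names, the segmented input format and the example segment lists are hypothetical and serve only to show why a purely phonetic spelling loses the underlying vowel.

```python
# Word-final vowel merger, copied from the first list item above.
# Keys are underlying vowels, values are the pronounced word-final vowels.
FINAL_VOWEL_MERGER = {
    "i:": "ɪ", "ɪ": "ɪ",
    "u:": "u", "u": "u",
    "a:": "æ", "ɛː": "æ", "a": "æ", "æ": "æ",
}

# Assumption: any symbol listed above counts as a vowel, everything else
# is treated as a consonant.
VOWELS = set(FINAL_VOWEL_MERGER)


def neutralize_final_vowel(segments):
    """Return a copy of the word with its word-final vowel merged."""
    if segments and segments[-1] in FINAL_VOWEL_MERGER:
        return segments[:-1] + [FINAL_VOWEL_MERGER[segments[-1]]]
    return list(segments)


def apply_epenthesis(segments):
    """Prefix [ɪ] when the word begins with two successive consonants."""
    if (len(segments) >= 2
            and segments[0] not in VOWELS
            and segments[1] not in VOWELS):
        return ["ɪ"] + list(segments)
    return list(segments)


# Two different underlying final vowels give the same pronounced vowel,
# so the distinction cannot be recovered from the phonetic form alone.
print(neutralize_final_vowel(["x", "a:"]))
print(neutralize_final_vowel(["x", "ɛː"]))
```

Because the merger is many-to-one, a converter that starts from the pronounced form cannot decide which Arabic graph to write, which is the motivation for a transliteration based on the written form.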
Algerian and Moroccan Arabic also involve this phenomenon that also includes other facts like Germination: Short vowels in Moroccan and Algerian Arabic in the beginning or in the middle of a word are pronounced as a schwa.[16-22]
Furthermore, the situation of the Latin Script is far from ideal, as many systems are used for Maghrebi dialects. These Latin Script writing systems are classified into two categories:[1-8,23-32]
- Extended Latin Alphabet Systems: These systems are based on assigning non-Latin phonemes to additional Latin letters.[6,23,24] The most common system is the Deutsche Morgenländische Gesellschaft Umschrift.[23,24] However, several other systems were created in order to reduce the number of additional Latin letters.[25,26,27] They used digraphs for non-Latin phonemes.[25-28] But these systems were not successful because the use of digraphs can confuse readers and because the ratio of Latin letters per Arabic letter was characteristically very high, making them considerably difficult to read.[29-30]
- Alphanumeric System: Commonly known as Arabizi, it is widely used in social networks. It uses numerals for non-Latin phonemes: 2 to transcribe a glottal stop, 3 to transcribe /ʕ/, 5 to transcribe /χ/, 6 to transcribe /tˤ/, 7 to transcribe /ħ/, 8 to transcribe /θ/ and 9 to transcribe /q/.[1-8,23,31-33] Although this method is common, it is not practical for writing texts in Maghrebi Arabic because the numerals can stand for letters and numbers at the same time, which is linguistically inaccurate.[1-8,23,31-32]
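The numeral values of the alphanumeric system listed above can be written as a lookup table. The following is only a sketch: the function name and the sample strings are made up for illustration, and the naive replacement deliberately shows the ambiguity mentioned above, since a digit meant as a number is rewritten as well.

```python
# Arabizi numerals and the phonemes they transcribe, as listed above.
ARABIZI_DIGITS = {
    "2": "ʔ",   # glottal stop
    "3": "ʕ",
    "5": "χ",
    "6": "tˤ",
    "7": "ħ",
    "8": "θ",
    "9": "q",
}


def arabizi_digits_to_ipa(text):
    """Replace every Arabizi numeral by its phoneme, character by character."""
    return "".join(ARABIZI_DIGITS.get(ch, ch) for ch in text)


# Made-up strings: the first "3" is intended as a number, the second as a
# letter, but a character-level mapping cannot tell them apart.
print(arabizi_digits_to_ipa("3 3am"))
```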
What further worsened the situation of the Latin Script is that all these methods lack standardization and do not involve regulations, and this led it to be used only for informal purposes and not for important ones like literacy.[1-8,23,31-34]
The solution was the creation of a commonly used orthography that is based on the transliteration of words as written in Arabic Script. The idea was introduced by Mr. Mohamed Maamouri in 2004, was based on the work of Timothy Buckwalter for Standard Arabic, and has been further developed by Mr. Nizar Habash since 2012.[34,35] Although the method is efficient, it includes many graphs. For example, there are four letters for the glottal stop… It assigns a Latin graph to every Arabic graph without any consideration of Arabic morphology…[34,35,36] Furthermore, this method distinguishes between the phonemes represented respectively by uppercase and lowercase letters. That is why this method is also deficient.[34,35,36]
That is why we were obliged to revise the Deutsche Morgenländische Gesellschaft Umschrift, the transcription method most used by linguists for Maghrebi Arabic, by considering the principles of Buckwalter Transliteration.
There are two theories about letters in Maghrebi Arabic and mainly in Tunisian Arabic.[37,38]
- Talmoudi Theory (Vowel Harmony): [ɑː] and [ɛː] are allophones and [ɑ] and [æ] are allophones. [ɑ] and [ɑ:] occur in a pharyngeal or emphatic environment and [æ] and [ɛː] occur in other situations.
- Chekili Theory (Consonant Harmony): [ɑː] and [ɛː] are distinct phonemes and [ɑ] and [æ] are distinct phonemes. There is a group of emphatic and pharyngeal letters that can only be next to [ɑ] and [ɑ:] and a group of letters that can be next to [æ] and [ɛː]. When a letter from the second group, except s and t, is next to [ɑ] or [ɑ:], the sound of that letter is replaced by that of its emphatic counterpart. For example, [b] becomes [bˤ] when it is next to [ɑ] or [ɑ:], and [l] becomes [lˤ] when it is next to [ɑ] or [ɑ:].
The pronunciation of a as [ɑ] in a non-pharyngeal and non-emphatic environment occurs only in loans from foreign languages, mainly European ones.[37,38] This use of a as [ɑ] is relatively rare and that is why it is not a fact of Maghrebi Arabic phonology.[37,38]
Although the two theories are scientifically justified, it is more likely that the Talmoudi Theory is the accurate one because:
- If [ɛː] and [ɑː] were not allophones of the same letter, suffixes and patterns containing [ɛː] and [ɑː] would have different functions.[37,38] However, [ɛːt] and [ɑːt], for example, are both plural suffixes, and the CɛːCiC and CɑːCiC patterns are both used for active participles.[37,38] What differs between [ɑ:] and [ɛ:] is that [ɑ:] is the vowel used in a pharyngeal environment and [ɛ:] is the one used in the other situations.[37,38]
- [i], [i:], [u] and [u:] are respectively pronounced as [e], [e:], [o] and [o:] in an emphatic or pharyngeal environment in Western Tunisian Arabic dialects, in Algerian Arabic and in Moroccan Arabic.[16,18] So, this phenomenon is not restricted to the short a and long a.
That is why it is adopted in this Transcription Method.
As minor characteristics are not considered in a common writing system and as only general facts without considerable exceptions are adopted and constitute the Received Pronunciation of the Arabic dialect,[29,33] the use of a as [ɑ] in a non pharyngeal or non emphatic environment is not supported by this writing system.[37,38]
In the Basic DMG Transliteration, c and e were not used. So, we assigned them to two consonant phonemes that were using additional Latin letters.[10,36] Dhah and Dhad are two different letters that correspond to the same consonant phoneme and are therefore transcribed in DMG using the same letter.[10,36] So, we will assign two Latin letters to Dhah and Dhad, as in Buckwalter Transliteration, because they are written differently in Arabic Script and in order to make etymological studies of Tunisian Arabic easier,[34,35] although Dhah is dropped in informal practice and is only used in linguistic studies, as Tunisian people tend not to differentiate between them and transcribe both of them as Dhad.
Tunisian Arabic consonant phonemes[9-14]
illustration not visible in this excerpt
* Used only in Linguistic studies.
In Moroccan and Algerian Arabic, the same phonemes exist except /θ/, which is treated as /t/ in Moroccan Arabic, and /ð/, which is treated as /d/ in Algerian and Moroccan Arabic.[16-23,39]
Algerian and Moroccan Arabic consonant phonemes[16-23,39]
illustration not visible in this excerpt
* Used only for linguistic purposes
** Used only in Algerian Arabic
*** Used only in Linguistic Studies about Moroccan Arabic
**** Also Pronounced as /t͡s/
The choice of putting a diacritic on short vowels and not on long vowels was not arbitrary.[9-14,16-21,26-28,39-42] In fact, texts in Maghrebi Arabic dialects contain more long vowels than short vowels. This is mainly explained by the tendency to drop the short vowels from most Formal Arabic words.[9-14,16-21,26-28,39-42]
Maghrebi Arabic involves some foreign phonemes, used mainly in loanwords, which are written between parentheses in the table of vowel phonemes.[9-14,16-21] Some of them can be dropped in informal practice, like nasals, which are mostly replaced by an n or m.[10,27]
Maghrebi Arabic vowel phonemes[9-14]
illustration not visible in this excerpt
* Pronounced mostly as a schwa when at the beginning or in the middle of a word in Algerian and Moroccan Arabic. In informal daily use, people can replace them with ĭ when they are at the beginning or in the middle of a given word.
3. Final Forms
Final Forms are also added, as in Buckwalter Transliteration: ä for Ălĭf Măqṣură.[34,35]
illustration not visible in this excerpt
As Maghrebi Arabic is not standardized, the conventional method used for writing it is the one described by Nizar Habash et al. This method is mainly based on the orthography of Standard Arabic.[43-45] Although this method succeeded in transcribing Tunisian, Algerian and Moroccan Arabic,[46-48] we will adjust it so that it becomes more efficient, easier to read and write and by that helpful for literacy purposes:
In Buckwalter transliteration, h or ħ is added to feminine nouns ending with a short a. This is unnecessary because only a limited number of words end with a short vowel and are not finished in Arabic Script with «». These letters can therefore be dropped. In the new transcription, if a word ends with a short a, «» is automatically added to it.[10,13,34,35] When a feminine noun ending with a short a is used as the first noun of a compound noun, [t] is pronounced after it but never transcribed, and the noun keeps its orthography.[13,34,35,45]
The conversion of the glottal stop to Arabic is very difficult because the choice of the graph depends on the letter before it or the letter after it. That is why we agree with the idea of Al-Toma about the need for a reform of Arabic Script. The transcription of the glottal stop became fully automatic. It is written as a ' as in DMG Transcription.[10,13] When it is at the end of the word and preceded by a long vowel, it is written as. In all other situations, it is written as. We have chosen not to reduce the transliteration of the glottal stop in Arabic Script into as some of the reformers of Arabic orthography had proposed, so that books written in Tunisian Arabic before the establishment of this method remain intelligible to foreign learners.
The determiner "ĭl-", which means «The», is always written as "ĭl" + hyphen + defined noun so that it can be differentiated from the "ĭl" with which begin several indefinite nouns beginning with and so that a hămză is not added on the Alif of "ĭl-". However, when it is written as ĭ + Sun Consonant + -, it is converted to Arabic Script like "ĭl-" in order to make the transcription method more flexible for users.[10,13]
When the noun begins with a vowel, it is converted to Arabic automatically as an Alif without having to add anything, even if it is preceded by "ĭl-". (Appendix A)
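The equivalence between "ĭl-" and the assimilated spelling ĭ + Sun Consonant + - can be sketched as a normalization step applied before conversion. This is not the converter of Appendix A: the function name is hypothetical, the set of sun consonants below is an incomplete placeholder, and the sketch assumes that the assimilated consonant repeats the first letter of the noun.

```python
import re
import unicodedata

# Assumption: a placeholder, incomplete set of single-letter sun consonants.
SUN_CONSONANTS = "tdrzsnl"

# Matches a word-initial "ĭ" (U+012D) + sun consonant + hyphen, only when
# the same consonant starts the following noun.
_ASSIMILATED_ARTICLE = re.compile(
    r"(?<!\w)\u012d([" + SUN_CONSONANTS + r"])-(?=\1)"
)


def normalize_article(text):
    """Rewrite the assimilated article spelling as the canonical "ĭl-"."""
    text = unicodedata.normalize("NFC", text)  # make "ĭ" one code point
    return _ASSIMILATED_ARTICLE.sub("\u012dl-", text)


print(normalize_article("b- \u012ds-sif"))  # same article, canonical spelling
```

Normalizing first means the later conversion rules only have to recognize one spelling of the determiner.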
A stressed (doubled) consonant is indicated in Arabic Script by adding a Shaddah after the consonant. In Latin Script, however, it is indicated by doubling the consonant, as in the DMG method.[10,13]
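The correspondence between a doubled Latin consonant and a single consonant followed by a Shaddah can be sketched as an intermediate rewriting step. The function name and the consonant inventory below are assumptions made for illustration; the output keeps the Latin letters and only inserts the Unicode Shaddah (U+0651), so that a later letter-by-letter mapping could produce the Arabic graphs.

```python
SHADDA = "\u0651"  # ARABIC SHADDA, a combining mark

# Assumption: a placeholder inventory of single-letter Latin consonants.
DEFAULT_CONSONANTS = frozenset("bdfhjklmnqrstwxyz")


def mark_gemination(latin, consonants=DEFAULT_CONSONANTS):
    """Collapse each doubled consonant into consonant + Shaddah."""
    out = []
    i = 0
    while i < len(latin):
        ch = latin[i]
        out.append(ch)
        if ch in consonants and i + 1 < len(latin) and latin[i + 1] == ch:
            out.append(SHADDA)  # the second copy becomes a Shaddah
            i += 2
        else:
            i += 1
    return "".join(out)


# "nĭyyă" has a doubled y, so one y followed by a Shaddah is produced.
marked = mark_gemination("n\u012dyy\u0103")
print([hex(ord(c)) for c in marked])
```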
Influenced by the ideas of Al-Toma, all prepositions are written separately from nouns, for example b- ĭl-sif. This is done for four reasons:
- This structure was used in the works of Taoufik Ben Brik and Ali Douagi about Tunisian Arabic.
- This form improves the quality of the tokenization and understanding of Tunisian.[51,52]
- When ĭl- is not preceded by a space, it is not detected by the script converter. (Appendix A)
- This helps differentiate between some prepositions and the first syllable of some indefinite nouns. For example, b- nĭyyă and bnăyyă.[51,52]
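The benefit for tokenization can be shown with a minimal whitespace-based sketch. The tag names and the function are hypothetical and are not taken from the cited tokenizers; the point is only that a separated preposition and the hyphenated determiner can be recognized without a lexicon.

```python
import unicodedata

ARTICLE = "\u012dl-"  # "ĭl-" with a precomposed ĭ (U+012D)


def tokenize(text):
    """Split on spaces and tag separated prepositions and the determiner."""
    tokens = []
    for chunk in unicodedata.normalize("NFC", text).split():
        if len(chunk) > 1 and chunk.endswith("-"):
            # A chunk ending in a hyphen is a separated preposition.
            tokens.append(("PREP", chunk[:-1]))
        elif chunk.startswith(ARTICLE) and len(chunk) > len(ARTICLE):
            tokens.append(("DET", ARTICLE[:-1]))
            tokens.append(("NOUN", chunk[len(ARTICLE):]))
        else:
            tokens.append(("WORD", chunk))
    return tokens


print(tokenize("b- \u012dl-sif"))
print(tokenize("b- n\u012dyy\u0103"))   # preposition + noun
print(tokenize("bn\u0103yy\u0103"))     # a single word
```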
We can also benefit from the use of ŭw and ĭy in Buckwalter transliteration in our method.[34,35]
[u:] is transliterated as u when it is totally dropped upon a change of gender or number, and as ŭw when it is not totally dropped (optional).
[i:] is transliterated as i when it is totally dropped upon a change of gender or number, and as ĭy when it is not totally dropped (optional).
For example: /t u: ns i: / (sing.) → /t w a:nsa/ (plur.)
So, it is written as t ŭw ns i
/l i: l/ (sing.) → /l y a:li:/ (plur.)
So, it is written as l ĭy l
/bɑ:h i: / (sing.) → /bɑ:h y i:n/ (plur.)
So, it is written as bah ĭy
/h u: wɑ/ (masc.) → /hi:jɑ/ (fem.)
So, it is written as h u wă
/χ u: ð / (Imperative) → /χði:t/ (Past)
So, it is written as x u đ
/jq u: m/ (Present) → /qɑ:m/ (Past)
So, it is written as yq u m
Similarly, we can differentiate the suffix /u:/ used for conjugating verbs in plural in imperfective tense from the suffix /u:/ used as a direct object pronoun.[34,35]
/kælm u: / (Used after a singular noun) → kălmw (w as a direct object pronoun)
/kælm u: / (Used after a plural noun) → kălmu (u as a plural verb suffix)
As the work is built upon the COTA convention, writing the direct object pronoun as ŭ at the end of the word is also accepted. In this particular situation, it would be transcribed in the Arabic Script as.[10,13]
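The distinction between the two word-final /u:/ suffixes can be sketched as a small lookup on the last letter. This sketch assumes that the input is already known to be a verb form ending in /u:/; the function name and the returned labels are hypothetical.

```python
import unicodedata


def classify_final_u(verb_form):
    """Tell the object pronoun (w or ŭ) from the plural suffix (u)."""
    word = unicodedata.normalize("NFC", verb_form)
    if word.endswith("w") or word.endswith("\u016d"):  # "w" or "ŭ"
        return "direct_object_pronoun"
    if word.endswith("u"):
        return "plural_verb_suffix"
    return None  # the form does not end in one of the three spellings


print(classify_final_u("k\u0103lmw"))  # kălmw
print(classify_final_u("k\u0103lmu"))  # kălmu
```

Both forms are pronounced /kælmu:/, so the spelling carries information that the pronunciation alone does not.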
As for the punctuation used in this method, it is the same as that of French for Latin Script and the same as that of Arabic for Arabic Script.[10,13,23-28,43-48]
The method has proved very efficient, as it is the first morpho-phonological method of transcription of Maghrebi and as it served in creating a 16.5 KB perfect script converter for Maghrebi, as shown in Appendix A, and a full table of correspondence between Arabic Script and Latin Script based on the work of Anjela Al-Raies, as shown in Appendix B.
Previously, script converters for Standard Arabic and Arabic dialects were based on a very extensive database of Arabic Script substrings and their corresponding Latin Script substrings.[54,55]
We thank Mr. Mohamed Maamouri, Mr. Nizar Habash, Ms. Karen McNeil, Mr. Lameen Souag, Ms. Ines Dallaji and Mr. Chuck Fennig for their useful comments and important reviews. We also thank Faouez Boujelben and Oussama Ennaifar, among others, for testing the method, and several users of Wikipedia like E3, Largoplazo, Mjbmr and Koavf and the language committee of the Wikimedia Foundation for their helpful discussions.
1. Younes, J., & Souissi, E. (2014). A quantitative view of Tunisian dialect electronic writing. 5th International Conference on Arabic Language Processing, CITALA 2014.
2. Masmoudi, A., Habash, N., Ellouze, M., Estève, Y., & Belguith, L. H. (2015). Arabic Transliteration of Romanized Tunisian Dialect Text: A Preliminary Investigation. In Computational Linguistics and Intelligent Text Processing (pp. 608-619). Springer International Publishing.
3. Al-Badrashiny, M., Eskander, R., Habash, N., & Rambow, O. (2014, June). Automatic transliteration of romanized dialectal arabic. In Proceedings of the Eighteenth Conference on Computational Natural Language Learning (pp. 30-38).
4. Bies, A., Song, Z., Maamouri, M., Grimes, S., Lee, H., Wright, J., ... & Rambow, O. (2014). Transliteration of Arabizi into Arabic Orthography: Developing a Parallel Annotated Arabizi-Arabic Script SMS/Chat Corpus. ANLP 2014, 93.
5. Abu Elhija, D. A. (2014). A new writing system? Developing orthographies for writing Arabic dialects in electronic media. Writing Systems Research, 6(2), 190-214.
6. Euler, K. (2013). The face of Al-Maghreb: how Moroccans are using social networking (Doctoral dissertation, University of Pittsburgh).
7. Shoufan, A., & Al-Ameri, S. (2015, July). Natural Language Processing for Dialectical Arabic: A Survey. In ANLP Workshop 2015 (p. 36).
8. Elfardy, H., Al-Badrashiny, M., & Diab, M. (2014). AIDA: Identifying code switching in informal Arabic text. EMNLP 2014, 94.
9. Jabeur, M. (1987). A sociolinguistic study in Rades, Tunisia. Unpublished PhD dissertation. Reading: University of Reading.
10. Singer, H. R. (1994). Ein arabischer Text aus dem alten Tunis. Semitische Studien unter besonderer Berücksichtigung der Südsemitistik, 275–284.
11. Maamouri, M. (1967). The Phonology of Tunisian Arabic. Ithaca: Cornell University.
12. Gibson, M. (2009). Tunis Arabic. Encyclopedia of Arabic Language and Linguistics, 4, 563–71.
13. Ben Abdelkader, R., & Naouar, A. (1979). Peace Corps/Tunisia Course in Tunisian Arabic.
14. Zribi, I., Graja, M., Khmekhem, M. E., Jaoua, M., & Belguith, L. H. (2013). Orthographic transcription for spoken Tunisian Arabic. In Computational Linguistics and Intelligent Text Processing (pp. 153–163). Springer Berlin Heidelberg.
15. Wikimedia Foundation (2010). Language Converter, https://doc.wikimedia.org/mediawiki-core/master/php/classLanguageConverter.html#details
16. Heath, J. (1997). Moroccan Arabic phonology. Phonologies of Asia and Africa (including the Caucasus), 1, 205-217.
17. Moussaoui, L. (1990). Quelques remarques phonologiques à propos d’un parler arabe algérien. La Linguistique au Maghreb: Maghreb linguistics, 145.
18. Heath, J. (1987). Ablaut and Ambiguity: Phonology of a Moroccan Arabic Dialect. SUNY Press.
19. Amakhmakh, N. (1997). Non-linear phonology of a Moroccan Arabic dialect. University of Wisconsin--Madison.
20. Lowenstamm, J. (2011). The phonological pattern of phi-features in the perfective paradigm of Moroccan Arabic. Brill's Journal of Afroasiatic Languages and Linguistics, 3(1), 140-201.
21. Bouhadiba, F. A. N. (1988). Aspects of Algerian Arabic verb phonology and morphology (Doctoral dissertation, University of Reading).
22. Benyoucef, R., & Mahadin, R. (2013). Phonological processes in Algerian Arabic as spoken in Mostaganem: An Optimality perspective. Research on Humanities and Social Sciences, 3(14), 85-100.
23. Daouda, T., & Regragui, N. (2012). QyasKtbDarija : projet pour un double standard phonologique pour l'écriture de l'arabe marocain ou darija. KtbDarija.com, September 2012.
24. Marçais, W. (1908). Le dialecte arabe des Ulad Brahim de Saîda. Paris: BNF, pp. 101–102
25. Jourdan, J. (1913). Cours normal et pratique d'arabe vulgaire. Vocabulaire, historiettes, proverbes, chants. Dialecte tunisien. Mme. veuve L. Namura.
26. Inglefield, P. L. (1970). Tunisian Arabic Basic Course. Volumes 1 and 2.
27. Ben Abdelkader, R. (1977). Peace Corps English-Tunisian Arabic Dictionary.
28. Ben Abdelkader, R., & Naouar, A. (1979). Peace Corps/Tunisia Course in Tunisian Arabic.
29. UNESCO Organization. (1978, June). Memorandum on the Transcription and Harmonization of African Languages. The 1978 UNESCO meeting on the transcription and harmonization of African Languages: http://unesdoc.unesco.org/images/0003/000334/033415EB.pdf
30. Habash, N., Soudi, A., & Buckwalter, T. (2007). On Arabic transliteration. In Arabic computational morphology (pp. 15-22). Springer Netherlands.
31. Volk, L. (Ed.). (2015). The Middle East in the World: An Introduction. Routledge.
32. Darwish, K. (2014). Arabizi Detection and Conversion to Arabic. ANLP 2014, 217.
33. Karan, E. (2006). Writing system development and reform: A process (Doctoral dissertation, University of North Dakota).
34. Maamouri, M., Graff, D., Jin, H., Cieri, C., & Buckwalter, T. (2004). Dialectal Arabic Orthography-based Transcription. In EARS RT-04 Workshop.
35. Habash, N., Diab, M. T., & Rambow, O. (2012). Conventional Orthography for Dialectal Arabic. In LREC (pp. 711-718).
36. Lawson, D. R. (2010). An assessment of Arabic transliteration systems. Technical Services Quarterly, 27(2), 164-177.
37. Talmoudi, F. (1979). The Arabic dialect of Susa (Tunisia). Orientalia Gothoburgensia.
38. Chekili, F. (1982). The morphology of the Arabic dialect of Tunis. London: University of London.
39. Steriade, D. (1997). Phonetics in phonology: the case of laryngeal neutralization.
40. Lowenstamm, J. (1991). Vocalic length and centralization in two branches of Semitic (Ethiopic and Arabic). Semitic Studies in Honor of Wolf Leslau on the Occasion of his, 949-965.
41. Kiparsky, P. (2003). Syllables and moras in Arabic. The syllable in optimality theory, 147-182.
42. Ferrando, I. (1998). On some parallels between Andalusi and Maghrebi Arabic. Peuplement et arabisation au Maghreb occidental: dialectologie et histoire. Madrid-Zaragoza: Casa de Velazquez-Universidad de Zaragoza, 59-74.
43. Maamouri, M., Bies, A., Buckwalter, T., Diab, M., Habash, N., Rambow, O., & Tabessi, D. (2006). Developing and using a pilot dialectal Arabic treebank. In Proceedings of the Fifth International Conference on Language Resources and Evaluation, LREC’06.
44. Diab, M., Habash, N., Rambow, O., Altantawy, M., & Benajiba, Y. (2010, May). COLABA: Arabic dialect annotation and processing. In LREC Workshop on Semitic Language Processing (pp. 66-74).
45. Habash, N., Diab, M. T., & Rambow, O. (2012). Conventional Orthography for Dialectal Arabic. In LREC (pp. 711-718).
46. Zribi, I., Boujelbane, R., Masmoudi, A., Ellouze, M., Belguith, L., & Habash, N. (2014). A Conventional Orthography for Tunisian Arabic. In Proceedings of the Language Resources and Evaluation Conference (LREC), Reykjavik, Iceland.
47. Tachicart, R., & Bouzoubaa, K. (2014, May). A hybrid approach to translate Moroccan Arabic dialect. In Intelligent Systems: Theories and Applications (SITA-14), 2014 9th International Conference on (pp. 1-5). IEEE.
48. Saadane, H., & Habash, N. (2015, July). A Conventional Orthography for Algerian Arabic. In ANLP Workshop 2015 (p. 69).
49. Al-Toma, S. J. (1961). The Arabic writing system and proposals for its reform. The Middle East Journal, 403-415.
50. Ben Brik, T. (2014). Kawazaki, Tunis: ed. Sud Editions
51. Attia, M. A. (2007, June). Arabic tokenization system. In Proceedings of the 2007 workshop on computational approaches to semitic languages: Common issues and resources (pp. 65-72). Association for Computational Linguistics.
52. Habash, N., & Rambow, O. (2005, June). Arabic tokenization, part-of-speech tagging and morphological disambiguation in one fell swoop. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics (pp. 573-580). Association for Computational Linguistics.
53. Al-Raies, A. (2011). Teaching spoken Arabic A diacritized Arabic alphabet in informal writing. Langues et Littératures du monde arabe, 1(9), 6-24.
54. Sherif, T., & Kondrak, G. (2007, June). Substring-based transliteration. In Annual Meeting - Association for Computational Linguistics (Vol. 45, No. 1, p. 944).
55. Al-Onaizan, Y., & Knight, K. (2002, July). Machine transliteration of names in Arabic text. In Proceedings of the ACL-02 workshop on Computational approaches to semitic languages (pp. 1-13). Association for Computational Linguistics.
illustration not visible in this excerpt
Table B: Table of Correspondence between the Arabic and Latin Scripts for Maghrebi Arabic Dialects
illustration not visible in this excerpt