Andrew Kitchen, Christopher Ehret, Shiferaw Assefa and Connie J. Mulligan

26 September 2021

Abstract

The evolution of languages provides a unique opportunity to study human population history. The origin of Semitic and the nature of dispersals by Semitic-speaking populations are of great importance to our understanding of the ancient history of the Middle East and Horn of Africa. Semitic populations are associated with the oldest written languages and urban civilizations in the region, which gave rise to some of the world’s first major religious and literary traditions. In this study, we employ Bayesian computational phylogenetic techniques recently developed in evolutionary biology to analyse Semitic lexical data by modelling language evolution and explicitly testing alternative hypotheses of Semitic history. We implement a relaxed linguistic clock to date language divergences and use epigraphic evidence for the sampling dates of extinct Semitic languages to calibrate the rate of language evolution. Our statistical tests of alternative Semitic histories support an initial divergence of Akkadian from ancestral Semitic over competing hypotheses (e.g. an African origin of Semitic). We estimate an Early Bronze Age origin for Semitic approximately 5750 years ago in the Levant, and further propose that contemporary Ethiosemitic languages of Africa reflect a single introduction of early Ethiosemitic from southern Arabia approximately 2800 years ago.

Keywords: Semitic, language evolution, Middle East, Horn of Africa, Bayesian phylogenetics, population history

Introduction

Semitic languages comprise one of the most studied language families in the world. Semitic is of particular interest due to its association with the earliest civilizations in Mesopotamia (Lloyd 1984), the Levant (Rendsburg 2003) and the Horn of Africa (Connah 2001), which gave rise to several of the world’s first major religious traditions (Judaism, Christianity and Islam) and literary works (e.g. the Akkadian poem The epic of Gilgamesh). The importance of Semitic dates back at least 4350 years before present (YBP) to ancient Sumer in Mesopotamia, where the Akkadian language replaced Sumerian (Buccellati 1997). From this time forward, archaeological evidence for Semitic among the Hebrews and Phoenicians in the Levant (Diakonoff 1998; Rendsburg 2003) and the Aksumites in the Horn of Africa (Connah 2001) suggests that Semitic-speaking populations and their languages underwent a complex history of geographical expansion, migration and diffusion tied to the emergence of the earliest urban civilizations in these regions (Lloyd 1984; Connah 2001; Richard 2003b; Nardo 2007). Uncertainties about key details of this history persist despite extensive archaeological, genetic and linguistic studies of Semitic populations. A more comprehensive understanding of the precise origin and relationship of Semitic populations to each other is necessary to fully appreciate their complex history.

Although multiple genetic studies of extant Semitic-speaking populations have been conducted (Nebel et al. 2002; Capelli et al. 2006), much is still unknown about the genealogical relationships of these populations. Most previous genetic studies focus on time frames that are either too recent (the origin of Jewish communities in the Middle East and Africa; Hammer et al. 2000; Nebel et al. 2001; Rosenberg et al. 2001) or too ancient (the out-of-Africa migration of modern humans; Passarino et al. 1998; Quintana-Murci et al. 1999) to provide insight about the origin and dispersal of Semitic languages and Semitic-speaking populations.

Previous historical linguistic studies of Semitic languages have used the comparative method to infer the genealogical relationships of Semitic (for review, see Faber 1997). The comparative method is a technique that uses the pattern of shared, derived changes in language (vocabulary, syntax or grammar), termed innovations, to assess the relative relatedness of languages, although this method cannot date the divergences between languages (Campbell 2000). Cognates, which are words that generally share a common form and meaning through descent from a common ancestor (e.g. the English word ‘night’ is cognate with the German word ‘Nacht’), serve as the data used most often in comparative analyses.
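As a rough illustration of how cognate judgements can become data for a computational analysis, the sketch below recodes cognate classes as binary presence/absence characters, one column per cognate class within each meaning. This coding scheme is an assumption made here for illustration; the text above does not state how the lexical data were encoded. The function name `encode_cognates`, the class labels "A" and "B", and the language `language_x` are all hypothetical.

```python
def encode_cognates(cognate_classes):
    """Turn {meaning: {language: class_label}} into binary presence/absence rows.

    Each (meaning, class_label) pair becomes one binary character: 1 if the
    language's word for that meaning belongs to that cognate class, else 0.
    A language with no entry for a meaning gets 0 in all of that meaning's
    columns here; a real analysis would more likely mark it as missing data.
    """
    languages = sorted(
        {lang for classes in cognate_classes.values() for lang in classes}
    )
    columns = []
    for meaning in sorted(cognate_classes):
        for label in sorted(set(cognate_classes[meaning].values())):
            columns.append((meaning, label))
    matrix = {
        lang: [
            1 if cognate_classes[meaning].get(lang) == label else 0
            for meaning, label in columns
        ]
        for lang in languages
    }
    return columns, matrix


# Toy input: 'night' and 'Nacht' share class "A" (the example from the text);
# "language_x" is a hypothetical language with an unrelated word (class "B").
toy = {
    "night": {"English": "A", "German": "A", "language_x": "B"},
}
columns, matrix = encode_cognates(toy)
for language, row in matrix.items():
    print(language, row)
```

Languages that share a cognate class receive identical entries in that column, which is the signal a phylogenetic method can then use to group them.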

The field of Semitic linguistics has generally coalesced around a model that places the ancient Mesopotamian language Akkadian as the most basal lineage of Semitic (Hetzron 1976; Faber 1997). This standard model divides Semitic into East Semitic, composed of the extinct Akkadian and Eblaite languages, and West Semitic, consisting of all remaining Semitic languages that are distributed from the Levant to the Horn of Africa. West Semitic is in turn divided into South (consisting of Ethiosemitic, Epigraphic South Arabian and Modern South Arabian (MSA)) and Central linguistic groups, but the genealogical relationships of the languages within these two groups are poorly defined (Huehnergard 1990, 1992; Rodgers 1992; Faber 1997). Additionally, no consensus exists for placing Arabic in either the Central or South Semitic group (Hetzron 1976; Blau 1978; Diem 1980; Huehnergard 1990, 1992; Faber 1997), which makes Arabic’s genealogical location simultaneously uncertain and interesting, as Central and South Semitic are geographically and genealogically distinct entities.

Dating language divergences has been controversial, especially when linguistic clocks are involved (for discussion, see Renfrew et al. 2000). The existence of a linguistic clock is controversial as it assumes that languages evolve at a fixed rate (Ehret 2000), whereas there is evidence for variation in rates of change between words and languages and no reason why languages should evolve at fixed rates (Blust 2000). However, recent studies have shown that much variation in the rates of linguistic change may follow generalized rules that apply across language families (Pagel et al. 2007; Atkinson et al. 2008). This suggests that variation in the rates of change between words and languages can be modelled by applying techniques used in evolutionary biology (e.g. probabilistic modelling of relative rates of word change with relaxed clock or covarion models of language evolution). Computational phylogenetic methods such as these are consistent with the philosophical underpinnings of the linguistic comparative method (i.e. inferring relationships by the comparison of similar features between languages) and provide an objective statistical framework to accurately estimate language divergences. Furthermore, Bayesian phylogenetic methods offer distinct advantages by allowing for the inclusion of multiple lines of evidence as prior probabilities, incorporating the uncertainty of model parameters in posterior probability estimates, and providing straightforward statistical comparisons of models via Bayes factors (BFs).
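The Bayes factor comparison mentioned above can be sketched in a few lines. A Bayes factor is the ratio of the marginal likelihoods of two models, so on the log scale it is a simple difference. The function names and the two log marginal likelihood values below are hypothetical and chosen only to show the arithmetic; they are not results from this study, and estimating marginal likelihoods in practice requires a separate procedure that is not shown here.

```python
def log_bayes_factor(log_ml_model_1, log_ml_model_2):
    """Return ln BF = ln P(data | model 1) - ln P(data | model 2)."""
    return log_ml_model_1 - log_ml_model_2


def favoured_model(log_bf):
    """Report which model the log Bayes factor favours (ties favour neither)."""
    if log_bf > 0:
        return "model_1"
    if log_bf < 0:
        return "model_2"
    return "neither"


# Hypothetical log marginal likelihoods for two competing histories.
log_ml_history_a = -1000.0
log_ml_history_b = -1005.5

log_bf = log_bayes_factor(log_ml_history_a, log_ml_history_b)
print(log_bf, favoured_model(log_bf))
```

A positive log BF favours the first model and a negative one favours the second; the larger the absolute value, the stronger the support.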

In this study, we analyse lexical data from 25 Semitic languages distributed throughout the Middle East and Horn of Africa (figure 1) using a Bayesian phylogenetic method to simultaneously infer genealogical relationships and estimate divergence dates of the Semitic languages investigated here. In order to calibrate a relaxed linguistic clock and increase the accuracy of our divergence date estimates, we use epigraphic data (text inscribed in stone or tablets) from extinct Semitic languages (Akkadian, Aramaic, Ge’ez, ancient Hebrew and Ugaritic) combined with archaeological evidence for the sampling dates of the epigraphic data (the time at which the materials were inscribed). We employ a log BF model-testing technique to statistically assess alternative Semitic histories and investigate different ways of modelling language evolution. Finally, we combine our divergence date estimates with epigraphic and archaeological evidence from all known Semitic languages to create an integrated model of Semitic history.
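To make the idea of a relaxed clock calibrated by dated extinct languages more concrete, the sketch below draws a separate rate for each branch from a lognormal distribution and multiplies it by the time that branch spans. This is one common form of relaxed clock (uncorrelated lognormal rates); the text above does not say which form was used, so treat the choice as an assumption. All names, ages and rates are hypothetical and are not the study's calibration values.

```python
import math
import random


def draw_branch_rates(n_branches, mean_rate, sigma, seed=0):
    """Draw one rate per branch from a lognormal distribution with mean mean_rate."""
    if mean_rate <= 0 or sigma <= 0:
        raise ValueError("mean_rate and sigma must be positive")
    rng = random.Random(seed)
    # For a lognormal distribution, mean = exp(mu + sigma**2 / 2); solve for mu.
    mu = math.log(mean_rate) - 0.5 * sigma ** 2
    return [rng.lognormvariate(mu, sigma) for _ in range(n_branches)]


def branch_duration(parent_age_ybp, child_age_ybp):
    """Years spanned by a branch; ages are in years before present (YBP).

    An extant language has age 0, while an extinct language keeps the age
    at which its text was inscribed, so its branch is shorter.
    """
    if child_age_ybp > parent_age_ybp:
        raise ValueError("a child node cannot be older than its parent")
    return parent_age_ybp - child_age_ybp


def expected_changes(rate, duration):
    """Expected number of changes per character along one branch."""
    return rate * duration


# Hypothetical example: a node 5000 YBP with one extant descendant (age 0)
# and one extinct descendant sampled 3000 YBP.
rates = draw_branch_rates(n_branches=2, mean_rate=1e-4, sigma=0.5)
extant = expected_changes(rates[0], branch_duration(5000, 0))
extinct = expected_changes(rates[1], branch_duration(5000, 3000))
print(extant, extinct)
```

Because the extinct language's branch stops at its sampling date, the amount of change observed on it constrains the rate over a known time span, which is what allows dated texts to calibrate the clock.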

Conclusion

We used Bayesian phylogenetic methods to elucidate the relationships and divergence dates of Semitic languages, which we then related to epigraphic and archaeological records to produce a comprehensive hypothesis of Semitic origins and dispersals after the divergence of ancestral Semitic from Afroasiatic in Africa (figure 1). We estimate that: (i) Semitic had an Early Bronze Age origin (approx. 5750 YBP) in the Levant, followed by an expansion of Akkadian into Mesopotamia; (ii) Central and South Semitic diverged earlier than previously thought throughout the Levant during the Early to Middle Bronze Age transition; and (iii) Ethiosemitic arose as the result of a single, possibly pre-Aksumite, introduction of a lineage from southern Arabia to the Horn of Africa approximately 2800 YBP. Furthermore, we made the first use of log BFs to statistically test competing language histories and provide support for a Near Eastern origin of Semitic. Our inferences shed light on the complex history of Semitic, address key questions about Semitic origins and dispersals, and provide important hypotheses to test with new data and analyses.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2839953/

Figure 1. Map of Semitic languages and inferred dispersals. The locations of all languages sampled in this study, both extinct and extant, are depicted on the map. The current distribution of Ethiosemitic languages follows Bender (1971) and distribution of the remaining languages follows Hetzron (1997). The ancient distribution of extinct languages is also indicated (i.e. Akkadian, Biblical Aramaic, Ge’ez, ancient Hebrew and Ugaritic; Bender 1971; Hetzron 1997). The West Gurage (Chaha, Geto, Innemor, Mesmes and Mesqan) and East Gurage (Walani and Zway) Ethiosemitic language groups in central Ethiopia are depicted as two combined groups. The map also presents the dispersal of Semitic languages inferred from our study. An origin of Afroasiatic along the African coast of the Red Sea, supported by comparative analyses (Ehret 1995; Ehret et al. 2004), is indicated in red, although other African origins of Afroasiatic have been proposed (e.g. southwest Ethiopia; Blench 2006). The assumed location of the divergence of ancestral Semitic from Afroasiatic between the African coast of the Red Sea and the Near East is indicated in italics. Semitic dispersals are depicted by arrows coloured according to the estimated time of divergence (see coloured time scale at top of figure), and important nodes from the phylogeny (figure 2) are placed on the arrows to indicate where and when these divergences occurred.