Designing an international auxiliary language

Requirements of an IAL
A priori or a posteriori?
Phonology
Morphology
Syntax
Conclusion
Appendix: Phonemes in 25 major languages

Designing an international auxiliary language (IAL, auxlang) is quite different from designing a language as an art form (artlang). The latter kind of language design is largely a matter of personal taste, and no-one minds if an artlang uses difficult phonemes or complex grammatical rules; such things only make it more interesting. An auxlang, on the other hand, ought to be easy to learn and culturally neutral, yet powerful and clear in meaning. For example, my personal taste favours interdental fricatives and front rounded vowels, as well as a healthy admixture of exotic grammatical structures. I thus make ample use of these things in my artlangs. But an auxlang in which such things occur would be a bad auxlang because most people in the world would find it difficult to learn. If I were to design an auxlang, I'd avoid those beautiful but difficult features.

Requirements of an IAL

An IAL, ideally, ought to meet the following requirements:

Ease of learning and usage. The IAL should be as easy to learn and use as possible. This calls for a simple phonology and grammar.
Cultural neutrality. The IAL should not put any ethnic or cultural group at an unfair advantage or disadvantage in learning it. But this should not take the form of a language that is equally difficult for everyone.
Full expressive power. The IAL must not be a crude jargon that suffices only for the expression of ideas as simple as 'How much is the fish?'. It must be adequate for all domains of international communication, from tourism and trade to diplomacy and science. It must thus be a fully developed literary language in which legal documents and scientific papers can be written. (This raises the question of whether an IAL ought to be fit for writing fiction and poetry. In my opinion, this is secondary; but on the one hand, a corpus of good poetry and fictional literature would help popularize the language, and on the other, any language that can handle the more sophisticated modes of international communication can most likely also handle fiction and poetry sufficiently.)
Clarity of expression. The meanings of all words and grammatical constructions of the IAL ought to be clearly defined, in order to avoid ambiguities and misunderstandings. Homonymous words (words of the same form but with unrelated meanings) and grammatical ambiguities are to be avoided.
Computer tractability. It is at least highly useful if the IAL can be easily processed with computers in order to extract information from texts (a matter which was a non-issue for past generations of auxlangers, when computers had not yet been invented or were used for numeric applications only). This calls for a simple, consistent and unambiguous syntax and morphology. However, the amount of complexity computers can handle has been rising rapidly within the few decades computers have been with us, and is continuing to rise; and a language that is easy for humans to use is probably easy on computers, too.

These requirements in part conflict with each other. A language that puts no-one at advantage or disadvantage over someone else would probably be equally difficult to learn for everybody. Ease of learning alone would be achieved by a simple jargon, which, however, would have limited expressive power. Any language that can be used for legal documents, scientific literature or other sophisticated modes of communication will show some degree of complexity. And what is simple for computers is not always easy to learn by humans!

A priori or a posteriori?

The first question in designing an IAL is whether the language is to be designed a priori, i.e. entirely independently of existing languages, or a posteriori, i.e. based on words taken from one or more existing languages. An argument for an a priori design is that it is (or at least could be) culturally neutral; a point for an a posteriori approach is that the language might be easier to learn because many words are already familiar to the learner. In particular, international terms such as telephone or automobile would be taken over into the IAL.

Philosophical languages - the wrong track

Most a priori auxlangs are philosophical languages. In fact, there are so few non-philosophical a priori auxlangs (as opposed to artlangs, where non-philosophical a priori designs are common) that auxlangers use both terms interchangeably, even though they do not mean the same thing. (This scarcity of non-philosophical a priori auxlangs is probably due to the fact that auxlangers who do not take words from existing languages tend to be as systematic as possible in devising words and thus create philosophical languages.) The term 'philosophical language' was coined in the 17th century, when such schemes were popular, and it is based on the broader sense the word 'philosophical' had back then: it is to be understood as 'scientific language'. What it actually means is that the lexical roots of the language are constructed systematically, letter by letter (the 17th-century scholars thought in letters rather than phonemes, and most of today's inventors of philosophical languages do so, too, being bad linguists almost without exception) along a general taxonomy of ideas. Thus, the word for 'horse', for example, would be made up of elements meaning 'animal', 'mammal', etc.

But the inventors of philosophical languages are on the wrong track. The arbitrariness of the relation between sound and meaning in human languages is not a bug, it is a feature. And as such, it ought to be present in an IAL. It allows for keeping semantically closely related words clearly distinct, and allows for adding new words freely and easily. In a philosophical language, the words for 'horse', 'donkey' and 'zebra', for example, would be the same except for the final letter. That is just asking for confusion. The words for 'apple' and 'pear' would be equally similar, and so on. Another point is that a philosophical language allows neologisms only insofar as they fit into the taxonomy. If a category turns out to have more subcategories than expected, the system might simply run out of letters to label the subcategories (this can be remedied by using some kind of 'escape sequence', but such tricks will always be clumsy).

Furthermore, it turns out that the underlying taxonomy is just one of many possible taxonomies, and the mapping of taxonomic categories to sounds is as arbitrary as the taxonomy itself. The frequently cited examples of animal and plant names are from one of the very few fields of discourse where a natural taxonomy is an attainable goal. Most parts of reality aren't that simple. And sometimes, a taxonomy of ideas may be toppled by the advancement of science. John Wilkins, for example, classified comets as fire phenomena back in 1668, when no-one really knew what comets are - today, we know that they consist mainly of ice. Philosophical languages thus fail to defeat arbitrariness, they just make things more complicated, and may grow obsolete.

Philosophical languages are only the most blatant outcome of a flaw of reasoning that many auxlangers are susceptible to: namely, that extremely 'logical' and 'regular' systems are the best. This, however, is not true. For example, one might argue against the 15-consonant system I give in the phonology section that it contains gaps: the series of velar consonants lacks a fricative and a nasal. But I have excluded /x/ and /N/ because many languages lack these, and /x/ is considered difficult by many. This is an example where regularity gets in the way of ease of learning, and there are others. Philosophical languages are the most extreme case of this.

Logical languages are also unsuitable

The disciples of logical languages (loglangs, such as Loglan or Lojban) often propose using such a language as an international auxiliary language. However, loglangs are poorly equipped for this purpose. The loglangers tend to overlook the simple fact that language and formal logic do not serve the same purposes. Language is not primarily about making propositions that can be mathematically proven or disproven; its purpose lies in communicating ideas and emotions. There are many facets of language use which logical languages do not cover well. Language is language and logic is logic; they are different things.

Logical languages are also very difficult to learn and use. Most people are not acquainted with the intricacies of formal logic; they cannot be expected to learn it in order to learn a language. I once tried to learn Lojban; I quickly abandoned that attempt because I simply could not understand how the language works. The language does not even have the same kind of parts of speech as human languages have. Instead of nouns and verbs, phrases and clauses, Lojban has things bearing such fanciful names as brivla, cmene or selbri. Those are Lojban words which cannot easily be translated into any other language; it tells a lot that Lojbanists use these words rather than English equivalents when talking about Lojban in English. I haven't met the same difficulties in any grammar of a natural language, no matter how exotic. Logical languages may be absolutely neutral - but only because they work in a way completely different from how human languages work, and are thus exceedingly difficult to master.

Closed vocabulary schemes are misguided, too

Most "philosophical" languages have closed vocabularies, i.e. there is a small, fixed set of words, and new words cannot be created, but must be replaced by circumlocutions. Closed vocabularies, however, are not limited to that category. The most well-known closed vocabulary scheme is Basic English by Ogden and Richards, who claimed that they could reduce the vocabulary of English to just 850 items. But as Mark Rosenfelder wrote in his Language Construction Kit, "Ogden and Richards cheated". As said above, they resorted to circumlocutions. Closed vocabulary schemes are not easier to learn than natural languages, which always have open (i.e., extensible) vocabularies: you have to learn thousands of circumlocutions instead. They merely make language usage more clumsy. The large and indeterminate size of natural languages is not a bug, but a feature. Reality is too complex to break down into a few hundred "basic concepts".

A posteriori schemes and Eurocentrism

It is hardly surprising that the mainstream of the IAL movement has turned away from philosophical languages. There are still people around who propose new 'philosophical' schemes, but those are generally considered crackpots even among auxlangers. Since the 19th century, most auxlang schemes (especially the more successful ones) have been a posteriori. They borrow their lexemes from one or more natural languages. Most a posteriori auxlangs are based on the major European languages, and borrow international terms wholesale. This, however, comes at the cost of cultural neutrality: most a posteriori auxlangs are clearly Eurocentric. Non-European languages such as Arabic, Hindi, Swahili or Chinese are usually not represented (and the grammar usually follows western European models as well). Moreover, most 'internationalisms' are more or less restricted to European languages; non-European languages are more reluctant to adopt such terms, and use coinages from native material instead.

Of course it is possible to design a non-Eurocentric a posteriori auxlang. Just add more non-European words to the mix, and avoid being Eurocentric in grammar. However, the more non-European languages are included, the fewer words that are widely distributed can be found, and the amount of recognizable material from the viewpoint of any of the represented languages is likely to become so small that the language feels entirely strange, as if it had been designed a priori. But this is probably impossible to avoid.

The argument of scientific terminology

An important argument in favour of an a posteriori approach comes from the terminology of science and technology. Since scientific communication is an important facet of international communication, an IAL must have a complete terminology for all sciences. An a priori language, especially a philosophical one, would need a completely newly invented terminology parallel to the international terminology already in use, which is hardly realistic and would force scientists to learn lots of new words. An a posteriori language, in contrast, can simply adopt the terminology that is already in use. However, large parts of international terminology are restricted to European languages, while non-European languages often use their own coinages: the Japanese words for 'automobile' and 'telephone' are jidôsha and denwa, respectively; for 'cinema', Chinese uses a compound of poetic value which literally means 'electric shadowplay'. This shows that "universal" is a rather relative term.

Cultural vocabulary

Each culture has items specific to it, ranging from food and clothing to abstract concepts. These items are usually referred to by words from the language of said culture. In English, for instance, we have such words as spaghetti, chop suey, kimono, nirvana and many others. Other languages use the same words. These words ought to be used in the IAL as well, rather than coining new a priori words for these concepts. This is another strong argument for an a posteriori design.

Proper names

There is another argument for the a posteriori approach which runs in parallel to the previous two arguments. An international auxiliary language ought to handle proper names without distorting them beyond recognition. There are many auxlang designs (especially a priori ones) which force proper names into a pattern in such a way that they can no longer be recognized. This is wrong. It is certainly an advantage if proper names can be used the way they are: otherwise, the miserable learner of the language would have to learn thousands of new names for people, places and other items he already knows.

Phonology

It is generally agreed among auxlangers that a good IAL ought to avoid difficult phonemes and use a restricted inventory that does not pose much difficulty to the learner. Some auxlangers propose extremely small phoneme inventories that avoid such problems, but also restrict the number of possible syllables and are unsuited for a posteriori schemes because they require the words to be distorted beyond recognition.

Consonants

A minimal consonant inventory that probably poses no problem to almost anyone would be /p t k s n l/. (There are, however, a few languages which lack one or more of even these few consonants.) This inventory, however, is certainly too small for an a posteriori auxlang with recognizable international terminology. Thus, one should allow a few more consonants, as in the following 15-consonant inventory: /p t k b d g f s h v m n l r j/. These consonants are found in a majority of the world's major languages, with those languages lacking /v/ having the very similar /w/. This inventory is essentially that of Latin (a language which is reputed to be difficult to learn, but that is because of its grammar; Latin pronunciation is actually rather easy), and thus allows for most internationalisms to be borrowed without distorting them, but it contains some distinctions that might be difficult for speakers of some languages. These are the distinctions between voiceless and voiced stops, between /p/ and /f/, between /h/ and zero, and between /l/ and /r/. These oppositions thus should not carry a heavy load; there should not be any minimal pairs for them (i.e. pairs of words which differ only in a single phoneme along one of these distinctions).
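The "no minimal pairs" condition can be checked mechanically against a draft word list. The following is only a rough sketch under several assumptions: words are written with one letter per phoneme, the function names are my own, and the sample lexicon is invented for illustration.

```python
from itertools import combinations

# Oppositions that, according to the text, should carry little load.
WEAK_OPPOSITIONS = {
    frozenset("pb"), frozenset("td"), frozenset("kg"),  # voiceless vs. voiced stops
    frozenset("pf"),                                      # /p/ vs. /f/
    frozenset("lr"),                                      # /l/ vs. /r/
}

def is_weak_minimal_pair(a, b):
    """True if a and b differ in exactly one phoneme along a weak opposition."""
    if len(a) == len(b):
        diffs = [(x, y) for x, y in zip(a, b) if x != y]
        return len(diffs) == 1 and frozenset(diffs[0]) in WEAK_OPPOSITIONS
    # /h/ vs. zero: deleting one 'h' from the longer word yields the shorter one.
    longer, shorter = (a, b) if len(a) > len(b) else (b, a)
    if len(longer) - len(shorter) != 1:
        return False
    return any(
        ch == "h" and longer[:i] + longer[i + 1:] == shorter
        for i, ch in enumerate(longer)
    )

def weak_minimal_pairs(lexicon):
    """List all pairs of words in the lexicon that form a risky minimal pair."""
    words = sorted(set(lexicon))
    return [(a, b) for a, b in combinations(words, 2) if is_weak_minimal_pair(a, b)]

# Invented sample lexicon (hypothetical words, not from any real auxlang).
sample = ["pata", "bata", "hama", "ama", "lana", "rana", "kasa", "mina"]
print(weak_minimal_pairs(sample))
```

A vocabulary designer could run such a check whenever a word is added and reject or reshape any candidate that shows up in the result.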

Vowels

Similar considerations hold for vowels. All human languages apparently have /a i u/; these are thus safe to use in an IAL. The majority of languages have the five vowels /a e i o u/, which would be the system of choice for an a posteriori auxlang. As with the consonant inventory, this is essentially the system of Latin, except that Latin distinguishes short and long vowels. Here, the oppositions /e/:/i/ and /o/:/u/, as well as /a/:/e/ and /a/:/o/, should not bear too much load. (There are some important languages, such as Arabic, where only /a i u/ are phonemic.) More than these five vowels should not be used: in particular, no front rounded vowels, no back unrounded vowels other than /a/, and no nasal vowels. Vowel length distinctions have no place in an IAL, as there are too many languages that do not distinguish vowel length. Most diphthongs should also not be used, but /ai/ and /au/ should not prove more difficult than /e/ and /o/. Some people might find it difficult, though, to distinguish /ai/ from /e/ and /au/ from /o/.

Phonotactics

Which syllable structure constitutes no problem for anyone? The answer is simple: CV. However, with a small phoneme inventory, you don't have many CV syllables. The 15 consonants and 5 vowels proposed in the subsections above would yield only 75 CV syllables and only 5625 2-syllable words. That is too restrictive, at least in an a posteriori system. Thus, it is better to be more liberal here. CVC syllables ought to be o.k. if we restrict the choice of the final consonant to nasals and liquids. If we use Greek/Latin internationalisms, we should be just as liberal as we need to be to accommodate these. Most syllables, however, ought to be as simple as possible.
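The arithmetic behind these figures is easy to reproduce. The sketch below also shows how much room the restricted CVC pattern adds; it assumes that the permitted final consonants are the four nasals and liquids of the proposed inventory (m, n, l, r), and the function name is my own.

```python
CONSONANTS = "ptkbdgfshvmnlrj"  # the 15 consonants proposed above
VOWELS = "aeiou"                # the 5 vowels proposed above
CODAS = "mnlr"                  # assumption: nasals and liquids of the inventory

def syllable_counts(consonants=CONSONANTS, vowels=VOWELS, codas=CODAS):
    """Return (number of CV syllables, number of CVC syllables)."""
    cv = len(consonants) * len(vowels)
    cvc = cv * len(codas)
    return cv, cvc

cv, cvc = syllable_counts()
print(cv, cv ** 2)                # CV only: syllables, 2-syllable words
print(cv + cvc, (cv + cvc) ** 2)  # CV plus restricted CVC
```

With CV alone this gives 75 syllables and 5625 two-syllable words; admitting the four codas raises the counts to 375 syllables and 140625 two-syllable words, which illustrates why a modest loosening of the syllable structure goes a long way.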

Suprasegmentals

Suprasegmentals include accents, intonation patterns and similar things. These show bewildering variety in the world's languages, and only some languages rely on them to distinguish meanings. They are difficult to learn (foreign accents are not called accents without reason - there's nothing in languages that foreigners get wrong more often than suprasegmentals), and that means that suprasegmentals in an IAL ought to be completely redundant. The meaning of a word should only depend on the sequence of consonants and vowels and on nothing else. It might be useful to specify a simple accent rule, but not to rely on it.

Orthography

The orthography of an IAL ought to be simple and straightforward. Using the Latin alphabet is dictated by the currency this alphabet enjoys worldwide, including computer support. The ideal is to have each letter correspond to a single phoneme on a 1-to-1 basis, and not to use any diacritics or special letters, because such complications cause difficulties with typewriters, print shops and computers. (Don't dismiss typewriters and lead-type printing as no longer relevant. These technologies are still in widespread use in developing countries. And not all computers support Unicode.) A simple 1-to-1 orthography requires that the language has no more than 26 phonemes, but an auxlang's phoneme inventory should be small anyway. (The diacritic marks of Esperanto have been the target of much criticism; the problem ultimately lies in the fact that Esperanto uses more phonemes than it should.) The values assigned to the letters should also not differ much from those they carry in most other languages.

The inventory I have given in the subsections above, for example, causes no problems at all: it can be written using the letters a b d e f g h i j k l m n o p r s t u v with their IPA values, leaving six letters (c q w x y z) unused. (The unused letters may be used to create more familiar spellings, though.)
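As a small illustration, the letter bookkeeping can be verified in a few lines. This is merely a sketch: the helper names are my own, and the test words are arbitrary examples.

```python
import string

INVENTORY = "abdefghijklmnoprstuv"  # the 20 letters listed above

def unused_letters(inventory=INVENTORY):
    """Letters of the basic Latin alphabet not needed by the inventory."""
    return sorted(set(string.ascii_lowercase) - set(inventory))

def is_spellable(word, inventory=INVENTORY):
    """True if the word uses only letters of the inventory."""
    return all(ch in inventory for ch in word.lower())

print(len(INVENTORY), unused_letters())
print(is_spellable("telefon"), is_spellable("quiz"))
```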

Morphology

Morphological typology

Linguists distinguish three morphological types of languages: isolating, agglutinating and fusional. Isolating languages have only uninflected words and particles; agglutinating languages have affixes which express a single function each and are added to the words without changing the root; fusional languages make use of multifunctional affixes, stem modification and similar things. In practice, the three types define the corners of a triangle, with most languages lying somewhere in between.

Of these types, the fusional type poses the most difficulties to the learner. It is thus to be avoided in an auxlang: IALs ought to lie somewhere on the isolating-agglutinating axis. The question is how many grammatical affixes they should have, if any. We shall see that there are good reasons to go for an isolating system: most inflections are either redundant, or can be replaced by free-standing particles.

Nouns

Nouns in an IAL should not be inflected for case. The accusative case of Esperanto is one of the features that have drawn the most criticism. Many languages, including many of the world's major languages, have no case system, and their speakers tend to find the proper use of cases difficult to learn. Ido made the accusative case optional, mandatory only if word order is reversed; most later IAL projects disposed of it entirely, and rightly so. Defenders of case marking in IALs usually claim that it is 'necessary' for poetry because it frees up word order. This argument is to be rejected for two reasons. First, the primary purpose of an IAL is not poetry, but international communication. There is nothing wrong with poetry written in an auxlang, but an IAL should not be optimized for poetry at the cost of learnability (although any language with sufficient expressive power for legal or scientific documents is almost certainly an appropriate medium for poetry, too). Second, free word order is not necessary for poetry as a whole, only for certain poetic devices commonly used in classical Greek and Latin poetry. Chinese has rigid word order and no case marking, yet it is one of the world's most poetic languages.

Another inflectional category of nouns is number. Most euroclone IALs keep the inflection of nouns for plural. There are, however, languages that do not mark plural on nouns, and there are languages where plural is not marked when the plurality of the noun can be inferred from the presence of a numeral or a word such as 'some' or 'many'. It is thus debatable whether an IAL should have a plural marker. It might be better to do without it.

Adjectives

In some languages, adjectives agree with the noun they modify in gender, number and case - in many others, they don't. This shows that such inflections of adjectives are entirely redundant. Thus, adjectives ought to be indeclinable. Degrees of comparison are another matter. These are best expressed using particles like more and most in English.

Verbs

Europeans tend to think it is natural to inflect verbs for tense, mood and the person and number of the subject. But the latter two aren't categories of the verb itself, and many languages do not mark them on the verb. They are completely redundant if subject pronouns are mandatory, and thus have no place in an IAL.

But there are also many languages that do not mark verbs for tense. They use temporal adverbs instead. German, for example, has a future tense, but it is hardly used in informal speech; speakers use the present tense instead. The same would work with the past tense: the sentence Yesterday I go to the church may be ungrammatical in English, but the meaning is clear. Tense, a category found in most euroclone IALs, is thus a dubious candidate for inclusion in an auxlang.

Some languages inflect their verbs for aspect (ongoing vs. completed action) instead of or in addition to tense. Many don't. An IAL should not have an aspect system.

Moods vary wildly from language to language, and thus should not be marked on the verb. It is better to use modal adverbs instead, which are easier to learn than modal inflection forms. The conclusion of all this is that the verb should not carry any inflectional markers at all.

Word formation

So we best do away with inflection. But what about word formation? Some auxlangers in the past have tried to work with a closed set of basic words, thinking they have covered all semantic primitives and no more words are needed. Basic English is the best-known of such schemes; most philosophical languages are like that too. But again, this is wrong. It doesn't work. Or rather, it only displaces the problem rather than solving it. Basic English uses phrases for the words that were done away with, and those phrases must be learned individually like the words they replace.

So an IAL needs derivations and compounds. Many auxlangs use schematic systems of derivative affixes, which often produce strange results. Word meanings cannot easily be fitted into a regular scheme the same way categories of grammar can be. Derivation is not the same as inflection.

A frequent problem with word formation is ambiguity. There are often word forms that could be analyzed in more than one way. A number of auxlang designs avoid this by means of a technique called self-segregating morphology: there is a rule the shapes of morphemes must obey, such that morpheme boundaries can always be unambiguously identified. A simple example is the following rule: each morpheme must begin and end with a consonant, but may not contain consonant clusters. Under this rule, a morpheme boundary falls wherever two consonants follow each other, and only there. There are of course other, often much more subtle, rules possible. The drawback of self-segregating morphology is that it is very hard to apply to a posteriori languages, as the morphemes must be designed for it. Among other problems, it either distorts proper names beyond recognition, or requires the usage of clumsy 'encapsulating' morphemes. Self-segregating morphology is also an 'unnatural' technique that is remote from the way natural languages work.
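The simple rule given as an example (consonant at both edges, no clusters inside) can be turned into a segmenter in a few lines. The following is a minimal sketch only: it assumes one letter per phoneme and the five vowels a e i o u, the function names are my own, and the morphemes are invented for the example.

```python
VOWELS = set("aeiou")

def is_valid_morpheme(m):
    """True if m begins and ends with a consonant and contains no consonant cluster."""
    if not m or m[0] in VOWELS or m[-1] in VOWELS:
        return False
    return not any(a not in VOWELS and b not in VOWELS for a, b in zip(m, m[1:]))

def segment(word):
    """Split a word at every point where two consonants are adjacent."""
    morphemes, start = [], 0
    for i in range(1, len(word)):
        if word[i - 1] not in VOWELS and word[i] not in VOWELS:
            morphemes.append(word[start:i])
            start = i
    morphemes.append(word[start:])
    return morphemes

# Hypothetical morphemes, made up for illustration.
parts = ["tak", "lomin", "sun"]
print(segment("".join(parts)))
```

Because every morpheme that satisfies the rule is free of internal clusters, joining valid morphemes and segmenting the result always returns the original parts. The same sketch also shows the drawback mentioned above: a borrowed word such as 'elektron' does not pass is_valid_morpheme and would be cut apart by segment.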

Syntax

As we have seen, a good IAL has as little inflection as possible. This means that grammatical relations are expressed by syntax. An isolating language cannot enjoy the freedom of word order found in languages like Latin or Greek. It must rely on word order to tell subject and object apart. A fixed word order is advisable anyway, because an inverted order can be misleading even when case marking is present. There are six possible orders of subject, object and verb: VSO, SVO, SOV, VOS, OVS, OSV. Of these, the last three occur only very rarely in the world's languages; apparently, there is a natural tendency to place the subject before the object. Of the remaining three, SOV and SVO are common, but VSO is rather rare. According to typological surveys, there are more SOV than SVO languages, but it seems that the SVO languages are spoken by more people. This is because SOV is most commonly found in northern and central Asia, the Americas and Australia, in languages most of which have only few speakers. SVO is the predominant order in Europe as well as Africa and East Asia. Thus, the SVO word order is the preferable order for an IAL.

Syntax, however, is more than subject-verb-object order. It is generally about the relative positions of heads and modifiers. The verb is head to the object; nouns are heads to adjectives, genitives and relative clauses; prepositions are heads to noun phrases; verbs are heads to adverbs. It is advisable to use a consistent head-modifier order, and as we have chosen SVO, in which the verb precedes its object, this means that in an IAL, heads should generally precede their modifiers.

Conclusion

Designing a good IAL is no easy task, and it looks as if there is no single way to do it right. No matter what you do, objections can - and will - be raised against your decision. However, there seem to be more good reasons for a posteriori systems than for a priori designs. Most of the existing a posteriori IAL designs indeed come close to what seems achievable.

I, for that matter, don't think that I can come up with an auxlang that is substantially superior to such designs as Novial (my favourite) or Esperanto; and then there are languages such as English which compete with all those IALs - currently, it seems more likely that English will win out than even Esperanto. These and other reasons are why I prefer doing artlangs.

Appendix: Phonemes in 25 major languages

Source: Richard A. Morneau, Phonology for Artificial Languages.

As can be seen from this chart, most major languages have /p b t d k g f v s h l j m n r/, i.e. the 15 consonants proposed above. The most marginal of these are /h/ and /v/; those languages that lack /v/ mostly have /w/ instead. /z/, /S/ and /tS/ are also very frequent, but I have left these out because, being sibilants, they can all be difficult for some to distinguish from /s/. The vowels /a e i o u/ are present in most languages, while all other vowels are markedly less common.