Tuesday, January 29, 2013

An Old Classic


This is an old post of mine from another blog, slightly edited. 

Cartesian Product Conlangs


This will be familiar to anyone who has taken algebra 101, but since I am writing for an audience consisting of humanities hobbyists, I feel I need to introduce a definition here.

Definition: Cartesian Product

A Cartesian product of two sets A = {a1, a2, a3, ..., an} and B = {b1, b2, b3, ..., bm}, written A x B, is a new set consisting of all pairs {a1b1, a1b2, a1b3, ..., a1bm, a2b1, ..., anbm}.
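
For readers who prefer to see the definition run, here is a minimal Python sketch. The element labels are made up for illustration; `itertools.product` is the standard library's Cartesian product.

```python
from itertools import product

# Two small illustrative sets (hypothetical labels, not from any real language).
A = ["a1", "a2", "a3"]
B = ["b1", "b2"]

# A x B: every element of A paired with every element of B.
pairs = [a + b for a, b in product(A, B)]

print(pairs)
# The size of the product is always the product of the sizes.
print(len(pairs) == len(A) * len(B))
```

The point to notice is the size: n elements times m elements always gives n * m pairs, with no gaps and no overlaps.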

Now, what does this have to do with conlanging, and why do I complain about such things appearing in conlangs?

Well, one natural place to use this set-theoretical notion is in morphology (although it is applicable in other places as well). Oftentimes, conlangs follow a formula along these lines: an agglutinating language with, e.g., two to four numbers, four to a dozen cases, etc., and for every number there's every case. More generally, for every pair of accidents, every combination is possible. With verbs, you get voice * tense * aspect * person * number * mood * other thingy * other thingy; with nouns you get case * number * gender * ...


Naturally, this does occur in agglutinating languages, but what annoys me a lot is when people claim that, for instance, Finnish essentially is a Cartesian product. In Finnish, you have about 15 cases, two numbers, and {3 persons x 2 numbers} + Ø (that is, unmarked) possessive suffixes. There we already find an exception to the Cartesian product style: the plural and singular 3rd person possessive suffixes are conflated, so this can be expressed as {{five distinct persons} + Ø}.

But this gives an incomplete and too regular picture of Finnish:

  • Nominative and accusative are fully conflated in the plural; the plural nominative ending (-t) is used on personal pronouns (and the wh-pronoun kuka, "who") to form a separate accusative case.
  • Non-partitive objects in the singular appear in either the accusative I or II, which in modern Finnish are identical to the nominative and genitive. As mentioned in the previous item, there is only one accusative in the plural.
  • The singular nominative, genitive and accusatives, and the plural nominative and accusative, are identical whenever they have a possessive suffix stacked on them (so, {cases} x {numbers} x {possessive suffixes} - ({sg} x {nom, acc, gen} x {five distinct persons} + {plural} x {nom, acc} x {five distinct persons})).
  • The comitative lacks distinct singular forms
  • The comitative noun phrase never appears without possessive suffixes (but the adjectives in a noun phrase only take the case desinence -i-ne). Other cases get six forms, the full {[empty], 1sg, 1pl, 2sg, 2pl, 3sg/pl}; comitatives lack [empty] in that set.

Estonian presents another interesting uncartesian thing in its verbs: the negative. For normal indicative present and past verbs, you have personal inflections on the verb, but in the negative, you don't. (Finnish is similar, but Finnish indicates the person on the negating auxiliary, whereas Estonian does not.) Estonian doesn't have three persons x two numbers x {positive, negative}, but rather ({three persons} x {two numbers}) + negative.
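
To make the Estonian difference concrete, here is a rough Python sketch that counts both layouts. The labels are placeholders of my own, not Estonian morphemes, and the sketch covers a single tense only.

```python
from itertools import product

persons = ["1", "2", "3"]
numbers = ["sg", "pl"]
polarities = ["positive", "negative"]

# Fully Cartesian layout: every person x number x polarity cell is distinct.
cartesian = list(product(persons, numbers, polarities))

# Layout described above: person and number are marked only in the
# positive, and one unmarked negative form covers everything else.
observed = [(p, n, "positive") for p, n in product(persons, numbers)]
observed.append(("any", "any", "negative"))

print(len(cartesian), len(observed))
```

The second list is built by adding one extra form to a small product, not by multiplying in another dimension, which is exactly the "+ negative" in the formula.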

Finnish also seems to have various complications, in various dialects, where e.g. the plural partitive or the genitive might have several surface forms, which are used slightly differently (e.g. ratkaisuja vs. ratkaisuita, or elukoita, elukkoja) and some nouns have more than one essive form (lapsena, lasna) which can be used in slightly different contexts.

It is not hard to come up with any number of other examples, cf. Russian tenses. Russian has a rather natural verb system with basically three tenses (present, past and future), two aspects (perfective and imperfective), three persons, two numbers and three genders. The Cartesian product would consist of 3 x 2 x 3 x 2 x 3 forms, which is 3^3 * 4 = 108.

However, number patterns more like gender here, as the genders are not distinguished in the plural (i.e., there is no Cartesian product gender x number; Polish does this even worse, by distinguishing part of one of the genders in the plural and conflating the rest into one; Russian only does this as far as nominal case morphology goes). So, for the sake of honesty, let's instead simplify this to
{3 genders + plural} x {imp, perf} x {pres, fut, past} x {three persons} = 72 forms.
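
Both of these counts can be checked mechanically. The following is only a sketch with abbreviated labels of my own choosing; it enumerates the naive product and the simplified one with gender and number folded into a single four-way category.

```python
from itertools import product

tenses = ["pres", "fut", "past"]
aspects = ["imp", "perf"]
persons = ["1", "2", "3"]
numbers = ["sg", "pl"]
genders = ["masc", "fem", "neut"]

# Naive product: every category multiplied with every other.
full = list(product(tenses, aspects, persons, numbers, genders))

# Simplified product: gender and number collapse into one category
# with four values (three singular genders plus a genderless plural).
gender_number = genders + ["pl"]
simplified = list(product(gender_number, aspects, tenses, persons))

print(len(full), len(simplified))
```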

What Russian in reality has, is:

  • imperfective past marks no person, only gender/plural
  • perfective past marks no person, only gender/plural
  • imperfective future marks person and number, no gender
  • perfective future marks person and number, no gender
  • perfective present doesn't exist at all
  • imperfective present marks person and number, no gender

This gives a total of 1 * 4 + 1 * 4 + 1 * 3 * 2 + 1 * 3 * 2 + 1 * 3 * 2 = 26 distinct forms. Yes, Cartesian products do occur everywhere: basically anywhere you see multiplication above, there is a Cartesian product. But each such product is limited to its particular context and does not go on exponentially combining with other Cartesian products; the language isn't just one huge Cartesian product.
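
The tally over the list above (4 + 4 + 6 + 6 + 6) can be sketched as a sum of small local products. The dictionary layout and labels here are my own assumption about how to model it, not a claim about Russian grammar beyond what the list says.

```python
from itertools import product

persons = ["1", "2", "3"]
numbers = ["sg", "pl"]
gender_number = ["masc", "fem", "neut", "pl"]
person_number = list(product(persons, numbers))

# What each tense/aspect cell marks, following the list above.
# An empty list means the combination does not exist at all.
marking = {
    ("past", "imp"): gender_number,
    ("past", "perf"): gender_number,
    ("fut", "imp"): person_number,
    ("fut", "perf"): person_number,
    ("pres", "perf"): [],
    ("pres", "imp"): person_number,
}

# One entry per distinct form: tense, aspect, and whatever that cell marks.
forms = [(tense, aspect, m)
         for (tense, aspect), ms in marking.items()
         for m in ms]
total = len(forms)
print(total)
```

Each value in `marking` is either a plain list or a small product, and the grand total is their sum. That is the "orchestra of operators" in miniature: multiplication inside a cell, addition across cells.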

Generally, a cartesian product conlang will be more boring because:

  • it's predictable: you see the list of distinctions, and you get the full idea of how it adds up, with no fun quirks to think about
  • seeing huge paradigms where all the features are predictable is boring
  • it does not really permit much creativity - you're letting a mathematical operator do the creation for you, rather than being the conductor of an orchestra of operators that add and multiply things in various ways
  • reading the grammar of yet another conlang that is just A x B x C x D for verbs and M x N x Y x Z for nouns ... is not that exciting really. Got that already?

Languages often fail to distinguish something somewhere, and this removes entire rows or columns out of a regular table of combinations. E.g. English fails to mark person on past tense verbs (with some exceptions). These failures bring a language to life in a way that this huge regular table doesn't.

Another example: in English, some nouns lack singular forms.
Another example: in Russian, in every gender, some cases coincide - the feminine merges dative and locative, the masculine either genitive and accusative or nominative and accusative, the neuter merges accusative and nominative, the plural merges genitive and accusative.
Another example: in Russian, a few masculines split the locative into two subcases, the -e and the -u case, where the -u case is used with adpositions when the semantics of the situation is concrete.

All of this helps create an impression of realism. Sometimes, the conflations might be rather random - having appeared through sound changes - or reflect the historical grammatical background of a construction or something about the distinctions themselves.

The latter is the case with the missing column in the Russian tense/aspect combination: a present perfective just doesn't make much sense in light of the Russian semantics of tenses and aspects. What is formed using the same tense morphology as the present imperfective, but with perfective aspect instead, is parsed as a future perfective, so there's no need for the analytic future that the imperfective calls for.


I will quote one of Tom H Chappel's posts from the ZBB on how to, at least to some extent, avoid 'cartesianness':


  • Not every root verb needs to have all of the cells in its conjugation filled in. Not every root verb needs to have all the distinct cells in its conjugation be filled in with distinct-sounding surface-forms. Not every root needs to have the same cells of its paradigm filled in as every other root.
  • Don't fill in a cell in a conjugation just because it is a cell in a conjugation. Instead, actually come up with a sample sentence in which that meaning is actually needed.
  • Don't make two cells in a conjugation sound different just because they're two different cells. Instead, actually come up with a sample sentence in which both meanings are actually needed and it's important to tell which is which; and come up with a pair of sentences, one using one meaning and one using the other, in which it's important to tell which is which.


I would personally even go so far as to claim that even if you perceive a need for distinction between two cells, it's not necessarily the case that such a distinction actually is needed. Lots of Finnish verbs conflate present and past forms throughout the active voice non-negative paradigm, and I bet that would strike most of you as a necessary distinction, no?

Anyway, I have no good way of wrapping up this post so here goes.

[Slightly edited, and in for more editing; I'd like to add sources, and more examples to it, as well as more in-depth functional musings about it.].

2 comments:

  1. To illustrate non-cartesianness further, look at this list of declensions of the definite article in German:

    gend. no.: NOM, GEN, DAT, ACC

    masc. sg.: der, des, dem, den
    fem. sg.: die, der, der, die
    neut. sg.: das, des, dem, das
    all pl.: die, der, den, die

    As a beginning conlanger you may think that something as basic as case marking needs to be as unambiguous as possible to not be confusing. But even though German's (main) system of declining nouns for gender, number and case with articles is full of syncretism, this syncretism is not usually a problem in practice.
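
The amount of syncretism in the table above can be counted with a short Python sketch. The data is copied from the table; the variable names are just illustrative.

```python
# Definite article forms, one row per gender/number, columns in the
# order NOM, GEN, DAT, ACC, as in the table above.
cases = ["NOM", "GEN", "DAT", "ACC"]
table = {
    "masc. sg.": ["der", "des", "dem", "den"],
    "fem. sg.": ["die", "der", "der", "die"],
    "neut. sg.": ["das", "des", "dem", "das"],
    "all pl.": ["die", "der", "den", "die"],
}

# Invert the table: which cells does each surface form cover?
cells_by_form = {}
for row, forms in table.items():
    for case, form in zip(cases, forms):
        cells_by_form.setdefault(form, []).append((row, case))

n_cells = sum(len(forms) for forms in table.values())
n_forms = len(cells_by_form)
print(n_cells, n_forms)
for form, cells in cells_by_form.items():
    print(form, len(cells))
```

Printing `cells_by_form` shows which forms do the most double duty across the grid.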

    Similar conflations can be found in the pronouns of Germanic languages (and Romance, I think, too?), where not every case, number or gender (or a combination of those) necessarily has its own distinct pronoun.

  2. On the aspect of lacking part of the paradigm, French has a somewhat large class of defective verbs, which may lack anything from a few person*number combinations in some tense or a past participle, through entire tenses or moods, to everything but the infinitive (the literary verb "accroire" (to believe something wrong) cannot be conjugated at all and thus is only used along with another verb, e.g. "faire accroire" (to make believe something wrong)).

    Meanwhile, several verbs have two possible conjugation patterns: the 1sg of the verb "s'asseoir", to sit, can be either "je m'assois" or "je m'assieds".

    More generally, Romance languages often have a subset of verbs which admit two distinct past participles, one corresponding to an archaic Latin form and one regularised, which are often used with an aspectual nuance. (French, however, tends to avoid that and either abandons those extra past participles or lexicalizes them as adjectives that can no longer be used in conjugation [which doesn't stop some people from saying "*j'ai dissolu" for "j'ai dissous" (I dissolved), because while "dissolu" is not a past participle, it's still a valid word with a similar meaning].)
