Wednesday, April 30, 2014

A general consideration of language complexity, pt 1

There is a notion popular among some people with a reasonably wide-spanning knowledge of typology, syntax, morphosyntax, etc., that languages are roughly equally complex. On the other hand, quite the opposite is taken for granted among what I surmise to be almost all of mankind. Among political pundits, those who are generally suspicious of anything that looks remotely like it might have occurred in the same room as 'political correctness', those who are just generally suspicious of academia, and even just skeptics on steroids, there probably further is a suspicion that the notion of roughly equal complexity is something bleeding-heart liberals have come up with in order not to have to admit that some languages are inferior.

Of course, one of these three stances may have some kind of merit, and I am going to present a hypothesis that I suspect might be fairly accurate. Let us first consider what grammar is. And for this consideration, we start out with a sort of overview of mistaken understandings of it, so that we can try and come up with a more sensible and accurate notion.

An authority-based notion

Early historical linguists seem to have believed that as civilizations were formed, work was invested in giving the language a grammar. The notion these linguists really seem to have labored under is that pre-civilized language lacked grammar, and that grammar only exists as a conscious product of language planning. Thus, uncivilized people lack grammar, or at the very least have very primitive grammar, and civilized people have a cultivated, artificial grammar.

Oftentimes, an extra claim comes along: once the language has been designed and the culture has reached its apex, laziness sets in and the language starts deteriorating, shedding grammar as it goes along.

The problems with this are manifold:
  • It turns out that grammars have features that are so abstract, it's only in recent decades we've developed a descriptive apparatus for them - yet these features have been around for quite a while. These features do affect how people speak, yet no one understood these features even existed until fairly recently!
  • It turns out languages have acquired new grammar even after the 'apex' of their culture, and even more so - without any actual planning going into it!
  • It also turns out we have better models now for understanding what goes on when grammar is generated.
The reason I consider this an authority-based notion of grammar is that it ascribes a kind of authority-status to those who, according to this theory, designed Sanskrit, ancient Greek, Latin, Old Irish, etc. Of course, there is no reason - even within this model - to think the designers met up to design it, they can rather have contributed consciously bit by bit to an ever-growing language. Thus, Ωαζισφασες adds a rule to Greek in 1000BC, and Σοανδσoν adds another rule 160 years later, promptly followed by ὑιζῆς and thus together they all contribute to building the great ancient Greek language. (For those who didn't get the joke-names, they're Whatshisfaces, So-and-son, and Hu-is-hes.)

A more abstract (sort of supernatural) notion

Some seem to just posit that grammar exists, and that utterances either conform to it or do not. This does not explain how this grammar exists or what form its existence takes - is it an angelic being in the platonic spheres or is it a law of physics or what is it? For most purposes - e.g. writing easily understandable texts - this is a sufficient idea! It is wrong, terribly so, but it works in many contexts. It does create a bit of a neurotic approach to correct grammar among some speakers (e.g. 'Is X a word?!') in contexts where understanding would not be hampered even the slightest, but this is also the case with the previous approach as well as the next one.

A clearly authoritarian view

There is finally the view that whatever some authority on grammar says is what goes and basically what grammar is. Thus, grammar correlates with what recognized authorities think. This view runs into problems when the authorities have inconsistent views - which is even possible for one single authority to have - or when we can show that a language adheres to some rule that no authority has ever described or decided. Such a rule definitely belongs to the grammar of the language, yet it cannot be accounted for by the authoritarian view. 

Another obvious disadvantage of the authoritarian view is the situation when a new construction is required because no means of expressing a certain thing exists - are all statements expressing such thoughts wrong until an authority has made a pronouncement on how to say it?

A more general and rational view

There may be other views, as well as situations in which the authoritarian view at least makes sense - e.g. if you are trying to write or say something where conforming to a set standard is expected. However, this is a rather specific situation, and we know grammar exists independently of that kind of situation as well. 

First, I will go in for a very concrete view of what grammar is. Once the concrete view has been described, I will attempt to formulate models. It is important to realize that models are just tools; we have to use models, though, because of map vs. reality - reality is tricky, and we need to be able to focus on the relevant details in a way where the squishiness of the variables of reality can be partially ignored.

Let us consider what grammar does, where it resides, and finally what this implies as to what it is. When we speak, we have a huge set of somewhat generalized patterns. What does grammar really do? It is patterns we use when parsing and generating linguistic content. Depending on our level of analysis, it can be the patterns taken in isolation, or a description of the patterns and the machine that does the pattern-parsing.

Without these patterns, we couldn't parse statements, nor could we generate them. These patterns also include what we could term statistical patterns. Such a pattern is 'in English, objects are very likely to come after the verb', or in most languages 'certain kinds of nouns are more likely to appear as objects than as subjects of this or that verb'. They are not rules that are etched in stone - sometimes, we will come across sentences that violate such statistical rules.

In fact, it will turn out that even those patterns we may figure are set in stone in fact vary from speaker to speaker. It will turn out that a full description of some given grammar may well be beyond reach. For the moment, we will assume that a grammar of a language is some kind of weighted average of the grammars residing in the minds of the speakers.
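To make the 'weighted average' idea slightly more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the speakers, the weights and the probabilities are invented toy values, and treating a grammar as a table of pattern probabilities is only one crude model of what resides in a speaker's mind.

```python
# Toy model: each speaker's grammar is a table of pattern -> probability.
# All numbers below are invented for illustration, not measured data.
speaker_grammars = {
    "speaker_a": {"object_after_verb": 0.98},
    "speaker_b": {"object_after_verb": 0.90},
    "speaker_c": {"object_after_verb": 0.95},
}

# Hypothetical weights, e.g. how much each speaker contributes to the
# overall usage of the speech community.
weights = {"speaker_a": 2.0, "speaker_b": 1.0, "speaker_c": 1.0}


def community_grammar(grammars, weights):
    """Return the weighted average of every pattern's probability.

    A speaker whose table lacks a pattern is counted as probability 0.0.
    """
    total = sum(weights[s] for s in grammars)
    patterns = {p for g in grammars.values() for p in g}
    return {
        p: sum(weights[s] * g.get(p, 0.0) for s, g in grammars.items()) / total
        for p in patterns
    }


average = community_grammar(speaker_grammars, weights)
print(average)
```

The point of the sketch is only that no single speaker's table equals the community table, which is why a full description of 'the' grammar is hard to pin down.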

Thursday, April 24, 2014

Revisiting Bryatesle: Phonology and nouns

I have now postponed updating Bryatesle for almost a decade. Since it was (and still is) a language with some potential greatness, I will now go on developing it on this blog. But first, an overview.


Bryatesle basically has these vowels, all showcasing length distinction:
i  ɨ   u

The following consonant phonemes are present:
p b ɸ ʋ m t̪ d̪ s̪ z̪ l̪ n̪ t̙ʲ d̙ʲ rʲ̙ ɕ lʲ̙ n̙ʲ k g x 
Orthographically, these are represented as p b f w m t d s z l n ţ ḑ ŗ ş ļ ņ k g h. Some morphophonological alterations happen between some of them - especially, short words tend to avoid having two dental or two postalveolar consonants of the same kind of articulation close together.
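As a small illustration, the consonant orthography can be written down as a lookup table. This is only a sketch: the function name transcribe is hypothetical, vowels and anything else not in the consonant list are passed through unchanged, and none of the allophony or morphophonological alternations are modelled.

```python
import unicodedata

# The two lists are copied from the text above, in the same order.
ORTHOGRAPHY = "p b f w m t d s z l n ţ ḑ ŗ ş ļ ņ k g h".split()
IPA = "p b ɸ ʋ m t̪ d̪ s̪ z̪ l̪ n̪ t̙ʲ d̙ʲ rʲ̙ ɕ lʲ̙ n̙ʲ k g x".split()

# Normalize the keys so that precomposed and decomposed cedilla letters
# are treated as the same grapheme.
GRAPHEME_TO_IPA = {
    unicodedata.normalize("NFC", letter): phoneme
    for letter, phoneme in zip(ORTHOGRAPHY, IPA)
}


def transcribe(word):
    """Replace each consonant letter by its phoneme (hypothetical helper).

    Characters that are not in the table, such as vowels, are kept as-is.
    """
    word = unicodedata.normalize("NFC", word)
    return "".join(GRAPHEME_TO_IPA.get(ch, ch) for ch in word)


print(transcribe("f"), transcribe("w"), transcribe("h"))
```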

For the moment, allophony will not be presented in any greater detail, nor will phonotactics.


The Noun

Nouns, as in Indo-European languages, come in three genders - masculine, feminine and neuter. This is fairly boring, but opens up some fairly intriguing possibilities. Bryatesle's case system consists of two partially parallel systems. For lack of better terminology, I have decided to call these primary and secondary cases.

Primary Cases

The six primary cases line up in a kind of two-by-three system: nominative vs. accusative, dative vs. ablative, vocative vs. exclamative. 

The nominative and accusative distinguish subjects from objects. Nominal complements of verbs also agree with the subject or object noun that they pertain to. Neuter nouns do not distinguish nominative from accusative; however, neuter subjects of transitive verbs take a masculine nominative determiner, essentially forming a kind of periphrastic ergative.

The dative and ablative both figure in ways quite typical of Indo-European languages.

The vocative and exclamative form a pair of opposites. The vocative is used to attract the attention of someone - usually a person - to the speaker; the exclamative, on the other hand, draws the attention of the listeners to a person or a thing. These are not all of their uses - they also have uses that verge on information structure, pragmatics and social interaction.

Secondary Cases

The secondary cases come in a much more haphazard bunch. Some agglutinate, some are more fusional. The full list consists of possessed, definite, partitive, the reciprocal object, the secondary subject, negativity agreement and suggestion marking. The possessed secondary case marks nouns that are the property of, or otherwise in some significant relation to another noun, usually some salient argument in the phrase or a nearby noun in the dative. The exact uses of the partitive will require a post of its own, as will the reciprocal objects and secondary subjects. Suggestion marking is often used with the exclamative, but also with some other cases. Its main role is to communicate that the statement is a suggestion. The definite case is fairly similar to English 'the', but with more syntactic restrictions.


Bryatesle has singular and plural numbers, and to a small extent an undefined number. This final number is highly restricted in usage - mainly appearing in compounds. The undefined number distinguishes two cases, nominative and non-nominative.

Morphological tables will appear at some point.

Sunday, April 20, 2014

Ćwarmin: The Noun morphology

Note: this post is prone to corrections as I find errors or change details.

Ćwarmin, as previously stated, has a multitude of cases. It also has three numbers, and a slight definiteness distinction.

With a few exceptions, all nouns use the same suffixes for the cases. The main exceptions occur in some family member terms, some animal terms, some child-speak terms, pronouns,

The indefinite nominative does not really have a particular suffix, although a fair share of nominatives do end in some similar suffixes, particularly -a or -kem. Not all nominatives ending in -a have an -a suffix though - the difference being that when inflected for other cases, the nominative ending is removed, and the non-suffix a is not.

The singular indefinite and definite are somewhat irregularly formed, as is the indefinite plural.



Nominative Complement
The nominative complement ends in -əcə|-ace or -əmćə|-amce in the singular. In the plural it affixes -ce to the plural marker -il|-ul. Paucal forms are mostly identical to plural, although a few pronouns and adjectives have a separate form formed by -imce|-umco.

No definiteness distinctions are made. (A few words exceptionally use the definite nominatives as complements too - jehir, 'king', among them, obtaining a situation where only the indefinite complement forms are distinct from the regular nominative for those nouns, but also where definite and specific nominative forms are used as complements.)



Accusative Complement
The accusative complement does not distinguish definiteness, nor does it have any paucal forms.
The singular and plural forms are -itće|-utćo, -wiće|-wuću

The same situation that applies for nominative complements obtains here, but both definite and specific accusative and nominative forms are attested as object complements for those.

Reflexively Possessed Accusative
This distinguishes singular, plural and paucal, but not definiteness.
-sin|-sun, -ijn|-uwn, -emwin|-umwun

Only has definite and specific forms, but distinguishes all three numbers:
Definite: -itite|-ututa, -ijte|-uwtu, -imməte|-ummota
Specific: -itəś|-utoś, -ijəte|-uwotu, -imməte|-ummota



General ablative
The general ablative distinguishes definiteness (but not specificness), and two numbers.


{towards, from, at}*{in, on, by} / {(towards, by)}
These do not distinguish definiteness, and paucal is not distinguished in the 'from' row.
The towards-cases are formed by combining the corresponding dative with -ka, -mu, -le (often realized -ek:a, -em:u, -el:e). The 'at'-cases are formed from the accusative by the same suffixes. The final set are obtained by affixing -ka or -mu to the general ablative. Vowel harmony is not as consistent in these as in the other case suffixes, but a tendency towards harmonizing them does exist. In harmonizing varieties they often come out as -kə|-ka, -mi|-mu, -le|-la.

The instrumental always is singular and does not distinguish specificness. Nouns that only have plural forms have an exceptional pseudo-plural instrumental.

Singular -ep|-ap
(Plural -erep|-orap)

The plural and paucal are merged in both comitatives. Unlike other mergers of plural and paucal, "morphologically paucal-like" forms are here used for plural referents. Two degrees of definiteness are distinguished, viz. indefinite contra definite (which incorporates specific).




The negative makes no definiteness distinctions.
Unlike other forms where the paucal-plural distinction is missing, the paucal here merges with the singular instead.
Singular-paucal: -istə|-usta
Plural: -itis|-utus
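The paired suffix notation and the singular-paucal merger can be sketched as a small table in Python. This is an illustration only: the names NEGATIVE_SUFFIXES and negative_suffix are hypothetical, and since this post does not spell out which stems take which vowel-harmony variant, the caller has to say explicitly which side of the '|' is wanted.

```python
# Negative-case suffixes as listed above, written as "first|second"
# vowel-harmony variants. The paucal shares the singular form.
NEGATIVE_SUFFIXES = {
    "singular": "-istə|-usta",
    "paucal": "-istə|-usta",
    "plural": "-itis|-utus",
}


def negative_suffix(number, variant):
    """Return one harmony variant of the negative suffix.

    number:  "singular", "paucal" or "plural"
    variant: 0 for the form before '|', 1 for the form after it
    """
    first, second = NEGATIVE_SUFFIXES[number].split("|")
    return (first, second)[variant]


print(negative_suffix("paucal", 0))
print(negative_suffix("plural", 1))
```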

Marginal Cases
A few forms appear only with very specific nouns, and although they are used in case-like ways, are not really as important as the previously mentioned ones. Among these are a number of lexically limited locative cases. Another set are combinations of definiteness and number and case that usually do not appear in the language - some may actually have such forms extant for a limited number of words.

Most marginal forms that are not of the "unusual combination" type do not distinguish definitenesses.

Tuesday, April 15, 2014


What is the hardest computational complexity class such that a script whose parsing is a "complete" problem for that class can still be humanly readable?

Detail #85: Congruence blocking ... again!

In a language with some kind of noun classes and class congruence on verbs and adjectives, let the following circumstances block congruence on the verb:

  • subject and object complements trigger omission of class (and number) marking on the verb. Basically, the congruence migrates to the complement - but if the complement cannot mark class, the congruence marker will be entirely lost.
  • relative clauses where the relativized constituent is not the subject. However, complements still mark congruence.
As examples (with the prefixes te-, pa-, ku-, ri-, ne-), the following should illustrate a bit:
ne-solution is (a solution exists)
ku-man is ku-tall (the man is tall)
pa-boss pa-expressed his ri-opinion with te-clarity (the boss expressed his opinion with clarity)
ku-man REL ku-expressed ku-his ri-opinion is ku-educated (the man who expressed his opinion is educated)
ku-man is pa-boss (the man is a boss)
te-computer is from apple (the computer is ...)
ne-project REL pa-boss launched ne-it ne-will ne-reform pa-synergy
Forms with bold italic are main clause non-congruence verbs, due to complements. The two non-bold italic verbs are unexpected examples of non-congruence: existential is and is with non-congruent complement. ku-expressed is the normal form in subclauses, viz. when the relativized thing is also the subject of the subclause. Finally, the bold italic underlined verb lacks congruence due to the subject not being the relativized element.
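The two blocking conditions can be sketched as a small decision function. This is a minimal sketch under assumptions: the Clause class and its field names are hypothetical, the existential 'is' is not modelled since it is described as an unexpected case, and the migration of the marker onto the complement is left out.

```python
from dataclasses import dataclass


@dataclass
class Clause:
    subject_class: str                   # class prefix of the subject, e.g. "ku"
    verb: str                            # bare verb form, e.g. "expressed"
    has_complement: bool = False         # a subject or object complement is present
    is_relative: bool = False            # the clause is a relative clause
    relativized_is_subject: bool = True  # only meaningful when is_relative is True


def mark_verb(clause):
    """Return the verb with or without its class prefix."""
    # Condition 1: a complement pulls the congruence away from the verb.
    if clause.has_complement:
        return clause.verb
    # Condition 2: relative clause whose relativized constituent
    # is not the subject of that clause.
    if clause.is_relative and not clause.relativized_is_subject:
        return clause.verb
    return f"{clause.subject_class}-{clause.verb}"


# "ku-man is ku-tall": the complement blocks marking on the verb.
print(mark_verb(Clause("ku", "is", has_complement=True)))
# "pa-boss pa-expressed ...": ordinary main clause.
print(mark_verb(Clause("pa", "expressed")))
# "... REL pa-boss launched ne-it": the relativized element is the object.
print(mark_verb(Clause("pa", "launched", is_relative=True,
                       relativized_is_subject=False)))
```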

Friday, April 11, 2014

Detail #84: Nouns secondarily used as designators of qualities of things

Consider a noun such as 'pear', and consider the form of the pear. Now, consider an abstract of totum pro parte, where 'pear' now also signifies 'wide part of, bump'. Imagine a language where a lot of things are understood both as referring to the thing as such as well as distinctive properties of the thing, even when these distinctive properties are on other things.

Thus, "the pear of the face" could be the cheek, whereas the "tree of the man" is his tallness.

Tuesday, April 8, 2014

Ćwarmin: The Verb

The Ćwarmin verb behaves fairly straightforwardly for an agglutinating language: it has a stem, to which suffixes and prefixes are added. Some morphophonological things do occur, though.

A few verbs:

to eat - sewkən
to drink - itrin
to grip - brewən
to reside, to live somewhere, to stay somewhere - twam
to walk - rakam
to read - xaukam
to see - saul
to talk - ragam
to breathe - taucon

The forms given above are infinitives. The Ćwarmin infinitive is not quite similar to the English infinitive, in that it mostly serves as a gerund. However, for most verbs it is fairly close to the verb stem and thus it is a good place to start. -an, -in, -am, -en, -jul, -jig all are suffixes that carry some grammatical information, and some verbs can take several different ones when forming infinitives:
to think - hacan, hacam, hajul
Some slight change in meaning may occur, e.g. hacan signifies thinking about things, hacam tends to pertain to beliefs and convictions, and hajul denotes planning and such. Not all infinitives showcase such a variety of meaning, and the meanings imparted by the different suffixes seem somewhat inconsistent - there is almost, but not quite, a pattern there.

Non-past (a.k.a. present)

The non-past is formed by removing the infinitive marker and applying the following suffixes:

The paucal has other uses than paucal number. -er, -iy, -in, -id, -iwe, -iru all cause some changes. -i, -in, -irre, -iwe, -iru cause changes along these lines:
sewkən  → sewcer, sewkə, sewkei, sewkəc, sewćin, sewćie, sewći, sewćiy, sewćirə
ragam   → raźar, rawo, rawwu, ragac, ragun, rawwu, ragu, raguv (rav), ragura (ragra)
taucon  →  taućar, tauco, taucou, taucac, taućun, taućuu, tauću, taućuv, taućura
-l- sometimes also turns into ź or w
saul   → saular, sauwo, sauwwou, saulac, sauźun, sauwwu, sauwu, sauwuv, saura

Immediate past

The immediate past is formed from the infinitive by use of the subject complement case. For example, sewkən → sewkəmce, and likewise itrəmce, brewəmce. A personally inflected form can further be formed for the singular and plural first and second persons by omitting the final e, giving sewkəmcer, sewkəmco, sewkəmcəc, sewkəmćin, but these forms are optional.
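The basic, uninflected immediate past can be sketched as a string operation. This is a rough sketch under assumptions: the function name immediate_past is hypothetical, it only covers infinitives that end in a two-letter marker (a vowel plus n) like the three verbs cited, and it always uses the -əmce form, so verbs with other infinitive markers or other vowel harmony are not handled.

```python
def immediate_past(infinitive):
    """Form the bare immediate past from an infinitive (hypothetical helper).

    Assumption: the infinitive marker is the last two letters (vowel + n),
    as in sewkən, itrin and brewən. Other markers are not handled.
    """
    stem = infinitive[:-2]   # drop the infinitive marker
    return stem + "əmce"     # add the complement-case ending used above


for verb in ["sewkən", "itrin", "brewən"]:
    print(verb, "→", immediate_past(verb))
```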

The object complement case is used in a slightly similar way for causative constructions.

Non-immediate past

The non-immediate past is formed by the suffixes -i(n)-| -u(n)-, sometimes omitting the final consonant of the stem: sewkir, sewkine, sewkinei, sewkinəc, sewkini, sewkini, sewkin, sewkinð, sewkinrə.

The use of the tenses will be described in more detail once the discourse particles are described, in combination with how they interact with case and tense. Omitting the person inflections reduces the certainty. This is utilized, although not mandatorily so, to mark inferred knowledge or knowledge by hearsay.

Passive voices

The regular passive promotes an object to subject status and marking. The agent can be present in the general ablative case. For those verbs that permit the regular passive, person congruence is lost. The suffix is replaced by -aśp. The indirect object being promoted to subject by a different passive is permitted for all verbs, and is formed by replacing the person marker with -əźbel|-ažbul. The past passives are more complicated and they use various weird constructions.


Combinations of the basic cases and the infinitive serve a lot of the participle-like needs of Ćwarmin. However, a large number of adjectival derived forms that may reasonably be called participles exist. Their formation and use are not described here, and are subject to great lexical variation. Ćwarmin resides on the edge of a sprachbund where this is one of the central features. The preponderance of pseudo-participles is not the only shared trait, but their use is much more limited to adjective-like constructions in Ćwarmin.