Tuesday, April 25, 2017

Detail #338: A Voice - Dereflexive

Consider a language where the only pronominal way of distinguishing third persons is the distinction between reflexive and regular third persons.

In cases where only one third person is prominent, the distinction is not widely used, but it sometimes appears outside its original reflexive function.

Here, we can consider a situation where a basic voice marker and the reflexive marker - be they of whatever kind you want - combine forces to form a "dereflexive" - a voice that lacks a proper subject, but instead has a reflexive marking that is the real subject of the thing.

Monday, April 17, 2017

Interrogatives in Sargaĺk

[This post was accidentally deleted, and retrieved from the LCC aggregator]
The interrogatives in Sargaĺk have a few interesting properties, and there are also both gaps and additions in the case system that differ from the case system elsewhere in the language.

Two pronouns correspond to English what: səre and bəre. Səre is for count noun-like things, bəre for mass noun-like things. Both lack the pegative form, but səre has several additional locative forms, and bəre, exceptionally enough, has an ergative form. Səre invariantly takes masculine case morphology, even when being a determiner for a feminine noun. Bəre invariantly takes feminine case morphology.

The additional cases for səre are allative -lu, illative -li, elative -rsas.

For persons, the interrogative pronoun is t'əre. T'əre has an accusative form, t'əra. It can take feminine case markers when the answer is assumed or required to be female. The feminine accusative, t'ərat, is falling out of use in favour of the absolutive. The feminine absolutive is t'əri.

For questions such as 'which X', the pronoun is suffixed to the noun. Otherwise interrogative pronouns are the first element of the NP, or even fronted to sentence-initial position but possibly leaving the NP behind. Usually they are in-situ, though.

A fourth stem, zər-, has only two forms - the absolutive and the instrumental-comitative: zəre, zərmai. The first is basically a way of asking someone what they think or what they'd say; the second is the main way of asking a speaker to repeat what he said because you didn't hear. Zərmai is seldom used in any actual phrases, only as a stand-alone word. Essentially zər- sort of is an interrogative for elements of the set of possible utterances.

The interrogative root zər- appears in derived verbs and nouns and adjectives, in ways that parallel the other interrogative pronouns. More about those in a later post.

The Interrogatives in Sargaĺk

In Sargaĺk, there are four interrogative pronouns -
t'əre: who
səre: what (for count-noun-like things)
bəre: what (for mass-noun-like things)
zəre: what (for utterances and thoughts)
These sometimes appear in a variety of verbs, adjectives and nouns.
The interrogative pronouns are inflected for case (and number), but t'əre is most often masculine except if a) the expected answer is feminine, or b) the answer is required by the speaker to be feminine. The three others are always inflected in the masculine except if they are adjectives; as adjectives, they are generally suffixed to the noun.


Adjectives are generally formed from other stems by adding suffixes. These suffixes further are inflected for case. 
brəsep - full (of liquid or indifferentiable mass)
zrəsep - full of things to say
t'rəsep - full of people
The suffix -sep means 'full of' or 'saturated with'. The -e- turns to -ə- when case suffixes are added.
sərkuy - 'whatless', insignificant
bərkuy - 'empty'
zərkuy - 'silent' but also 'unthinking' depending on context
t'ərkuy - 'one of a kind' (of a person)
The meaning of -kuy generally is '-less'. -y- disappears before consonant-initial case suffixes.

From these, nominalizations can be formed, but the usual Sargaĺk discourse structure seldom calls for abstract nouns like '-lessness' or '-fulness'. -kir, however, is the usual abstract nominalization for adjectives: brəsefkir: fullness, t'ərkukir: quirkiness, zərkukir: silence, stupidity.

These can be used with appropriate case inflections to signify 'a X one', including the uninflected form for the absolutive case. 'Brəsep' thus can also signify a full container, 'zrəsep' a person with an issue to speak of, or a story-teller, or somesuch, and a t'rəsep can be a full house or a legion or anything like that.


There are only a few derivations from these that give nouns without an intermediate adjective or verb in the derivation chain. Three primary examples, however, are
srənki (f) - a question (as to what (səre or bəre))
t'rənki (f) - a question (as to whom)
zrənki (f) - a question (as to what the listener is thinking)


Verbs for asking are obvious contenders for this, and include
zrənoj, t'rənoj, brənoj, srənoj
Brənoj and srənoj both are used when the question pertains to time, place, etc, depending on the size and type of the expected answer: spans of time or large locations or maritime locations often are asked for with brənoj, specific days or times of day or weeks or months are asked for with srənoj.
There is also a verb k'yenoj which signifies asking a binary question. K'ye also is the particle that indicates tag questions, and can be initial or final in the tag question.

Other verbs for asking specific types of questions also exist, such as
bnaroj - ask for permission
The idea that asking for whom is somehow the primary type of question can be found in the following verbs, which can refer to any kind of question:
t'rəgrošaj: to overwhelm with questions
t'rəkoŋpoj: to ask questions with the intention of misleading the listener
t'rəroroj: to ask stupid questions
t'rəksturij: to ask a question without an intention to listen to the answer
t'rəksomaj: to ask the same question repeatedly
t'rəparuj: to ask a question in order to embarrass someone

Thursday, April 13, 2017

Detail #337: A System for Encoding Numbers

Consider a positional system of numbers based on some form of ordinal thinking. We assume for now a decimal system.

1 is really the first number in the first decad in the first centad in the first millenniad, and so on. This means 1 is really ...1111, but we omit leading ones and thus obtain 1. The range ten to nineteen is the second decad. Thus ...11121, ...11122, ..., or as we'd rather have them: 21, 22.

I am not particularly interested in forcing a particular base onto this either, any integer would do... it's just that I want a system where you get the following kind of pattern, given that Z = the base (which also needs a symbol of its own)


We don't need a zero, since we're not interested in those at all: the second '1' may very well come directly after the first '7' for all I am concerned, as long as the pattern is kept intact.

This is fairly similar to bijective numeration in some way, but adds the twist of being slightly off.
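To make the idea a bit more tangible, here is a minimal sketch of one possible reading of the system. It assumes that a number's position in the count is written out in base Z and that every digit is then raised by one, so that each place holds an ordinal from 1 to Z. The function name encode, the symbol string and the parameter first are all made up for this illustration. With first=1 the number one comes out as 1; with first=0 the count starts at zero, and ten and eleven come out as 21 and 22.

```python
def encode(n, base=10, first=1):
    """Write n in the ordinal notation (one possible reading, see above).

    `first` is the number that counts as "the first number"; it is an
    assumption of this sketch, not something fixed by the description.
    """
    if not 2 <= base <= 10:
        raise ValueError("this sketch only has symbols for bases 2 to 10")
    if n < first:
        raise ValueError("n must not be smaller than first")
    # Ordinals 1 .. base-1 use ordinary digits, the base itself is 'Z'.
    symbols = "123456789"[:base - 1] + "Z"
    index = n - first  # zero-based position in the count
    digits = []
    while True:
        index, remainder = divmod(index, base)
        digits.append(symbols[remainder])
        if index == 0:
            # Everything further left would be '1', so we stop here;
            # this is what omits the infinite run of leading ones.
            break
    return "".join(reversed(digits))


if __name__ == "__main__":
    for number in (1, 9, 10, 11, 20, 21, 101):
        print(number, encode(number))
```

Whether the count should start at zero or at one is exactly the kind of detail that is left open here, which is why it is a parameter rather than a fixed choice.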

Fun thing: there's always an infinite string of 1s to the left of any 'regular' number. One could, however, imagine exceptional numbers where, for instance, there's an infinite string of some other numbers to the left, or an infinite regular pattern (e.g. ...123123123), or even an infinite irregular pattern (reverse your favourite irrational number and drop the decimal mark).

Challenge: develop easy rules for arithmetic for this, without involving conversion back to and from regular numbers.

Saturday, April 8, 2017

Conlanger Lore: Reasoning about Grammar

In part, this intermittent series of posts will deal with reasoning and knowledge in linguistics, when applied in such an unusual way as conlanging is.

One notion that forms part of the backbone of conlanging thought is the idea that we can just apply reasoning at a very basic level to reach conclusions about typology.

Consider, for instance, pro-drop. Common wisdom is that pro-drop and verbal marking for subjects (and possibly objects) go together. Superficially, this seems reasonable, but we know there are languages that have subject congruence, but do not permit pro-drop. Likewise, we know there are languages that have pro-drop, but don't mark their verbs.

Common wisdom is that lack of case (and/or verbal marking) forces word order to be fixed. But many languages without case marking permit some amount of word order rearrangement - Swedish, for instance, permits both SVO and OVS, without any explicit marking. This to the extent that I have been in situations where people have parsed what I have said (SVO) as OVS, because they have parsed contextual cues and salient features of the words involved in the utterances differently than I would have expected.

Yes, of course Swedish doesn't have strictly free word order - SVO still probably accounts for at least nearly the majority of utterances, followed by AVSO (where A = adverbial), followed by some oddities like VO (with omitted subject), followed by OVS fairly far down the line. The point is, you don't need the case marking to free the word order, what you need it for is to obtain very free word order, that is, word order where the different orders don't significantly differ statistically, and thus make it hard even to learn what is what. 

The point I am trying to reach is that ultimately, we cannot rely on ideas like "IFF X is marked in one way, then X can be left unmarked in other ways". Some languages simply structure their utterances in ways where who or what the subject is is irrelevant. In some languages, discourse tends to focus more on events than on the people involved; in others, the discourse is more interested in the who-does-what aspect of it. This is much like how some languages don't have tense. Further, with subjects and objects, there is oftentimes a significant bunch of additional knowledge the speaker and listener can be assumed to share, and this makes it insufficient to look only at whether the subject can be retrieved from other markings (with regards to pro-drop) or resolved from other markings (with regards to case).

Thus, when we reason about language, we need to acknowledge that the actual form is not IFF X, then Y, but rather if any X out of a huge bunch of unknowns, then maybe Y.

Thursday, April 6, 2017

Detail #336: Modelling Restrictions on Compounds

Languages with compounds can have restrictions on what compounds are permitted. Describing such a system of restrictions in some depth could be a nice way of getting an impressive grammar done. Let us consider some ways of 'modelling' such a system. There's a difference between modelling and exhaustively describing, in some sense.

Giving an exhaustive description is possible for a conlanger: we inform the reader how it works and since we're the creators, our fiat holds. However, this might be somewhat uninteresting. Models are interesting in that they attempt to catch what happens, but might simplify some stuff and therefore be mistaken about things as well.

Given the natural scope of a language - spoken over generations, spoken by lots of speakers in varying relations with one another (all the way from family to people who have never interacted, not living in the same century or even geographically all that close) - it is likely for there to be a lot of variation in some parts of the language, and thus a model makes a lot of sense: it'll be wrong some of the time, but it'll catch the main traits of the system.

So, let's consider compounding and how we could model restrictions on it. First, we can recognize two types of edges of a compound: the left edge and the right edge. We can imagine a compound that does not permit any added morpheme to the left, and the same goes for the right. We call these 'left-saturated' and 'right-saturated' compounds, respectively. A compound that is saturated at both edges is simply saturated.

Another thing about modelling is that it'd be good if it also helped parse the compounds. Thus, a good model should tell us whether an element in a compound is a left-branch or a right-branch by looking at the word. It should even, probably, tell us whether two neighbouring elements are only "superficial" neighbours.
[Figure: left- and right-branching]

This gives our model some actual usefulness beyond its 'descriptive' power. Now we come to the nitty-gritty stuff. We of course want to have some way of quantifying whether a word accepts compounding. Let's simply use numbers for this - we could put it in a range [0, 1], where 1 is 'accepts compounding', 0 is 'saturated', and values in between are probabilistic estimates as to how likely it is to accept compounding. So, we have, for any word, two values left and right ∈ [0, 1]. I'll write left and right as a single vector C = (x, y), where x is the left and y is the right edge. Subscript text comes in four varieties: full words represent themselves. Thus C_DonauSchiff is the vector of the compound of Donau and Schiff. One-letter capital variables represent an arbitrary word. Small letters

Let us take two words, Donau and Schiff. These have associated vectors C_Donau and C_Schiff. The resulting Donauschiff too has an associated vector, C_DonauSchiff, which is derived from the vectors of the two elements. The interesting thing, of course, is the function that takes C_Donau and C_Schiff and produces C_DonauSchiff. It should be clear that order is relevant - we wouldn't expect Schiffdonau and Donauschiff to have the same properties. A very simple model would do something like this:
C_EF = (E_l, F_r), where l and r as subscripts mark "left edge value" and "right edge value".
In such a model, the properties at the edges carry over to the compound. However, there's no a priori reason why AB_l = A_l and AB_r = B_r. In other words, there's no reason why a compound's edges should have the same compounding-properties as the elements that occupy those edges - shoemaker need not have the same left-edge property as shoe and right-edge property as maker - in fact, we'd maybe expect, in English, that shoemaker would be more similar at the left edge to maker than to shoe (but maybe not entirely so). The compound is a new word, possibly a word of a different word-class (with regards to at least one of its parts), and thus it seems unjustified to expect the compoundability to be conserved at edges.
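As a rough illustration, the simple model (left edge taken from the first element, right edge from the last) and one way of loosening it can be sketched as below. The class Edges, both function names, the parameter head_weight and all the numbers are invented for the example; the blended variant is just one guess at how a compound's left edge could lean towards its last element, not a worked-out proposal.

```python
from typing import NamedTuple


class Edges(NamedTuple):
    left: float   # how readily the word accepts an element on its left, 0..1
    right: float  # how readily the word accepts an element on its right, 0..1


def compound_simple(e: Edges, f: Edges) -> Edges:
    """The very simple model: the outer edges are inherited unchanged."""
    return Edges(e.left, f.right)


def compound_blended(e: Edges, f: Edges, head_weight: float = 0.7) -> Edges:
    """Hypothetical variant: the left edge is a weighted mix of both
    elements, leaning towards the last element by `head_weight`."""
    left = (1 - head_weight) * e.left + head_weight * f.left
    return Edges(left, f.right)


if __name__ == "__main__":
    # Made-up values, purely for illustration.
    donau = Edges(left=0.9, right=0.2)
    schiff = Edges(left=0.4, right=0.6)
    print(compound_simple(donau, schiff))   # Donauschiff
    print(compound_simple(schiff, donau))   # Schiffdonau: order matters
    print(compound_blended(donau, schiff))
```

Setting head_weight to 0 gives back the simple model, so the two functions are really the ends of one scale.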

Thus we probably want a more detailed idea of what compounds are permitted - we might want both Cl and Cr to be vectors for different types of lexemes: verbs, proper nouns, nouns of different classes, adjectives of different kinds, etc. We might even want to go further: probabilities for specific inflected forms, probabilities for 'heavy' words vs. 'light' words, measured by their nested structure, etc.

Anyways, my next step in modelling this would be to come up with some kind of 'average' probability per word class pair, e.g. adjective-noun 75%, inanimate_noun-transitive_verb 80%. Once this is done, I'd make a weighted graph, where nodes are types of words, and each directed edge carries the probability of a word of one type compounding before a word of some other type. Self-cycles may exist.

Next, each lexical item in the conlang's lexicon would be given a run where a randomizer decides whether it'll accept a certain word-type as prefix or suffix with the probability given by that graph. The probabilities for the new word's edges would be based on some way of measuring 'saturation', which again creates a new thing we might need: a saturated word does not permit more suffixes, and this may happen even if there are non-zero probabilities going on for some level of the compound at the edges.
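Short of a full algorithm, the graph and the randomizer run could look roughly like the sketch below. The graph is a plain dictionary keyed by (first class, second class); the two probabilities 0.75 and 0.80 are the examples mentioned above, while the noun-noun self-cycle, the function name roll_lexeme and the seed are invented for illustration. Saturation is deliberately left out.

```python
import random

# Directed, weighted graph: the key (a, b) holds the average probability
# that a word of class a compounds before a word of class b.
CLASS_GRAPH = {
    ("adjective", "noun"): 0.75,
    ("inanimate_noun", "transitive_verb"): 0.80,
    ("noun", "noun"): 0.60,  # self-cycle, value made up
}


def roll_lexeme(word_class, graph, rng):
    """Decide, for one lexical item, which word classes it accepts
    before itself (as prefixes) and after itself (as suffixes)."""
    accepts_before = set()
    accepts_after = set()
    for (first, second), probability in graph.items():
        if second == word_class and rng.random() < probability:
            accepts_before.add(first)
        if first == word_class and rng.random() < probability:
            accepts_after.add(second)
    return accepts_before, accepts_after


if __name__ == "__main__":
    rng = random.Random(2017)  # fixed seed so a lexicon can be regenerated
    for lexeme, word_class in [("Schiff", "noun"), ("alt", "adjective")]:
        print(lexeme, roll_lexeme(word_class, CLASS_GRAPH, rng))
```

Each lexeme gets its own roll, so two nouns can end up with different preferences even though they share the same class averages.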

I am not going to present any algorithm for this now; this is basically an early rambling intended to come up with something.

Wednesday, April 5, 2017

Detail #335: Possessive Suffixes and Dative Congruence

Consider a language with possessive suffixes as well as an additional, slightly similar thing. We can imagine some interesting restrictions, though, and an immediate detour into that is called for about now.

In Proto-Finnic, the subject could not be marked with possessive suffixes; only the other cases permitted it. This is basically a nominative-alignment thing. Morphologically, this has left the trace that even subjects in Finnish, when marked by a possessive suffix, morphologically are identical to objects.

Now, the kind of suffix I am thinking of is an indirect object congruence marker. Thus, 'I gave a book to him' would come out as 'I gave book.[3sg ind. obj]'. Now, possessive and indirect object suffixes are in complementary distribution - they cannot cooccur.

However, we can imagine a weird situation where the indirect object congruence is permissible on intransitive subjects as well (at least for a short while, until the possessive marker catches up), for situations like 'the book is for him' and such.

For a short while, thus, the possessive marker would follow a nominative pattern, whereas the indirect object congruence marker would follow an ergative pattern.