Historical conlinguistics is hard. I am currently trying to figure out Proto-Ćwarmin-Ŋʒädär(-Dagurib), so that I can get on with developing both (or all three) languages in greater detail without fearing that I'll break "historical compatibility".
I'm currently trying to come up with some neat ways of connecting these two vowel systems:
Front unrounded | Front rounded | Back unrounded | Back rounded |
i | ü | <ı> ɯ | u |
e | ö | <ə> ɤ | o |
ä | | a | |
Notice the orthographic reform regarding how /ɯ/ and /ɤ/ will be written. This is in part to reduce visual conflict with /ɣ/.
and
Front | Centre | Back |
i | | u |
e | ə | o |
| a | |
Currently, I'm thinking that Proto-CŊD had a vowel system slightly richer than that of modern Finnish, but with very similar vowel harmony:
Front (Neutral) | Front Round | Back Round |
i | ü | u |
e | ö? | o |
ɛ | | ɔ |
| ä | a |
Here, we get a number of mergers: Ŋʒädär pulls /i e/ to /ı ə/ in the presence of /u o ɔ a/, and also in the presence of certain back consonants. I might just let ü and ä cover a greater area of the articulatory space, though, making ö disappear as a phoneme entirely.
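To make the Ŋʒädär backing rule concrete, here is a minimal sketch. It assumes the whole word is the harmony domain and represents words as lists of segments. `BACK_CONSONANTS` is a hypothetical placeholder, since I haven't settled which back consonants trigger the rule.

```python
# Hypothetical sketch of the Ŋʒädär backing rule: /i e/ > /ı ə/ when the
# word also contains /u o ɔ a/ or a back consonant. The whole word is
# assumed to be the harmony domain; BACK_CONSONANTS is a placeholder.

BACKING = {"i": "ı", "e": "ə"}
BACK_VOWELS = {"u", "o", "ɔ", "a"}
BACK_CONSONANTS = {"q", "ɣ"}  # hypothetical trigger set


def apply_backing(word):
    """Back /i e/ in a list of segments if any trigger segment is present."""
    triggers = BACK_VOWELS | BACK_CONSONANTS
    if any(seg in triggers for seg in word):
        return [BACKING.get(seg, seg) for seg in word]
    return list(word)


print(apply_backing(["k", "i", "t", "a"]))  # ['k', 'ı', 't', 'a']
print(apply_backing(["k", "i", "t", "e"]))  # ['k', 'i', 't', 'e']
```

A local version (for example, only backing vowels adjacent to the trigger) would be just as plausible. The word-level version is simply the easiest to state.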
However, all this mucking about with historical conlinguistics leads me to think about the epistemology of historical linguistics. I don't want this proto-language to be created entirely by fiat. Instead, I want it to be close to what a realistic reconstruction from two conlangs would look like. See, there are methodological issues in historical linguistics that are not all that obvious, and they affect my work on this proto-conlang. I want my reconstruction to suffer from the flaws that real reconstructions must suffer from by the very nature of the methods used.
A while ago, I recall a discussion in a Facebook conlanging group where someone (I don't recall who) pointed out to a newcomer that in historical linguistics, the unit we deal with is the phoneme. I had been under the same impression for the longest time, but had since had the opposite stance pointed out to me. At this point I decided it was time to think a bit about which was the more reasonable position.
It turns out that when we look at a language in its modern, living form, we generally have an idea of the phonemes involved, although even there, there may be unclear spots. Examples include /ɨ/ vs. /i/ in Russian, or exactly which sets of fricatives form phonemes together in Standard Swedish.
It seems, however, that sound changes don't often operate on the level of phonemes. More often, they hit phones or features. Thus, when we undo the sound changes, we end up with the particular phone or bundle of features that the proto-language had, rather than its phonemes.
Lots of vocabulary gets lost between the proto-language and its descendants. For Proto-Uralic (including Samoyed), about 200 lexemes can be reconstructed. So we don't really end up with a lot of vocabulary to work with.
This is probably only a fraction of the proto-language's vocabulary. Potentially, several hundred more of its words still have extant descendants. However, if a word only has cognates in one branch, we cannot know whether it was part of the proto-language. Even if there are cognates in two branches, we might not recognize them as such, if one or both sets have gone through very crazy reductions or semantic changes or whatever. Sometimes we may have reason to suspect that some word was in the proto-language, but also reason to suspect that its presence in several branches is due to early loaning between branches. Failure to conform to some sound changes may indicate such a thing. If a word just happens not to have been hit by any early sound changes in either of two branches, telling whether it has a shared origin or has been loaned can be difficult as well. Each of these introduces uncertainty.
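The sorting of candidate etyma described above can be written down as a small decision procedure. This is a hypothetical sketch: the field names, the glosses and the branch assignments in the examples are all invented for illustration. The failure to recognise cognates at all is not modelled.

```python
from dataclasses import dataclass

# Hypothetical sketch: sorting candidate etyma by what their distribution
# across branches can tell us. Field names and example data are invented.


@dataclass
class Etymon:
    gloss: str
    branches: frozenset       # branches where a plausible cognate survives
    irregular: bool = False   # a reflex ignores an expected early sound change
    diagnostic: bool = True   # the word has sounds early changes would have hit


def classify(e):
    if len(e.branches) < 2:
        return "unknown: one branch only"
    if e.irregular:
        return "suspect: possible early loan"
    if not e.diagnostic:
        return "undecidable: inherited or loaned"
    return "reconstructable"


examples = [
    Etymon("stone", frozenset({"Ćwarmin", "Ŋʒädär"})),
    Etymon("salt", frozenset({"Ŋʒädär"})),
    Etymon("horse", frozenset({"Ćwarmin", "Ŋʒädär"}), irregular=True),
    Etymon("two", frozenset({"Ćwarmin", "Dagurib"}), diagnostic=False),
]
for e in examples:
    print(f"{e.gloss}: {classify(e)}")
```

Only the first category ends up as a confident proto-root. Each of the others is a place where a conlanger can leave deliberate doubt.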
So, we have few vocabulary items to work with. How is this relevant for phonemes vs. phones? Easy! We test whether two phones belong to the same phoneme by looking for minimal pairs. Once you've shed 90% of the vocabulary or more, finding minimal pairs may not be possible at all. Conversely, failing to find minimal pairs becomes much weaker evidence that two phones belong to the same phoneme.
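The minimal-pair test can be sketched as a search over a list of roots. In this sketch, forms are tuples of segments, so "kʰ" counts as one segment. The lexicon below is hypothetical.

```python
from itertools import combinations


def minimal_pairs(lexicon, a, b):
    """Return pairs of forms that differ only in having phone a vs. phone b.

    Forms are tuples of segments, so multi-character phones such as "kʰ"
    count as a single segment.
    """
    found = []
    for w1, w2 in combinations(lexicon, 2):
        if len(w1) != len(w2):
            continue
        diffs = [(x, y) for x, y in zip(w1, w2) if x != y]
        if len(diffs) == 1 and set(diffs[0]) == {a, b}:
            found.append((w1, w2))
    return found


# Hypothetical reconstructed roots
lexicon = [("k", "a", "t"), ("kʰ", "a", "t"), ("k", "u"), ("kʰ", "i")]
for w1, w2 in minimal_pairs(lexicon, "k", "kʰ"):
    print("".join(w1), "~", "".join(w2))
```

Dropping the first two roots leaves k only before u and kʰ only before i. That looks like complementary distribution, but it is really just missing data.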
We may have words where k and kʰ appear, and they might even appear in words that suggest complementary distribution. But say k appears in maybe 8% of syllables, and kʰ in maybe 8%. A minimal pair needs two strings, W_onset k W_coda and W_onset kʰ W_coda, that differ only in k vs. kʰ. (Here, 'onset' and 'coda' mean 'whatever goes before' and 'whatever goes after' k/kʰ, not the onset and coda of the syllable.) We could then expect such a pair for 0.08² of syllables, i.e. only 0.64% of syllables will provide evidence for that particular minimal pair. Say we've lost 90% of the vocabulary (which is a low estimate). We can probably cheat a bit and also say we've lost 90% of the syllables, and a minimal pair only survives if both of its members do. So it's quite probable we've also lost all the places where the two formed minimal pairs. However, we cannot tell whether such pairs were lost unless we find evidence of them! The probability will vary with the frequencies of the phonemes, obviously.
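The back-of-the-envelope estimate above can be run as a quick calculation. The 8% frequencies and the 90% loss come from the text. The figure of 5000 syllables in the proto-language is purely hypothetical, as is the assumption that syllables are lost independently of each other.

```python
def evidence_odds(p_a, p_b, survival, n_syllables):
    """Rough odds that a minimal pair for phones a/b survives into our data.

    Follows the estimate in the text: a share p_a * p_b of syllables forms
    such a pair, and each syllable survives independently with probability
    `survival`, so a given pair survives with probability survival ** 2.
    """
    pair_rate = p_a * p_b
    expected_pairs = n_syllables * pair_rate
    p_pair_survives = survival ** 2
    # Treats the expected number of pairs as if it were an exact count.
    p_any = 1 - (1 - p_pair_survives) ** expected_pairs
    return pair_rate, expected_pairs, p_any


# 8% for k and kʰ and 90% loss are from the text; 5000 is hypothetical.
rate, expected, p_any = evidence_odds(0.08, 0.08, 0.10, 5000)
print(f"pair rate {rate:.2%}, expected pairs {expected:.0f}, "
      f"P(any survives) {p_any:.1%}")
```

Under these assumptions the proto-language would have had about 32 k/kʰ minimal pairs. Even so, the chance that at least one survives into the data comes out at only about 27.5%.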
There are a few extra things to note:
- Since there are lots of phonemes in a language, several phoneme pairs may still turn up minimal pairs, even if the likelihood of a minimal pair for any specific pair of them is low.
- On the other hand, our reconstruction might be flawed. Our methodology might make us favour certain sounds in our reconstructed roots, which could lead us to create a minimal pair that never existed in the first place.
- For reconstructions that are not very deep in time (e.g. Proto-Germanic or Proto-Slavic), we may very well have enough vocabulary to come up with sufficient minimal pairs.
- We might be able to use our knowledge of the phone-to-phone sound changes and of the phoneme systems of the descendants to make well-informed guesses about the phoneme system of the ancestral language. For a family with many branches, we might even be able to iterate this process, but every step along the way introduces more uncertainty.
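The first point can be put in numbers. With n phonemes there are n(n-1)/2 possible pairs, so even a small per-pair chance of surviving evidence adds up. The inventory size of 25 and the 5% per-pair chance are hypothetical, and in reality that chance differs from pair to pair.

```python
from math import comb


def pairs_with_evidence(n_phonemes, q):
    """Expected number of phoneme pairs with a surviving minimal pair.

    q is a hypothetical per-pair probability of surviving evidence; in
    reality it differs from pair to pair with the phonemes' frequencies.
    """
    m = comb(n_phonemes, 2)  # number of distinct phoneme pairs
    expected = m * q
    p_at_least_one = 1 - (1 - q) ** m
    return m, expected, p_at_least_one


# Hypothetical inventory of 25 phonemes, 5% chance per pair
m, expected, p_any = pairs_with_evidence(25, 0.05)
print(m, round(expected, 1), round(p_any, 4))
```

So some pairs will almost certainly be attested. The catch is that we don't get to choose which ones.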
So, to get back to conlanging: I want the reconstructed form to show signs of these problems. I don't just want a fixed, certain list of roots and a set of sound changes applied algorithmically to churn out descendant forms. I want there to be space for uncertainty.
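One place the uncertainty shows up naturally is in running sound changes backwards. A merger is many-to-one, so undoing it yields several candidate proto-forms. The sketch below assumes two hypothetical mergers, *ɛ > e and *ɔ > o, in a single descendant. These are chosen for illustration, not taken from the actual CŊD changes.

```python
from itertools import product

# Hypothetical mergers in one descendant, for illustration only:
# proto *ɛ > e and *ɔ > o. Everything else is left unchanged.
MERGERS = {"ɛ": "e", "ɔ": "o"}


def forward(proto):
    """Apply the mergers to a proto-form (tuple of segments)."""
    return tuple(MERGERS.get(seg, seg) for seg in proto)


def possible_sources(reflex):
    """List every proto-form that the mergers would turn into `reflex`."""
    options = []
    for seg in reflex:
        sources = [p for p, r in MERGERS.items() if r == seg]
        if seg not in MERGERS:  # an unmerged segment may also continue itself
            sources.append(seg)
        options.append(sorted(sources))
    return [tuple(choice) for choice in product(*options)]


for cand in possible_sources(("t", "e", "k", "o")):
    print("*" + "".join(cand))
```

One reflex, *teko*, leaves four candidates. Only a branch that kept the contrast can narrow them down. Where no branch did, the reconstruction has to stay ambiguous.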