Sunday, December 21, 2014

Considerations of Language Complexity, part 2

I have previously set out some basic notions for investigating this topic here. The conclusion – which was rather short and not very substantiated – was that grammar is a weighted average of the patterns residing in the minds of the members of a speech community. Speech communities may overlap in various ways, of course, but I did not (and will not) get into that for now.

So, we have patterns in minds, and we basically need to figure out which of these patterns are common enough to be understood by a large segment of the population when used productively. This is a bit difficult! It turns out we cannot take a medical tricorder, read someone's brain waves for a second, and know what grammar resides in there!

We need to figure out these patterns some other way. Obviously, the method for doing this consists of painstakingly analyzing large amounts of utterances - but we also need to test our analyses, and see whether we have made mistaken identifications. Finally, we might have failed to spot some subtle distinctions - we might not have realized that something we believe to be in free variation really does mark a distinction we had not expected. It is easy to imagine a linguist thinking that Finnish object cases are in free variation except under negation, simply because he is unaware of telicity as a phenomenon.

So, we need to figure out when a certain distribution of things is meaningful or when it is essentially random. This is not an easy thing! Not only do we need a relatively large corpus to weed out things that randomly happen to look like patterns, we also need to realize that sometimes people do write or say sentences that they themselves recognize as ungrammatical - maybe the mouth musculature had a slight twitch and a suffix was omitted, maybe a writer had rearranged his sentence and forgotten to fix the case of some argument, etc.
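
To make that slightly more concrete: a standard way of checking whether two features – say, object case and telicity in the Finnish example above – actually covary, rather than merely co-occurring by chance, is a chi-square test of independence over corpus counts. Here is a minimal sketch in Python; the counts are invented purely for illustration, and a real study would of course need far more data and care.

```python
# Sketch: does object case covary with telicity, or is the apparent
# pattern just noise? Chi-square test of independence on a 2x2 table.
# The counts below are invented purely for illustration.

# rows: accusative vs. partitive object; columns: telic vs. atelic clause
observed = [[40, 10],
            [12, 38]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / total
        chi_square += (count - expected) ** 2 / expected

# For a 2x2 table (1 degree of freedom), the 5% critical value is 3.841:
# a larger statistic suggests the two features are not independent.
print(f"chi-square = {chi_square:.2f}, significant: {chi_square > 3.841}")
```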

This is somewhat related to the question of how many times you have to flip a coin to decide with some certainty whether it is fair, with the further complication that you do not flip the coin yourself - you are told how it landed by an arbiter of unknown but relatively high accuracy. (The analogy works like this: we hypothesize the coin is unfairly biased in favour of one side – we hypothesize the grammar is biased in favour of encoding a certain meaning a certain way; we can see how the coin behaves by flipping it – we can see how the grammar behaves by investigating a corpus or testing what native speakers say in a given situation; we want to be sure that one side of the coin does not turn up exceptionally often – we want to be sure that something that looks like a recurring pattern is not just random happenstance.)
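
The coin half of the analogy, at least, is easy to compute. Here is a minimal sketch of an exact binomial tail test in Python, under the simplifying assumption that the arbiter reports every flip correctly:

```python
from math import comb

def tail_probability(flips: int, heads: int) -> float:
    """P(at least `heads` heads in `flips` flips of a fair coin)."""
    return sum(comb(flips, k) for k in range(heads, flips + 1)) / 2 ** flips

# If 65 of 100 reported flips came up heads, how surprising is that
# under the fair-coin hypothesis?
p = tail_probability(100, 65)
print(f"p = {p:.4f}")  # about 0.0018 - quite unlikely for a fair coin

# The unreliable arbiter makes things worse: if reports are only, say,
# 95% accurate, the observed counts are a noisy view of the true flips,
# and we would need an even more extreme result (or more flips) before
# concluding the coin itself is biased.
```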

A somewhat worse problem: we should also figure out whether there are patterns we have missed entirely. This is harder still, and I cannot even come up with a metaphor for it. Modern linguists could maybe use corpus methods to figure some of it out – there even are data mining procedures that might find things no linguist has spotted, provided the sample used is large enough – and I bet rather fascinating tendencies may be discovered this way soon, which will probably facilitate research into pragmatics quite a bit.
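
As a taste of what the simplest such procedure might look like: score adjacent word pairs in a corpus by pointwise mutual information (PMI), so that pairs which co-occur far more often than their individual frequencies predict float to the top. The toy corpus below is a stand-in; real collocation-extraction tools work on millions of words and use more robust statistics.

```python
from collections import Counter
from math import log2

# Toy corpus; a real study would use millions of words.
corpus = ("the strong tea was hot . the strong coffee was hot . "
          "he drank strong tea . she drank strong coffee .").split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
n = len(corpus)

def pmi(w1: str, w2: str) -> float:
    """Pointwise mutual information of an adjacent word pair."""
    p_pair = bigrams[(w1, w2)] / (n - 1)
    return log2(p_pair / ((unigrams[w1] / n) * (unigrams[w2] / n)))

# Rank repeated bigrams by PMI: pairs that co-occur more than chance predicts.
for (w1, w2), count in bigrams.most_common():
    if count > 1:
        print(f"{w1} {w2}: PMI = {pmi(w1, w2):.2f}")
```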

However, that is not generally a method we can use when researching languages in the rain forests of Brazil or Papua New Guinea. It seems the brain runs a pretty efficient pattern-identifying algorithm, and most speakers of a language will (subconsciously) spot a lot of patterns and use them – on the other hand, some speakers may fail to notice some patterns, or simply fail to incorporate them into their own usage despite parsing them correctly.

Further, lots of languages have not been very carefully researched. English, French, German, etc. have been studied by thousands of scholars, each contributing to an understanding of how these languages' grammars work, how they vary geographically, and so on. A language spoken by three hundred people in the Amazon basin has not been studied anywhere near as carefully, so we simply do not know whether all its meaningful patterns have been observed - indeed, we have good reason to think no exhaustive description exists.

So, there is a great risk that if we look at a reference grammar of language so-and-so and find it impoverished with regard to the amount of grammar it describes, the problem is a lack of research rather than a lack of actual grammar.

I don't think all languages have the same amount of grammar – but I think the amounts are of the same order of magnitude (and probably even closer than that). Of course, it is difficult to come up with a reasonable measure of how much grammar a language has. Comparing Chinese and Finnish, the morphological tables of Finnish look impressive, and Chinese cannot offer anything like that. But Chinese has lots of restrictions on what kinds of constructions are permitted, on when to use or not to use classifiers, and so on. How to compare these is not at all obvious.

Further, of course, some modern theories of language have pointed out that grammar and lexicon interact in rather unexpected ways – essentially, large parts of the grammar are stored in the lexicon (see, for instance, Lexical-Functional Grammar/Syntax). That naturally complicates the matter even more, and makes it even more challenging to exhaustively describe a language.
