
Wednesday, October 29, 2014

Here we go again (and again, and again)

God there’s a lot of junk out there. And by ‘junk’ I don’t just mean poorly done and pointless, for even shoddy, aimless research can enlighten inadvertently. What I mean is stuff that looks serious and is taken seriously by the scientific punditry but is so completely ignorant and off base that it makes you dumber if you read it. I say this to warn you not to read this review of a book by one Vyvyan Evans called The Language Myth. The review is by one Alun Anderson (what’s with these weird spellings? Vyvyan? Alun? Though maybe someone called ‘Norbert’ is in no position to asperse). As I have not read the book, I will just comment on the review. For Mr. Evans’s sake, I hope that Mr. Anderson has failed to understand what Mr. Evans actually wrote. If he got Evans right, then the book is likely a waste of time. But, as I said, I have not read it (and I am not sure that I will), so I will concentrate on the review.

The review has that breathless “excited” quality that all tracts heralding an impending conceptual “revolution” trade in: [1]

After reading The Language Myth, it certainly looks as if a major shift is in progress, one that will open people’s minds to liberating new ways of thinking about language.

What’s the novelty? Well, you can guess: the Chomsky-Fodor-Pinker (Anderson’s triad, presumably following Evans) view that language is an “instinct” is just plain wrong, a “myth” that has swept the popular imagination (were it so!).[2] The myth, it is claimed, is based on the “way that children effortlessly learn language just by listening to the adults around them, without being aware explicitly of the governing grammatical rules.” This observation led Chomsky (and Pinker (his publicist (boy, I’m sure that will go down great with Steve!)) and Fodor, another dupe) to argue that there is “a module in the mind… waiting to be activated…when an infant encounters language. The rules behind language being built into the genes.” This is not any particular grammar, but a “universal grammar, capable of generating the rules of any of the 7000 or so languages that a child might be exposed to, however different they might appear.”[3]

That’s the myth. As I assume that Anderson does not challenge the above description concerning the ease with which kids acquire a natural language (though I cannot be sure, maybe he does), I assume that the “myth” that needs exposure concerns the existence of an innate capacity, highly developed in (indeed likely unique to) humans, that underlies their linguistic proclivities.

Now, read a certain way, this is not necessarily a bad précis of the modern Chomsky version of GG. Human children bring to the task of acquiring and deploying a language special species-specific skills that enable them to do what other clever animals cannot, i.e. acquire and use a natural language. There are arguments to the effect that some (or much) of this capacity is linguistically specific (though how much is currently a topic of debate), but given the obvious difference between human capacities wrt language and those of anything else we know of, the supposition that there is something special about us when it comes to language is hardly one of those going-out-on-a-long-thin-limb sorts of assumptions. Indeed, I would go further: everyone assumes something analogous, for the obvious fact remains that nothing does language like humans do, and on the pretty standard assumption that what we are mentally capable of has something to do with our mental capacities, the fact that we can do language easily and nothing else does it at all implies that we have some mental capacities that other animals (and plants and rocks) do not. This is not an exciting view. And to say that humans have a language acquisition device is, at least minimally, to observe that this fact is obvious and needs explaining.

There are, of course, more interesting conclusions that one can draw and that have been drawn: e.g. that the mental capacities are in part sui generis, both wrt humans and to language, that the special capacity we have is qualitative, not merely quantitative, that this capacity is dissociable from other cognitive capacities, etc. Each of these further claims comes with substantial discussion and evidence, none of which Mr. Anderson seems aware of. Or at least he doesn’t mention or address it. Why? Because he thinks that Mr. Evans has provided simple, straightforward evidence demonstrating how nutty this nativist viewpoint is. What’s that evidence?

A key criticism is that the more languages are studied, the more their diversity becomes apparent and an underlying universal grammar less probable.

Spot the flaw (duh!). Once again the absence of Greenberg universals is used to dismiss Chomsky universals.[4] We’ve seen this before (right, Mr. Everett?). And we will see it again (and, if past is prologue, again and again and again, sadly). But, to repeat (and repeat and repeat), these are very (very very) different things. A rich innate language-specific mental module is consistent with a great deal of variation in the surface properties of individual languages. UG does not imply the existence of universal manifest patterns in every language. It does not even imply that all Gs must contain (some of) the same rules (e.g. there is no requirement that every language have focus movement or aux inversion or any other rule). Chomsky universals are about types of Gs (the kinds of rules/principles that they have), and to overthrow them requires showing more than that some Gs have rules, or surface patterns, that others lack.

Truth be told, however, the distinction between the two conceptions is probably not one that would concern Mr. Anderson. Why not? Because he seems to know nothing at all about any work in the tradition that he sees as ready to collapse. He seems to believe that the mere existence of “free word order languages,” languages that “build sentences out of prefixes and suffixes to create giant words,” or languages that “appear not to have nouns and verbs” would be news to linguists working squarely within the Chomsky GG tradition (as if languages like Warlpiri, Mohawk, Navajo and Salish were never studied and analyzed within that tradition). I’m pretty sure that the existence of the work of Ken Hale, Mark Baker, Lisa Matthewson, Julie Legate, Henry Davis, Masha Polinsky, Ben Bruening (among many many others) and the many GG-inspired analyses of the Gs of these languages they have offered would be news to Mr. Anderson, something that the editors of New Scientist might have thought of before asking him to review Mr. Evans’s book.

But this is not all. Mr. Anderson also appears to think that genes require invariant expression, so that change and variation are inconsistent with information being genetically coded. In addition, it seems that he believes that language change is incompatible with the claim that “grammar is laid out in the genes.” For Mr. Anderson the simple existence of language change and creolization is sufficient reason for denying that humans have a special biologically rooted affinity for language. If only we in the Chomsky tradition had realized that languages differ and change, we would never have gone down the ill-conceived nativist path we’ve taken. We would not have been seduced by the Chomsky-Fodor-Pinker myths. As I write, scales are falling from my eyes! Revolution indeed!

And last but not least: nativists cannot say how languages carry meaning. Here Mr. Anderson is finally onto something. The problem is that he appears not to realize that nobody understands how this is done. Where meaning comes from, how symbols come to have significance, is a real pisser of a problem, but, as they say, it is, at least for now, everyone’s problem. What we can say is that embedding the problem into a larger Chomsky-like set of considerations has allowed some progress on some features of it (e.g. antecedence and scope have been illuminated, though what ‘dog’ and ‘know’ and ‘give’ and ‘London’ and ‘house’ mean is still pretty murky (a sign of this is that in your favorite semantic theory the meaning of a word like ‘life’ is ‘life′’)).[5]

Mr. Anderson does give us a hint of what the new age that Mr. Evans is heralding will look like. It will be firmly anchored in “embodied” cognition and empiricism (“arising directly in and from experience”) and mirror neurons (“the same bit of the brain lights up when we see or do hammering”). Yup, that’s the brave new world out there. If you ever thought that Greg Hickok’s evisceration of this mental detritus was overkill, read this junk and then send Greg a thank-you note (I’m sure he will also accept Starbucks or Amazon gift cards). Sadly, we will need many Greg-like efforts again and again, for this kind of junk seems to be both very attractive and impervious to criticism.

Let me make one more point and end. What we see in this review is the resurgence of Empiricism (E). Yes, I know you were expecting me to say this and I didn’t want to disappoint. But it’s true. It lies behind the apparent inability of so many to understand the distinction between Chomsky and Greenberg universals. Confusing the two lies at the heart of the review (and of the similarly pundit-popular work by Everett), and it is easily explained once one appreciates its E roots. What are these?

Es are comfortable with generalizations based on patterns in the data. I’ve discussed this before (here). If you believe that all generalizations are based on patterns manifest in the data, then the idea that something can be a universal (e.g. structure dependence) but leave no imprint in the (positive) linguistic data makes little sense. So, universals like ‘if L is OV then it will be OP,’ or ‘all languages distinguish Ns from Vs,’ will leave induction-friendly footprints in the linguistic data and are ok for Es (recall, everyone, including Es, accepts the idea that one needs to generalize). But other generalizations, like ‘all languages obey islands,’ or ‘rules must be structure dependent,’ or ‘anaphors cannot be c-commanded by their antecedents,’ may leave little direct evidence in the positive data of a particular language (especially if one restricts data to what is expressed in naturalistic linguistic environments (viz. assumes that negative data is not relevant)). If you are an E, the only legit generalizations (aka universals) are of the first kind. That’s why an E will find the Chomsky conception of universals close to incomprehensible, and, not surprisingly, will tend to confuse Chomsky and Greenberg universals (more accurately, will not be able to distinguish them), as in fact happens again and again and again. So if you are an E, stop it! It’s mentally stifling, and it leads to the kind of junk thinking this review embodies.
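To make the no-footprint point concrete, here is a toy sketch of my own (it appears nowhere in the review or, as far as I know, in Evans; the sentences and rule names are invented for illustration). Two candidate question-formation rules agree on all the simple positive data, so nothing in that data distinguishes them; only an embedded structure, the sort of thing the PLD may never supply, pulls them apart:

```python
AUX = {"is", "can", "will"}

def linear_rule(words):
    """Front the linearly FIRST auxiliary: the surface-pattern rule an E
    could induce from simple data."""
    i = next(j for j, w in enumerate(words) if w in AUX)
    return [words[i]] + words[:i] + words[i + 1:]

def structural_rule(words, main_aux_index):
    """Front the MAIN-CLAUSE auxiliary: the structure-dependent rule.
    The index argument stands in for real constituent structure."""
    i = main_aux_index
    return [words[i]] + words[:i] + words[i + 1:]

# Simple PLD: the two rules make identical predictions.
simple = "the man is tall".split()
assert linear_rule(simple) == structural_rule(simple, 2)   # "is the man tall"

# An embedded structure (absent from simple PLD) pulls them apart.
complex_q = "the man who is tall is happy".split()
print(" ".join(linear_rule(complex_q)))         # *"is the man who tall is happy"
print(" ".join(structural_rule(complex_q, 5)))  # "is the man who is tall happy"
```

The linear rule is perfectly induction-friendly and perfectly wrong; that is the sense in which a structure-dependent universal can leave no footprint in the simple positive data.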

I will end here. To repeat, I have no idea if Mr. Anderson has correctly conveyed the content of Mr. Evans’s book (and I really do hope that he got it all wrong and that Mr. Evans is about to sue for libel, as one can do in the UK with its unbelievably lax freedom-of-speech provisions). I do know, however, that Mr. Anderson doesn’t know anything about Chomsky or Pinker or Fodor or GG or any work that has been done in the last 65 years. Asking him to review a book on linguistics is like asking Sampson (my deceased ex-pet Porty) to review a book on animal cognition (actually, Sampson would have made fewer obviously clueless remarks).

What’s sad is that this kind of junk finds its way into things like New Scientist. This is a venue that the scientifically interested look at to find out what’s happening in other scientific domains. I have in fact done so myself in the past. But from now on I will be much more wary, for if this is what I find when an area I know something about is discussed, it leads me to think that New Scientist is quite untrustworthy. And that’s too bad. I really respect good popularization. It’s really hard to do well (thx, Mr. Pinker, for The Language Instinct) and it is important. Sadly, it can also be done very badly. If you are looking for an excellent case study in just how bad very bad can be, I know of no better example than Mr. Anderson’s review of Mr. Evans’s book.



[1] The excitement seems to be spreading. CUP sent me this message from David Crystal: "...Evans builds a compelling case that will be difficult to refute." Sounds like Anderson, which, if correct, does not bode well for Evans, though, to repeat, I have not yet read the book.
[2] The idea that language is an instinct is most clearly Pinker’s conceit. I don’t actually ever recall Chomsky or Fodor using this term in describing FL/UG or LoT. And I think I know why. It has misleading connotations. Here’s the lexical entry for the word via Google:
An innate, typically fixed pattern of behavior in animals in response to certain stimuli
This definition does not really cover what GGers of the Chomsky stripe or LoTers of the Fodor variety have meant. First, no “fixed pattern of behavior” follows from their discussions. Remember the competence/performance distinction. Well, the innate structures of FL primarily relate to competence, not performance (i.e. what you know, not what you do). Second, I’m not sure how to understand “fixed.” FL/UG constrains the class of possible Gs (some rules ok, others not), but it does not require that any given G have any particular shape (there is no requirement that rule X be included in every G).
What’s right about ‘instinct’ is that it is not learned, need not be conscious, and is triggered by input. However, the shades of meaning the word carries can be very misleading, and though I can see good advertising reasons why Pinker used the word in his title, I can also see reasons for avoiding it.
[3] This 7k number keeps coming up. But there is no reason to think that there are 7k languages, at least if one counts these via the Gs that generate them. As Richie Kayne once said, and I completely agree, there is either one language or at least 7.125 billion (as of 2013), one for each person. Moreover, there is no reason to believe that the set of possible languages, again if individuated by Gs, is not several orders of magnitude greater.
[4] I should have said “possible” absence of Greenberg universals. As you out there know, there appear to be some interesting Greenberg universals worth explaining, and many Chomsky-inspired linguists like Cinque, Roberts, Kayne, Baker (and many others) are in the business of trying to account for them. I would be surprised if Mr. Anderson knows anything about this. The blooming, buzzing confusion that is the 7000 languages is enough for him. I wonder if the diversity of life or the various kinds of stuff in the world would lead Anderson to conclude that there is no universal genetic code or that the periodic table is ripe for dissolution. Consistency would suggest that he would, but consistency is only the hobgoblin of little minds, no doubt leaving Mr. Anderson an exit strategy.
[5] John Burgess, who is visiting UMD today to give a talk, and whose wonderful papers I’ve just started reading, puts it succinctly: “There is nothing that could be called a body of accepted scientific conclusions about meaning… that workers… can draw upon and apply to their concerns” (in his “Quine, analyticity, and philosophy of mathematics”).

Monday, October 27, 2014

This apparently happened at a real university

Take a look at this. A prof is suspended from university duties for sarcasm and bad body language. Of course, this is not the real reason. That's provided in this paragraph:

Professor Docherty is a prominent critic of the marketization of education who has described the Russell Group - of which the University of Warwick is a member – as "a self-declared elite…even exerting a negative influence over others".

So, a critic of the university sighs and "undermines the authority of the chair." For this he is suspended for nine months. A tribunal then rules on the "charges" and he is cleared. This suggests that they concluded that he did not sigh, use negative body language or use undue sarcasm (this sounds like a Monty Python charge: what, undue sarcasm! Off with his head!!). As a college union rep put it:

 "It beggars belief that an academic can be suspended with no contact with students or colleagues for almost a year while charges are finalised.”

Read that carefully. What the union found unconscionable was that the prof was suspended without charges being finalized. This suggests that so far as the union was concerned the charges themselves were fine. It now seems that undue sarcasm and negative vibes are unacademic. If there is a French dept job, Voltaire need not apply, I suppose.

Part of what makes this funny, of course, is that we all think that this is a one-off thing. And most likely it is. But I wish I more strongly believed that this was so. Our fearless leaders don't like being made fun of (i.e. undermined). They don't really like to be laughed at. If this is true of politicians and leaders of industry, why not the bureaucrats that are heading up more and more institutions of higher learning? This would actually be funny, again Monty Python-wise, were it not so pathetic.




Lila on "quality of input"

Lila Gleitman sent me this note that I am posting on her behalf. It relates to an earlier post that reacted to a piece in the NYT on vocab acquisition in kids. The relevant links are provided in Lila's comment. So heeeere's Lila!

******

Here is more about the “quality of input” and vocabulary acquisition matter (see here). Our group has studied this topic (see our paper published in PNAS, 2013, not connected to the recent Hirsh-Pasek study; the link for this is here). So I want to correct the mangled NYTimes article that has generated so much interest and perhaps will even influence educational policy in future. First, some facts: Hirsh-Pasek, whose work/speech at the White House this newspaper article reports, did not specifically look at possible differences in “quality” (of which, more below) that might vary as a function of class, wealth, etc. All her subject learners were of low SES, and so could shed no light on whether, or the extent to which, “quality input” is unequally distributed across SES classes, because she had no comparison group (i.e., learners from other than low SES strata). Yet the implication was left hanging in the NYT air, just by mentioning that these children were all lower class, that wealthier people provide classier “communicative foundations” to their offspring. Our own study does in fact make this SES comparison directly, and the bottom line is that there is no measurable difference in “quality of input,” if we can define such a thing at all, as a function of social class. So either ignore what you read in the NYT, or go read our article and see what you think in the light of the evidence we presented. There is an SES-linked difference in the quantity of speech (sheer number of words) infants hear before their 5th birthday, however.

But now back to what “quality of input” could be, in any sense relevant to facilitative environments for language learning. The Hirsh-Pasek study isn’t published as yet, as far as I know, but her “White Paper” summary suggests that she “coded” maternal speech and behavior for the extent to which it is “symbol infused,” and related categories that are perhaps themselves hard to understand or apply generally. Despite the real difficulties of such hand coding over highly variable naturalistic interactions, there are some facts about nonlinguistic environmental variance in relevant regards that are strong enough to shine through. Specifically, as our studies showed, there is a very powerful influence of referential transparency (that’s what “quality” largely comes down to, when you peel away abstract labels like “foundations of communication” that appear in this literature). Referential transparency is simply your good old commonsensical notion: there are times, during conversational interaction, when a listener is attentionally focused on a particular thing, action, etc., and the speaker, via gesture, manipulation of the object, gazing/pointing at it, also mentions it. A blatant case (pace Quine) is saying “This is a squirrel” while pointing to and gazing at a squirrel in the presence of the child’s close attention to a squirrel. It turns out that there are stable, measurable, and strikingly large familial differences in the proportion of time that such informative extralinguistic contexts are provided to infants (as I said, their presence/frequency is unrelated to the wealth or class of the family), and these differences (already observed in our sample populations at child age 14 months) predict vocabulary size when measured 3 years later as these children enter kindergarten. Colin in an earlier blog ably described why this very large early vocabulary-size difference matters, over the longer run, in the Real World of school and future job, so I won’t belabor that point. But a few words now on the lexicon and whether it has any linguistic interest.


Of course it does. As has been evident since the seminal work of Carol Chomsky on ask/tell/promise/easy/eager, the business of acquiring the language-specific grammar is inextricably (I hate that word, but it’s right here) tied up with acquiring the meanings of terms whose interpretation is not so transparent to referential observation as is, say, “cat.” For instance, imagine a blind child acquiring “look” and “see.” Or anyone trying to acquire “think.” This can’t be done if the input is solely referentially consistent cases, i.e., everyday scenarios in which thinkers are thinking, but requires in addition (or even instead) access to predicate-specific licensed syntactic structures. The issues here are not only relevant but pretty central to linguistic theorizing, in my opinion (see the huge literature on “syntactic bootstrapping”). But how about “cat”? After all, learning must begin with such homely cases, for which referential information provides the bulk of the basis for identification. They begin to be understood as early as the 6th month of life. These words provide the scaffolding for at least rudimentary acquisition of the grammar (mainly: where, structurally speaking, is the sentential subject in the exposure language), so their acquisition function is of some preliminary interest for linguistic-developmental theorizing. This first lexical stock forms crucial grounding information, the first step to enable all the later fancy footwork, i.e., the gateway to the linguistic-computational achievements that can then – and therefore – proceed. I hastily remind the reader that how the concepts themselves (not the words for them) are acquired is an abiding mystery, but one that necessarily is engaged outside Linguistics, notably by people who study perceptual development, theory of mind, and the like (see, e.g., important lines of research from Spelke, Csibra, and many others).

Saturday, October 25, 2014

The two PoSs again

In a previous post (here) I discussed two possible PoS arguments. I am going to write about this again, mainly to clarify my own thinking. Maybe others will find it useful. Here goes. Oh yes, as this post got away from me lengthwise, I have decided to break it into two parts. Here’s the first.

The first PoS argument (PoS1) aims to explain why some Gs are never attested, and the other (PoS2) aims to examine how Gs are acquired despite the degraded and noisy data that the LAD exploits in getting to its G. PoS1 is based on what we might call the “Non-Existing Data Problem” (NEDP), PoS2 on the “Crappy Data Problem” (CDP). What I now believe and did not believe before (or at least not articulately) is that these are two different problems, each raising its own PoS concerns. In other words, I have come to believe (or at least think I have) that I was wrong, or had been thinking too crudely, before (this is a slow fat ball down the middle of the plate for the unkind; take a hard whack!). On the remote possibility that my mistakes were not entirely idiosyncratic, I’d like to ruminate on this theme a little, and in service of this let me wax autobiographical for a moment.

Long long ago in a galaxy far far away, I co-wrote (with David Lightfoot) a piece outlining the logic of the PoS argument (here, see Introduction). In that piece we described the PoS problem as resting on three salient facts (9):

(1)  The speech the child hears is not “completely grammatical” but is filled with various kinds of debris, including slips of the tongue, pauses, incomplete thoughts etc.
(2)  The inference is from a finite number of G products (uttered expressions) to the G operations that generated these products. In other words, the problem is an induction problem where Gs (sets of rules) are projected from a finite number of examples that are the products of these rules.
(3)  The LAD attains knowledge of structures in its language for which there is no evidence in the PLD.

We summarized the PoS problem as follows:

… we see a rich system of knowledge emerging despite a poverty of the linguistic stimulus and despite being underdetermined by the data available to the child. (10)

We went on to argue that of these three data under-determination problems the third is the most important, for it logically highlights the need for innate structure in the LAD. Or, more correctly, if there are consistent generalizations native speakers make that are only empirically manifested in complex structures that are unavailable to the LAD, then these generalizations must reflect the structure of the LAD rather than that of the PLD. In other words, cases where the NEDP applies can be used as direct probes into the structure of the LAD, and, as there are many cases where the PLD is mute concerning the properties of complex constructions (again, think ECP effects, CED effects, island effects, binding effects, etc.), these provide excellent (indeed optimal) windows into the structure of FL (i.e. that component of the LAD concerned with acquiring Gs).

I still take this argument form to be impeccable. However, the chapter went on to say (this is likely my co-author’s fault, of course! (yes, this is tongue in cheek!!!)) the following:

If such a priori knowledge must be attributed to the organism in order to circumvent [(3)], it will also provide a way to circumvent [(1)] and [(2)]. Linguists need not concern themselves with the real extent of deficiencies [(1)] and [(2)]; the degenerateness and finiteness of the data are not real problems for the child because of the fact that he is not totally dependent on his linguistic experience, and he knows certain things a priori; in many areas, exposure to a very limited range of data will enable a child to attain the correct grammar, which in turn will enable him to utter and understand a complex range of sentence types. (12-13).

And this is what I no longer believe. More specifically, I had thought that solving the PoS problem based solely on the NEDP would also suffice to solve the acquisition problem that the LAD faces due to the CDP. I very much doubt that this is true. Again, let me say why. As background, let’s consider again the idealizations that bring PoS1 into the clearest focus.

The standard PoS makes the following very idealized assumptions:

(4)  a. The LAD is an ideal speaker-hearer.
      b. The PLD is perfect: drawn from a single G and presented “all at once.”
      c. The PLD is “simple”: simple clauses, more or less.

What’s this mean? (4a) abstracts away from reception problems. The LAD does not “mishear” the input, its attention never wavers, its articulations are always pristine, etc. In other words, the LAD can extract whatever information the PLD contains. (4b) assumes that the PLD on offer to the LAD is flawless. Recall that the LAD is exposed to linguistic utterances in which it must look for grammatical structure. The utterances may be better or worse vehicles for these structures. For example, utterances can be muddy (mispronunciation), imperfect (spoonerisms, slips of the tongue), incomplete (hmming and hawing and incomplete thoughts). Moreover, in the typical acquisition environment, the ambient PLD consists of utterances of linguistic expressions (not all of them sentences) generated by a myriad of Gs. In fact, as no two Gs are identical (and even one speaker typically has several registers), it is very unlikely that any single G can cover all of the actual PLD. (4b) abstracts away from this. It assumes that utterances have no performance blemishes and that all the PLD is the product of a single G.

These assumptions are heroic, but they are also very useful. Why? Because together with (4c) they serve to focus attention on PoS1, which, recall, is an excellent window (when available) into the native structure of FL. (4c) restricts the PLD to “simple” input. As noted (here), a good proxy for “simple” is un-embedded main clauses (plus a little bit, Degree 0+).[1] In effect, assumptions (4a,b) abstract away from the CDP, and (4c) focuses attention on the NEDP and what it implies for the structure of LADs.

As indicated, this is an idealization. Its virtue is that it allows one to cleanly focus on a simple problem with big payoffs if one’s interest is in the structure of FL.[2] The real acquisition situation, however, is known to be very different. In fact, it’s much more like (5):

(5)  a. Noisy data
      b. Non-homogeneous PLD

Thus, the actual PLD is problematic for the LAD in two important ways in addition to being deficient in NEDP terms. First, there is lots of noise in the input, as there is often a large distance between pristine sentences and muddy utterances. On the input side, then, the PLD is hardly uniform (different speakers, registers); it contains unclear speech, interjections, slips of the tongue, incomplete and wayward utterances, etc. On the intake side, the actual LAD (aka: baby) can be inattentive, mishear, have limited intake capacity (memory), etc. Thus, in contrast to the idealized data assumed for PoS1, the actual PLD can be very much less than perfect.

Second, the PLD consists of expressions from different Gs. In the extreme, as no two people have the exact same G, every acquisition situation is “multi-lingual.” In effect, standard acquisition is more similar to cases of creolization (i.e. multiple “languages” being melded into one) than to the ideal assumed in PoS1 investigations.[3] Thus there is unlikely to be a single G that fits all the actual PLD. Moreover, the noisy data is presented incrementally, thus not all at once. The increments are therefore not only noisy but, across LADs, highly variable: it is very likely that no two actual LADs get the same sequence of input PLD.

It is reasonable to believe that these two features can raise their own PoS problems. In fact, Dresher and Fodor/Sakas have shown that relaxing the all-at-once assumption makes parameter setting very challenging if the parameters are not independent (which there is every reason to believe is the case). Dresher, for example, demonstrated that even a relatively simple stress LAD has serious problems incrementally setting its parameters. I can only imagine the problems that might accrue were the PLD not only presented incrementally, but drawn from different stress Gs, 10% of which were misleading.
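To get a feel for why incremental setting of interacting parameters is hard, here is a toy sketch of my own. It is emphatically not Dresher's system or Fodor and Sakas's; it is just an error-driven, one-flip-at-a-time learner in the spirit of Gibson and Wexler's Triggering Learning Algorithm, run over two made-up interacting parameters. Because the diagnostic sentence parses only when both parameters are set correctly, no single flip ever looks like an improvement, and the learner never leaves its starting point:

```python
import random

# Toy "grammars": each parameter vector generates a tiny (invented) language.
# The two parameters interact: the target sentence "a b b" is parsable
# only when BOTH parameters are set to 1.
LANG = {
    (0, 0): {"a"},
    (1, 0): {"a b"},
    (0, 1): {"b a"},
    (1, 1): {"a", "a b b"},
}

def single_flip_learner(data, start=(0, 0), steps=10000, seed=0):
    """Error-driven learner: on a failed parse, flip one randomly chosen
    parameter and keep the flip only if the current sentence now parses."""
    rng = random.Random(seed)
    g = start
    for _ in range(steps):
        s = rng.choice(data)                 # incremental presentation
        if s in LANG[g]:
            continue                         # parses: no revision
        i = rng.randrange(2)                 # pick one parameter to flip
        g2 = tuple(1 - v if j == i else v for j, v in enumerate(g))
        if s in LANG[g2]:                    # greedy: keep only if it helps
            g = g2
    return g

# PLD drawn from the target grammar (1, 1), one sentence at a time:
print(single_flip_learner(["a", "a b b"]))   # -> (0, 0): stuck at a local maximum
```

This is the local-maximum worry in miniature: with interacting parameters, incremental data plus greedy revision need not converge, which is one way of seeing why Dresher and Fodor/Sakas resort to cues and to selectively ignoring data (see note [4]).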

And that’s the point I tried to take away from the Gigerenzer & Brighton (G&B) paper: it is unlikely that the biases required to get over the PoS1 hurdle will suffice to get actual LADs over PoS2. What G&B suggests is that getting through the noise and the variance of the actual PLD favors a very selective use of the input data. Indeed, given what we suspect, if you can match the data too well you will likely not be tracking a real G, given that the PLD is not homogeneous, noise-free and closely clustered around a single G. And this is due both to performance considerations (sore throats, blocked noses, “thinkos,” inarticulateness, inattention, etc.) and to non-homogeneity (many Gs producing the ambient PLD). In the PoS2 context, things like the bias-variance dilemma might loom large. In the PoS1 context they don’t, because our idealizations abstract away from the kinds of circumstances that can lead to them.[4]
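For those who want the bias-variance point in miniature, here is a minimal numpy sketch, mine and not G&B's; curve fitting stands in for G induction, and all the numbers are invented. The flexible learner matches the noisy "PLD" better and generalizes worse:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for noisy PLD: samples from one simple regularity plus noise.
def sample(n):
    x = rng.uniform(-1, 1, n)
    y = 2.0 * x + rng.normal(0, 0.3, n)   # the "real G": y = 2x
    return x, y

x_train, y_train = sample(20)             # what the learner actually gets
x_test, y_test = sample(1000)             # how well it generalizes

for degree in (1, 9):                     # a biased learner vs. a flexible one
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")

# Typical outcome: the degree-9 fit has the LOWER training error and the
# HIGHER test error; it has tracked the noise, not the regularity.
```

The moral transfers: when the data is noisy and drawn from a mixture, the learner that hugs it closest is the one most likely to be modeling the noise.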

So, I was wrong to run together PoS1 problems and PoS2 problems. The two kinds of investigations are related, I still believe, but when the PoS1 idealizations are relaxed new PoS problems arise. I will talk about some of this next time.





[1] In modern terms this would be something like the top two phases of a clause (C and v*).
[2] This kind of idealization functions similarly to what we do when we create vacuum chambers within which to drop balls to find out about gravity. In such cases we can physically abstract away from interfering causal factors (e.g. friction). Linguists are not so lucky. Idealization, when it works, serves the same function: to focus on some causal powers to the exclusion of others.
[3] In cases of creolization, if the input is from pidgins then the ambient PLD might not reflect underlying Gs at all, as pidgins may not be G-based (though I’m not sure here). At any rate, the idea that actual PLD samples from products of a single G is incorrect. Thus every case of real-life acquisition is a problem in which PLD springs from multiple different Gs.
[4] In fact, Dresher and Fodor & Sakas present ways of ignoring some of the data to enforce independence on the parameters, thus allowing them to incrementally set parameters. Ignoring data and having a bias seem (impressionistically, I admit) related.