Friday, June 26, 2015

Aspects at 50

Here is a piece (here) by Geoff Pullum (GP) celebrating the 50th anniversary of Aspects. This is a nice little post. GP has a second post, mentioned in the one linked to, on the competence/performance distinction. I’ll put up a companion piece to that second post anon. Here are a few comments on GP’s Aspects post.

Here GP gives a summary that feels right to me (i.e. my recollections match GP’s) about the impact that Aspects had on those who first read it. Reading chapter 1 felt revelatory, like a whole new world opening up. The links that it forged between broad issues in philosophy (I was an undergrad in philosophy when I first read it) and central questions in cognition and computation were electrifying. Everyone in cognition broadly construed (and I do mean everyone: CSers, psychologists, philosophers) read Aspects and believed that they had to read it. Part of this may have been due to some terminological choices that Chomsky came to regret (or so I believe). For example, replacing the notion “kernel sentence” with the notion “deep structure” led people to think, as GP put it:

Linguistics isn’t a matter of classifying parts of sentences anymore; it was about discovering something deep, surprising and hidden.

But this was not the reason for its impact. The reason Aspects was a go-to text was that chapter 1 was (and still is) a seminal document of the Cognitive Revolution and the study of mind. It is still the best single place to look if one is interested in how the study of language can reveal surprising, non-trivial features about human minds. So perhaps there is something right about the deep in Deep Structure. Here’s what I mean.

I believe that Chomsky was in a way correct in his choice of nomenclature. Though Deep Structure itself was/is not particularly “deep,” understanding the aim of syntax as that which maps phrase markers that represent meaning-ish information (roughly thematic information, which, recall, was coded at Deep Structure)[1] onto structures that feed phonetic expression is deep. Why? Because such a mapping is not surface evident and it involves rules and abstract structure with their own distinctive properties. Aspects clarifies what is more implicit in Syntactic Structures (and LSLT, which was not then widely available); namely, that syntax manipulates abstract structures (phrase markers). In particular, in contrast to Harris, who understood Transformations as mapping sentences (actually items in a corpus (viz. utterances)) to sentences, Aspects makes clear that this is not the right way to understand transformations or Gs. The latter map phrase markers to other phrase markers and eventually to representations of sound and meaning. They may capture relations between sentences, but only very indirectly. And this is a very big difference in the conception of what a G is and what a transformation is, and it all arises in virtue of specifying what a Deep Structure is. In particular, whereas utterances are plausibly observable, the rules that do the mappings that Chomsky envisaged are not. Thus, what Aspects did was pronounce that the first object of linguistic study is not what you see and hear but the rules, the Gs that mediate two “observables”: what a sentence means and how it is pronounced. This was a real big deal, and it remains a big deal (once again, reflect on the difference between Greenberg and Chomsky Universals). As GP said above, Deep Structure moves us from meditating on sentences (actually, utterances or items in corpora) to thinking about G mappings.

Once one thinks of things in this way, then the rest of the GG program follows pretty quickly: What properties do Gs have in common? How are Gs acquired on the basis of the slim evidence available to the child? How are Gs used in linguistic behavior? How did the capacity to form Gs arise in the species? What must G capable brains be like to house Gs and FL/UGs? In other words, once Gs become the focus of investigation, then the rest of the GG program comes quickly into focus. IMO, it is impossible to understand the Generative Program without understanding chapter 1 of Aspects and how it reorients attention to Gs and away from, as GP put it, “classifying parts of sentences.”

GP also points out that many of the details that Aspects laid out have been replaced with other ideas and technology. There is more than a little truth to this. Most importantly, in retrospect, Aspects technology has been replaced by technicalia more reminiscent of the Syntactic Structures (SS)-LSLT era. Most particularly, we (i.e. minimalists) have abandoned Deep Structure as a level. How so?

Deep Structure in Aspects is the locus of G recursion (via PS rules) and the locus of interface with the thematic system. Transformations did not create larger phrase markers, but mapped these Deep Structure PMs into others of roughly equal depth and length.[2] In more contemporary minimalist theories, we have returned to the earlier idea that recursion is not restricted to one level (the base), but is a function of the rules that work both to form phrases (as PS rules did in forming Deep Structure PMs) and transform them (e.g. as movement operations did in Aspects). Indeed, Minimalism has gone one step further. The contemporary conceit denies that there is a fundamental distinction between G operations that form constituents/units and those that displace expressions from one position in a PM to another (i.e. the distinction between PS rules and Transformations). That’s the big idea behind Chomsky’s modern conception of Merge, and it is importantly different from every earlier conception of G within Generative Grammar. Just as LGB removed constructions as central Gish objects, minimalism removed the PS/Transformation rule distinction as a fundamental grammatical difference. In a merge based theory there is only one recursive rule and both its instances (viz. E and I merge) build bigger and bigger structures.[3]
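The unification just described can be rendered as a toy sketch. The code below is my illustration, not Chomsky’s formalism: sets model unordered constituents, and the very same `merge` operation does duty both for External Merge (combining two independent objects) and Internal Merge (re-merging a subpart of an object with the object itself, i.e. movement). All names here are illustrative.

```python
# A minimal sketch (mine, not Chomsky's formalism) of Merge as a single
# structure-building operation whose two "instances" are External and
# Internal Merge. Frozensets model unordered two-membered constituents.

def merge(a, b):
    """Merge two syntactic objects into a new, larger one."""
    return frozenset({a, b})

def contains(tree, x):
    """Is x a term (subpart) of tree?"""
    if tree == x:
        return True
    return isinstance(tree, frozenset) and any(contains(t, x) for t in tree)

# External Merge: combine two independent objects.
vp = merge("eat", "apples")   # {eat, apples}
tp = merge("T", vp)           # {T, {eat, apples}}

# Internal Merge (movement): re-merge a subpart of the object with the
# object itself -- the same operation, not a separate transformation.
assert contains(tp, "apples")
moved = merge("apples", tp)   # {apples, {T, {eat, apples}}}
```

The point of the sketch is that both applications build a bigger structure directly; there is no rule that first creates an empty slot and then fills it (cf. note 3).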

Crucially (see note 3), this conception of structure building also effectively eliminates lexical insertion as a distinct G operation, one, incidentally, that absorbed quite a bit of ink in Aspects. However, it appears to me that this latter operation may be making a comeback. To the degree that I understand it, the DM idea that there is late lexical insertion comes close to revitalizing this central Aspects operation. In particular, on the DM conception, it looks like Merge is understood to create grammatical slots into which contents are later inserted. This distinction between an atom and the slot that it fills is foreign to the original Merge idea. However, I may be wrong about this, and if so, please let me know. But if so, it is a partial return to ideas central to the Aspects inventory of G operations.[4]

In addition, in most contemporary theories, there are two other lasting residues of the Aspects conception of Deep Structure. First, Deep Structure in Aspects is the level where thematic information meets the G. This relation is established exclusively by PS rules. This idea is still widely adopted and travels under the guise of the assumption that only E-Merge can discharge thematic information (related to the duality of interpretation assumption). This assumption regarding a “residue” of Deep Structure is the point of contention between those that debate whether movement into theta positions is possible (e.g. I buy it, Chomsky doesn’t).[5] Thus, in one sense, despite the “elimination” of DS as a central minimalist trope, there remains a significant residue that distinguishes those operations that establish theta structure in the grammar from those that transform these structures to establish the long distance displacement operations that are linguistically ubiquitous.[6]

Second, all agree that theta domains are the smallest (i.e. most deeply embedded) G domains. Thus, an expression discharges its thematic obligations before it does anything else (e.g. case, agreement, criterial checking etc.). This again reflects the Aspects idea that Deep Structures are inputs to the transformational component. This assumption is still with us, despite the “elimination” of Deep Structure. We (and here I mean everyone, regardless of whether you believe that movement to a theta position is licit) still assume that a DP E-merges into a theta position before it I-merges anywhere else, and this has the consequence that the deepest layer of the grammatical onion is the theta domain. So far as I know, this assumption is axiomatic. In fact, why exactly sentences are organized so that the theta domain is embedded in the case/agreement domain, which is in turn embedded in A’-domains, is entirely unknown.[7]

In short, Deep Structure, or at least some shadowy residue, is still with us, though in a slightly different technical form. We have abandoned the view that all thematic information is discharged before any transformation can apply. But we have retained the idea that for any given “argument” its thematic information is discharged before any transformation applies to it, and most have further retained the assumption that movement into theta positions is illicit. This is pure Deep Structure.

Let me end by echoing GP’s main point. Aspects really is an amazing book, especially chapter 1. I still find it inspirational and every time I read it I find something new. GP is right to wonder why there haven’t been countless celebrations of the book. I would love to say that it’s because its basic insights have been fully absorbed into linguistics, and the wider cognitive study of language. It hasn’t. It’s still, sadly, a revolutionary book. Read it again and see for yourself.

[1] Indeed, given the Katz-Postal hypothesis all semantic information was coded at Deep Structure. As you all know, this idea was revised in the early 70s, with both Deep Structure and Surface Structure contributing to interpretation. Thematic information was still coded in the first, with scope information, binding, and other semantic effects coded in the second. This led to a rebranding, with Deep Structure giving way to D-Structure. This more semantically restricted level was part of every subsequent mainstream generative G until the more contemporary Minimalist period. And, as you will see below, it still largely survives in modified form in thoroughly modern minimalist grammars.
[2] “Roughly” because there were pruning rules that made PMs smaller, but none that made them appreciably bigger.
[3] In earlier theories PS rules built structures that lexical insertion and movement operations filled. The critical feature of Merge that makes all its particular applications structure building operations is the elimination of the distinction between an expression and the slot it occupies. Merge does not first form a slot and then fill it. Rather expressions are combined directly without rules that first form positions into which they are inserted.
[4] Interestingly, this makes “traces” as understood within GB undefinable, and this makes both the notion of trace and that of PRO unavailable in a Merge based theory. As the rabbis of yore were fond of mumbling: Those who understand will understand.
[5] Why only E-merge can do so despite the unification of E and I merge is one of the wedges people like me use to conceptually motivate the movement theory of control/anaphora. Put another way, it is only via (sometimes roundabout) stipulation that a minimalist G sans Deep Structure can restrict thematic discharge to E-merge.
[6] In other words, contrary to much theoretical advertising, DS has not been entirely eliminated in most theories, though one central feature has been dispensed with.
[7] So far as I know, the why question is rarely asked. See here for some discussion and a link to a paper that poses and addresses the issue.

Monday, June 22, 2015

I admit it

I admit it: until very recently when I heard “morphology” I reached for my pillow.  I knew that I was supposed to find it all very interesting, but like cod liver oil, knowing this did not make ingesting it any more pleasant. Moreover, I even developed a spiel that led to the conclusion that I did not have to be interested in it for it was at right angles to those questions that gripped me (and you all know what these are, viz. PP and DP and Empiricism/Rationalism, and FL and UG etc.). Morphology was not as obviously relevant to these questions because much of it dealt with finite (often small) exception-full paradigms. So many rules, so many exceptions, so many data points. Is there really a PoS problem here? Not obviously. So, yes morphology exists and is abundant, but does language really need morphology or is it just an excrescence? At any rate, despite some questions at a very general level (here and here), I was able to divert my gaze and convince myself that I had reason to do so. Then along came Omer thrusting Bobaljik into my hands and I am here to admit that I was completely wrong and that this stuff is great. You may know all about it, but partly as penance, let me say how and why I was wrong and why this stuff is very interesting even for someone with my interests.

Those of you that are better read than I am already know about Jonathan Bobaljik’s (JB) work on the morphology of superlatives. He has a book (here) and a bunch of papers (e.g. here).[1] I want to talk a little about one discovery that he has made that provides a novel (as JB himself notes) take on the classic PoS argument. It should be part of anyone’s bag of PoS examples that gets trotted out when you want to impress family, friends, students, and/or colleagues. It’s simple and very compelling. I have road tested it, and it works, even with those that know nothing about linguistics.

The argument is very simple. The fact concerns morphological patterns one finds when one examines morphological exceptions. In other words, it rests on the discovery that exceptions can be regular. A bunch of languages (though by no means all) can form comparatives and superlatives from base adjectival forms with affixes. English is a good example of one such language. It provides trios such as big, bigg-er, bigg-est and tall, tall-er, tall-est. Note that in these two examples, big and tall are part of the comparative -er form and the superlative -est form. This is the standard pattern in languages that do this kind of thing. Interestingly, there are exceptions. So in English we also find trios like good, bett-er, be-st and bad, worse, wor-st, where the comparative and superlative forms are not based on the same base as the simple adjectival form. In other words, the comparative and superlative are suppletive. There are lots of technical ways of describing this, but for my purposes, this suffices. Here’s what JB established (of course, based on the work of others that JB copiously cites): that if the comparative is suppletive, then so is the superlative. More graphically, if we take the trio of forms as Adj/Comp/Super, we find AAA patterns, ABB patterns and even ABC patterns, but we find no ABA patterns and very, very few (maybe no?) AAB patterns.[2] JB’s question is why not? And a very good question this is.

JB argues that this follows from how superlatives are constructed and how suppletion reflects the Elsewhere Principle. The interested reader should read JB, but the basic proposal is that superlatives have comparatives as structural subparts.[3] How one pronounces the subpart then has an effect on how one can pronounce the larger structure. So, in effect, if the comparative is suppletive and it is part of the structure of the superlative, then the superlative must be suppletive as well given something like the Elsewhere Principle. This accounts for the absence of the ABA pattern. Explaining the absence of the AAB pattern takes a few more assumptions concerning the locality of morphological operations.[4]
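The containment-plus-Elsewhere logic lends itself to a toy computational rendering. The sketch below is my illustration, not JB’s actual formalism: vocabulary items carry context features, the superlative structure contains the comparative feature, and insertion picks the most specific matching item (the Elsewhere Principle). All names (`spell_out_root`, the feature labels) are invented for the example.

```python
# Toy model of root allomorphy under the containment hypothesis:
#   comparative = [ADJ CMPR], superlative = [[ADJ CMPR] SPRL].
# Vocabulary insertion picks the MOST SPECIFIC matching exponent
# (the Elsewhere Principle). Illustrative only, not JB's formalism.

def spell_out_root(root, structure, vocabulary):
    """Return the form of the most specific vocabulary item for `root`
    whose context features are all present in `structure`."""
    candidates = [
        (context, form)
        for (r, context, form) in vocabulary
        if r == root and context <= structure
    ]
    # Elsewhere: the item with the largest matching context wins.
    context, form = max(candidates, key=lambda c: len(c[0]))
    return form

# A suppletive entry for GOOD: "bett" in comparative contexts, "good" elsewhere.
VOCAB = [
    ("GOOD", frozenset(),         "good"),
    ("GOOD", frozenset({"CMPR"}), "bett"),
]

positive    = frozenset()                  # good
comparative = frozenset({"CMPR"})          # bett-er
superlative = frozenset({"CMPR", "SPRL"})  # bett-est, never *good-est

print(spell_out_root("GOOD", positive, VOCAB))     # good
print(spell_out_root("GOOD", comparative, VOCAB))  # bett
print(spell_out_root("GOOD", superlative, VOCAB))  # bett
```

Because the superlative structurally contains the comparative feature, any root that goes suppletive in the comparative automatically does so in the superlative as well: on these assumptions the ABA pattern simply cannot be generated.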

All of this may or may not be correct (I am no expert) but it is all very interesting and very plausible. Let’s return to the PoS part of the argument.  JB notes several very interesting properties of these patterns.

First, the pattern (especially *ABA) is linguistically very robust. It occurs in languages where the morphology makes it clear that comparatives are part of superlatives (Czech) and in those where this is not at all evident on the surface (English). Thus, whatever is responsible for the *ABA pattern cannot be something that is surface detectable by inspecting overt morphemic forms. That it holds in languages like English does not follow from the comparative-within-superlative structure being evident in English forms; it isn’t. So the fact that *ABA holds quite generally, even when there is no surface evidence suggesting that the superlative contains a comparative subpart, suggests that the postulated nesting of comparatives within superlatives drives the overt morphology, rather than the other way around. And this, JB notes, strongly suggests that this gap in the possible patterns implicates some fundamental feature of FL/UG.

Note, incidentally, this is an excellent example where the G of language A can ground conclusions concerning the G of language B, something that only makes sense in the context of a commitment to some form of Universal Grammar. The G of Czech is telling us something about the G of English, which borders on the absurd unless one thinks that human Gs as such can have common properties (viz. a commitment to UG in some form).[5]

Second, suppletion of the relevant sort is pretty uncommon within any single language. So, in English there are only two such suppletive trios (for good and bad). So too in other languages (e.g. Hungarian, Estonian, Persian, Sanskrit, Modern Greek and Portuguese have one suppletive form each).[6] Consequently, the *ABA generalization only emerges if one looks across a large variety of languages and notes that the ABA pattern never appears in any of them.

Let me stress these points: (i) the pattern is surface invisible in many languages (e.g. English) in that the words relevant to finding it do not wear the pattern on their morphological sleeves; (ii) the absent pattern occurs in an exceptional part of the language, suppletions being exceptions to the more clearly rule-governed part of the morphology; and (iii) suppletions are rare both within a language and across languages. The numbers we are looking at are roughly 200 forms over 50 or so languages. Nonetheless, when all of these exceptions from across all of these languages are examined, the absence of ABA patterns shines through clearly. So, there is a kind of threefold absence of relevant data for the child: the pattern is surface invisible in many languages, suppletive triplets are rare in any given language, and the pattern is only really well grounded when one considers these small numbers of exceptional cases across a good number of languages. The PoSity (haha) of relevant evidence is evident. As JB rightly concludes, this begs for a FL/UG explanation.

Third, as JB notes, the absence really is the absence of a pattern. Here’s what I mean. Even among languages in close contact the pattern cannot be accounted for by pointing to roots and morphemes that express these patterns shared across the languages. The reason is that the relevant roots and morphemes in language A are not those that express it in language B even in cases where A and B are geographical and historical neighbors. So there is no plausible account of *ABA in terms of borrowing forms across Gs either geographically or historically local to one another.[7]

As JB eloquently sums things up (here: p 16):

…a growing body of research…finds…order in chaos – robust patterns of regularity that emerge as significant in their cross-linguistic aspect. Systematic gaps in these attested patterns…point to the existence of grammatical principles that abstract away from the peculiarities of individual words in specific languages and restrict the class of possible grammars.

This argument combines classic PoS themes with one important innovation. What’s classic is zeroing in on the absence of some grammatical possibility. UG is invoked to explain the gaps, the exceptions to the observed general patterns. UG is not generally invoked to explain what is visible, but what fails to bark. What is decidedly new, at least to me, is that the relevant pattern is only really detectable across Gs. This is the data the linguist needs to establish *ABA. There is no plausible world in which this kind of data is available in any child’s PLD. Indeed, given the rarity of these forms overall, it is hard to see how this pattern could have been detected by linguists in the absence of extensive comparative work.

As noted, JB provides a theory of this gap in terms of the structure of superlatives as containing comparatives plus the Elsewhere Principle. If UG requires that superlatives be built from the comparative and UG adopts the Elsewhere Principle then the *ABA hole follows. These assumptions suffice to provide an explanatorily adequate theory of this phenomenon. However, JB decides to venture into territory somewhat beyond explanatory adequacy and asks why this should be true. More particularly, why must superlatives be built on top of comparatives? He speculates, in effect, that this reflects some more general property; specifically a principle

… limiting the (semantic) complexity of functional morphemes. Perhaps the reason there can be no true superlative (abstract) morpheme combining directly with adjectives without the mediation of a comparative element is that the superlative meaning “more X than all others” contain two semantically rich elements: the comparative operator and a universal quantifier. Perhaps UG imposes a condition that such semantically rich elements must start out as syntactic atoms (i.e. X0 nodes). (here p. 6)

It is tempting to speculate that this proposal is related to the kind of work that Hunter, Pietroski, Halberda and Lidz have done on the semantic structure of most (discussed here). They note that natural language meanings like to use some predicates but not others in expressing quantificational meanings. Their analysis of most critically involves a comparative and a universal component and shows how this fits with the kinds of predicates that the analog number + visual system prefer. One can think of these, perhaps, as the natural semantic predicates and if so this might also relate to what JB is pointing to. At any rate, we are in deep waters here, just the kind of ideas that a respect for minimalist questions would lead one to explore.

Let me end with one more very interesting point that JB makes. He notes that the style of explanation developed for the *ABA gap has application elsewhere. When one finds these kinds of gaps, these kinds of part-whole accounts are very attractive. There seem to be other morphological gaps of interest to which this general kind of account can be applied, and they have very interesting implications (e.g. case values may be highly structured). At any rate, as JB notes, with this explanatory schema in hand, reverse engineering projects of various kinds suggest themselves and become interesting to pursue.

So let me end. To repeat, boy was I wrong. Morphology provides some gorgeous examples of PoS reasoning, which, moreover, are easy to understand and explain to neophytes. I would suggest adding the *ABA MLG to your budget of handy PoS illustrations. It’s great stuff.

[1] I also have a copy of an early draft of what became part of the book that I will refer to here. I will call this JB-draft.
[2] Is Latin the only case of an ABC pattern? JB probably said but I can’t recall the answer.
[3] This is a classical kind of GG explanation: why do A and B share a property? Because A is a subpart of B. Or: why if B then A? Because A is part of B. Note that this only really makes sense as an explanatory strategy if one is willing to countenance abstract form that is not always visible on the surface. See below.
[4] I won’t discuss this case here, but it clearly bears on question 32 here. Should we find the same locality conditions conditioning morphological and syntactic operations, this would be an interesting (indeed very interesting) reason for treating them the same.
[5] David Pesetsky made just this point forcefully in Athens.
[6] See JB-draft:15.
[7] JB-draft (15-16) notes that neither roots nor affixes that induce the suppletion are preserved across neighboring or historically related languages.

Thursday, June 18, 2015

ROOTS in New York, June 29-July3: What Do I Want To Learn?

(This Post Also Appears on my own blog with the title Anticipation:Roots)

The recent meeting of syntacticians in Athens has whetted my appetite for big gatherings with lots of extremely intelligent linguists thinking about the same topic, because it was so much fun.

At the same time, it has also raised the bar for what I think we should hope to accomplish with such big workshops. I have become more focused and critical about what the field should be doing within its ranks as well as with respect to communication with the external sphere(s).

The workshop I am about to attend on Roots (the fourth such), organized by Alec Marantz and the team at NYU, will be held in New York from June 29th to July 3rd and offers a glittering array of participants (see the preliminary program here).

Not all the participants share a Distributed Morphology (DM)-like view of `roots', but all are broadly engaged in the same kinds of research questions and share a generative approach to language. The programme also includes a public forum panel discussion to present and discuss ideas that should be more accessible to the interested general public. So Roots will be an experiment in having the internal conversation as well as the external conversation.

One of the things I tend to like to do is fret about the worst case scenario.  This way I cannot be disappointed.  What do I think is at stake here, and what is there to fret over in advance you ask?  Morphosyntax is in great shape, right?

Are we going to communicate about the real questions, or will everyone talk about their own way of looking at things and simply talk past one another?
Or will we bicker about small implementational issues, such as: should roots be acategorial or not? Should there be a rich generative lexicon or not? Are these in fact, as I suspect, matters of implementation, or are they substantive matters that make actually different predictions? I need a mathematical linguist to help me out here. But my impression is that you can take any phenomenon that one linguist flaunts as evidence that their framework is best and, with a little motivation, creativity and tweaking here and there, give an analysis in the other framework's terms as well. Because in the end these analyses are still higher-level descriptions: things may look a little different, but you can still always describe the facts.

DM in particular equips itself with an impressive arsenal of tricks and magicks to get the job done. We have syntactic operations of course, because DM prides itself on being `syntax all the way down'. But we also have a host of purely morphological operations to get things in shape for spellout (fission, fusion, impoverishment, lowering, what have you), which are not normal actions of syntax and sit purely in the morphological component. Insertion comes next, regulated by competition and the Elsewhere Principle, where the effects of local selectional frames can be felt (contextual allomorphy and subcategorization frames for functional context). After spellout, notice that you still get a chance to fix some stuff that hasn't come out right so far, namely by using `phonological' readjustment rules, which don't exist anywhere else in the language's natural phonology. And this is all before the actual phonology begins. So sandwiched in between independently understood syntactic processes and independently understood phonological processes, there's a whole host of operations whose shape and inherent nature look quite unique. And there's lots of them. So by my reckoning, DM has a separate morphological generative component, different from the syntactic one, with lots of tools in it.
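To make the worry vivid, the derivational sequence just listed can be laid out as a pipeline. The sketch below is deliberately schematic and mine alone: the stage names come from the description above, but the function bodies are placeholders, since the point at issue is only how many distinct operation types sit between syntax proper and the phonology proper.

```python
# A schematic (and deliberately oversimplified) rendering of the DM
# derivational pipeline described above. Stage names follow the text;
# the bodies are placeholders that just record that a stage applied.

from typing import Callable, Dict, List

Structure = Dict[str, bool]  # stand-in for a syntactic object

def dm_pipeline(stages: List[Callable[[Structure], Structure]],
                tree: Structure) -> Structure:
    for stage in stages:
        tree = stage(tree)
    return tree

def syntax(t):              # Merge, Move, Agree
    return {**t, "built": True}
def morphological_ops(t):   # fission, fusion, impoverishment, lowering
    return {**t, "adjusted": True}
def insertion(t):           # competition + the Elsewhere Principle
    return {**t, "exponents": True}
def readjustment(t):        # 'phonological' readjustment rules
    return {**t, "readjusted": True}
def phonology(t):           # the language's actual phonology
    return {**t, "pf": True}

result = dm_pipeline(
    [syntax, morphological_ops, insertion, readjustment, phonology], {})
print(result)
# {'built': True, 'adjusted': True, 'exponents': True,
#  'readjusted': True, 'pf': True}
```

Three of the five stages fall between syntax and phonology, which is the sense in which DM looks, by my reckoning, like it has a separate morphological generative component.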

But I don't really want to go down that road, because one woman's Ugly is another woman's Perfectly Reasonable, and I'm not going to win that battle. I suspect that these frameworks are intertranslatable and that we do not have, even in principle, the evidence from within purely syntactic theorising to choose between them.

However, there might be deep differences when it comes to deciding which operations are within the narrow computation and which ones are properties of the transducer that maps between the computation and the other modules of mind/brain. So it's the substantive question of what that division of labour is, rather than the actual toolbox, that I would like to make progress on.

To be concrete, here are some mid-level questions that could come up at the ROOTs meeting.

Mid-Level Questions.
A. Should generative aspects of meaning be represented in the syntax or the lexicon? (DM says syntax)
B.  What syntactic information is borne by roots? (DM says none)
C. Should there be late insertion or  should lexical items drive projection? (DM says late insertion)

Going down a level, if one accepts a general DM architecture, one needs to ask a whole host of important lower level questions to achieve a proper degree of explicitness:

Low-Level Questions
DM1: What features can syntactic structures bear as the triggers for insertion?
DM2: What is the relationship between functional items and features? If it is not one-to-one, can we put constraints on the number of `flavours' these functional heads can come in?
DM3: What morphological processes manipulate structure prior to insertion, and can any features be added at this stage?
DM4: How is competition regulated?
DM5: What phonological readjustment rules can apply after insertion?

There is some hope that there will be a discussion of the issues represented by A, B and C above. But the meeting may end up concentrating on DM1-5.

Now, my hunch is that in the end, even A vs. B vs. C are all NON-ISSUES. Therefore, we should not waste time and rhetoric trying to convince each other to switch `sides'. Having said that, there is good reason to want to be able to walk around a problem and see it from different framework-ian perspectives, so we don't want homogeneity either. And we do not want an enforced shared vocabulary and set of assumptions. This is because a particular way of framing a general space of linguistic inquiry lends itself to noticing different issues or problems, and to seeing different kinds of solutions. I will argue in my own contribution to this workshop on Day 1 that analyses that adopt as axiomatic the principle of acategorial roots prejudge and obscure certain real and important issues that are urgent for us to solve. So I think A, B and C need an airing.

If we end up wallowing in DM1-5 the whole time, I am going to go to sleep. And this is not because I don't appreciate explicitness and algorithmic discipline (as Gereon Mueller was imploring us to get more serious about at the Athens meeting), because I do. I think it is vital to work through the system, especially to detect when one has smuggled in unarticulated assumptions, and to make sure the analysis actually delivers and generates the output it claims to generate. The problem is that I have different answers to B than the DM framework does, so when it comes to the nitty-gritty of DM2, 3 and 5 in particular, I often find it frustratingly hard to convert the questions into ones that transcend the implementation. But ok, it's not all about me.

But here is some stuff that I would actually like to figure out, where I think the question transcends frameworks, although it requires a generative perspective. 

A Higher Level Question I Care About
Question Z.  If there is a narrow syntactic computation that manipulates syntactic primes and  has a regular relationship to the generation of meaning, what aspects of meaning are strictly a matter of syntactic form, and what aspects of meaning are filled in by more general cognitive processes and representations? 

Another way of asking this question is in terms of minimalist theorizing. FLN must generate complex syntactic representations and semantic skeletons that underwrite the productivity of meaning construction in human language. What parts of what we traditionally consider the `meaning of a verb' are contributed by (i) the narrow syntactic computation itself, (ii) the transducer from FLN to the domain of concepts, and (iii) the conceptual flesh and fluff on the other side of the interface that the verb is conventionally associated with?

Certain aspects of the computational system for a particular language must surely be universal, but perhaps only rather abstract properties of it, such as hierarchical structuring and the relationship between embedding and semantic composition. It remains an open question whether the labels of the syntactic primes are universal or language specific, or a combination of the two (as in Wiltschko's recent proposals). This makes the question concerning the division of labour between the skeleton and the flesh of verbal meaning also a question about the locus of variation. But it also makes the question potentially much more difficult to answer. To answer it we need evidence from many languages, and we need diagnostics for which types of meaning we put on which side of the divide. In this discussion, narrow, language-particular computation does not equate to universality. I think it is important to acknowledge that. So we need to make a distinction between negotiable and non-negotiable meaning and be able to apply it more generally. (The DM version of this question would be: what meanings go into the roots and the encyclopedia, as opposed to the meaning that comes from the functional heads themselves?)

There is an important further question lurking in the background to all of this which is of how the mechanisms of storage and computation are configured in the brain, and what  the role of the actual lexical item is in that complex architecture.  I think we know enough about the underlying patterns of verbal meaning and verbal morphology to start trying to talk to the folks who have done experiments on priming and  the timing of lexical access both in isolation and integrated in sentence processing.   I would have loved to see some interdisciplinary talks at this workshop, but it doesn’t look like it from the programme. 

Still, I am going to be happy if we can start comparing notes and coming up with a consensus on what we can say at this stage about higher level question Z. (If you remember the old Dr Seuss story, Little Cat Z was the one with VOOM, the one who cleaned up the mess).

When it comes to the division of labour between the knowledge store that is represented by knowing the lexical items of one's language, and the computational system that puts lexical items together, I am not sure we know if we are even asking the question in the right way. What do we know of the psycholinguistics of lexical access and deployment that would bear on our theories? I would like to get more up to date on that. Because the minimalist agenda and the constructivist rhetoric essentially force us to ask the higher level question Z, and we are going to need some help from the psycholinguists to answer it. But that perhaps will be a topic for a different workshop.

Some things you might enjoy

I read a blog managed by Massimo Pigliucci called Scientia Salon. He posts pieces himself and carries interesting stuff on current evo biology that I find informative. At any rate, he has recently posted two things that you might enjoy.

First, there is this piece on the Formal Darwinism Project. The aim seems to be to provide a rational basis for the kind of teleological/functional/good-design thinking that evo theorists find so compelling (and, it appears from the paper, they find it so for some good reasons). Of course, such thinking is hardly foolproof and there are lots of times when it fails. The idea seems to be to ground it and see where it works and where not. Interestingly, part of the effort is to find those circumstances in which not knowing much about the genetics won't make much of a difference. This is where the "phenotypic gambit" works. Here's the author:

In 1984, I coined the term ‘Phenotypic Gambit’ for the research strategy of studying organisms in ignorance of the actual genetic architecture of the trait in question … The Phenotypic Gambit articulates the assumption that is usually made implicitly in this work, and the formal darwinism project aims to understand better why and how the gambit works when it does, and also to identify and understand those cases in which the gambit fails.
Interestingly, it seems that much (indeed, it seems, most) work in evolutionary biology is done in complete and utter ignorance of the relevant genetics, on the assumption that in many cases "the genetic details, which aren't known, are unlikely to matter" (quote from the paper the post links to. It's behind a paywall, but many can get it through their university libraries). Here's another quote from Jarrod Hadfield (170):

If you exclude simple Mendelian traits…then we know very little about the genetic basis of most traits.
Why do I mention this? Well, there is a huge amount of skepticism regarding Darwin's Problem. Some of this stems from the fact that we know little about the genetics underlying language, so that thinking about it is just so much hand waving. This was a theme at the Athens conference (in fact, I might have been the one person there who did not buy into this) and it was also a theme discussed on this blog here. However, if this article is right, then it seems that this is a problem way beyond anything having to do with DP as applied to FL. It is very very common in evo investigations. And if it is ok for people studying stuff in animals to make the Phenotypic Gambit (as a useful idealization, always ready to retreat when it proves wrong) then why not in the study of FL/UG as well?

In our case, the gambit amounts to assuming that a "simple" phenotypic description will translate into a simple genetic one. This may be wrong, but it seems to be widely adopted despite the obvious problems. In short, it seems that perhaps (see the hedging here) those interested in DP are doing exactly what the state of the evo art recommends: do the best you can given that we know little about the genetics of anything bigger than bacteria. At the very least, the phenotypic gambit, the assumption that the genetics, once understood, will not greatly distort the conclusions drawn from phenotypic reasoning, is both widespread in biology and useful. Of course, maybe these people aren't doing real biology either. Maybe.

Second, there is this provocative post by Pigliucci on funding for science research. He points out that the question of why society should fund pure science is one that needs to be seriously addressed. Moreover, the standard arguments seem to lack much serious empirical grounding once one gets beyond anecdote. Linguists should think this question through given that more and more of our work is being supported by gvmt grants or foundations. Why should they fund it? The argument that one day it will help us cure cancer is not that compelling. What is more compelling is that I actually know of virtually no interesting applied (aka translational) work that does not rely on huge amounts of work funded for less instrumental ends. In other words, from the little I know, most translational research presupposes results gained from publicly funded efforts. The results are easy enough to spot all around us today. The latest breakthroughs are almost always based on gvmt sponsored work (think internet, iPhone, computer, most of current molecular biology etc.). As I noted some time ago, the computer would not exist but for the work of logicians interested in the foundations of mathematics. The fact is that most of the wonders around us hail from curiosity driven research. And what is also clear is that the fruits of this work would have been virtually impossible to anticipate ex ante.

Pigliucci touches on one other theme that is noteworthy: the bullshittification of grant applications when one needs to defend one's work in purely instrumental terms. His observations quoted here fit well with my own:
When I was submitting grant proposals to NSF, I was required to also fill out a section about the “broader impact” of my research (which was on genotype-environment interactions in a species of weedy plants). It was always an afterthought, a boilerplate that got copied from proposal to proposal. And so were those of most of my colleagues. The reason is that — even though I was actually studying something for which practical applications were not at all far fetched (e.g., weed control, invasive biology), that’s not why I was doing it. I was doing it because I had a genuine basic curiosity about the science involved. Indeed, had NSF really only funded basic research that had a direct link to applications I could have done pretty much the same thing on a different model system, say a weed or an invasive species with well demonstrated commercial effects. And mine was by far not even close to being the most narrowly focused and idiosyncratic piece of science carried out within my own department, let alone in the US at large.
At any rate, the piece raises important issues: why should anyone fund our work? Why should they care? Here we need to be able to elaborate what we do for a wider audience in terms that they can understand. I've discussed this before (here). Pigliucci's discussion pushes the question further. It is not unreasonable for people to ask why they should keep paying. One answer is that the problems we try to investigate are intrinsically interesting. I believe that this is right. And I have a spiel. Do you? If not, get one!

Tuesday, June 16, 2015

Science "fraud"

Here's another piece on this topic in the venerable (i.e. almost always behind the curve) NYTs. The point worth noting is the sources of the "crimes and misdemeanors," i.e. what drives retractions and lack of replicability in so many domains. Curiously, outright fraud is not the great plague. Rather (surprise surprise), the main problems come from data manipulation (i.e. abuse and misuse of stats), plagiarism and (I cannot fathom why this is a real problem) publishing the same results in more than one venue. Outright fraud comes in at number four and the paper does not actually quantify how many such papers there are. So, if you want to make the data better, then beware of statistical methods! They are very open to abuse, whether as data trimming or fishing for results. This does not mean that stats are useless. Of course they aren't. But they are tools open to easy abuse and misunderstanding. This is worth keeping in mind when the stats inclined rail against our informal methods. It's actually easy to control for bad data in linguistics. To repeat, I am all in favor of using stat tools if useful (e.g. see the last post), but as is evident, data is not free of problems just because it is "statistically" represented.

Last point: the NYT reports some dissenters' opinions regarding how serious this problem really is. People delight in being super concerned about these sorts of problems. As I have stated before, I am still not convinced that this is a real problem. The main problem is not bad data but very bad theories. When you know nothing, then bad data matters. When you do, much much less. Good theory (even at the MLG level) purifies data. The problem is less bad data than the idea that data is the ultimate standard. There is one version of this which is unarguable (that facts matter). But there is another, the strong data-first version, that is pernicious (every data point is sacrosanct). The idea behind the NYT article seems to be that if we are only careful and honest all the data we produce will be good. This is bunk. There is no substitute for thinking, no matter how much data one gets. And stats hygiene will not make this fact go away.

Sunday, June 14, 2015

Islands are not parametric

Every now and then the world works exactly the way our best reasoning (and theory) says it is supposed to work. Pauli (whose proposal was en-theoried by Fermi) postulates the neutrino to save the laws of conservation of energy and momentum in the face of troubling data from beta decay (see here), and 20 years later the little rascalino is detected and 40 years later Nobel prizes are collected. Ditto for the Higgs field, but this time the lag was over 40 years from theory to detection, and again a Nobel for the effort. These are considered some of the great moments in science, for they are times when theory insisted that the world was a certain way despite apparent counter-evidence, and patience plus experimental ingenuity proved theory right. In other words, we love these cases for they come as close as we can hope to get to having proof that we understand something about the world.

Now linguistics (I am almost embarrassed to say this, but I really don’t want to be painted as hyperbolic) is not quantum mechanics. It’s not even Newtonian mechanics (if only!). But every now and then we can snag a glimpse of FL by considering how its theories, and the logic that allows us to develop these, yield unexpected validation of its core tenets. The logic I refer to in this case is the PoS. Many denigrate its value. Many are suspicious of its claims and dispirited by its crude reasoning. They are wrong. Today I want to point to one of its big success stories.  And I want to luxuriate in the details so that the wonders of PoS thinking shine clearly through.

The Athens participants, to a person, touted islands effects as one of GG’s great discoveries. It is such for several reasons.

First, they are non-obvious in the simple sense that the fact that islands exist is not cullable from inspection of live text. Listen all you want to the speech around you and you will never spot an island. It is the classic example of a dog that doesn't bark. It is only when you ask informants about extractions that the distinctive properties of structures as islands shine through.

Second, it takes quite a bit of technical apparatus to even describe an island. No conception of phrasal structure, no islands. No understanding of movement as a transformation of a certain sort, no islands. So, the very fact that islands are linguistic objects with their own distinctive properties only becomes evident when the methods and tools of GG become available.

Third, and IMO the most important point, islands are perfect probes into the structure of FL and were among the first linguistic objects to tell us anything about its structure. Why are they perfect probes? Because if islands exist (and they do, they do) then their existence must be grounded in the structure of FL. Let me say this very important point another way: the fact that there are islands cannot be something that is learned (in the simple sense learning as induction from the ambient linguistic data). That islands exist and constrain movement operations cannot possibly be learned because there is no data to learn this from in the PLD. Indeed, absent inquiries by pesky linguists interested in islands, there would be virtually no data at all concerning their islandish properties, neither in the PLD nor the positive LD.  Island data must be manufactured by hard working linguists to be seeable at all. Thus, island phenomena are the quintessential examples of linguists acting like real scientists: they are unobvious, based on factitious data, only describable against a pretty sophisticated technical background and pregnant with implication for the fine structure of the principle object of inquiry, FL.  Wow!! So islands are a really, really, really big deal.

And this sets up why this poster by Dave Kush, Terje Lohndal and Jon Sprouse (KLS) is so exciting. As the third point above notes, what makes islands so exciting is that they are the perfect probes into the structure of FL precisely because whatever properties they have cannot possibly be learned and so must reflect the native structure of FL. To repeat, there exists no data relevant to identifying islands and their properties in either the PLD or the positive LD. But this absence of data implies that island effects should not vary across Gs. Why? Because variation is a function of FL’s response to differential data input and if there is no plausible data input relevant to islands then there cannot be variation. However, for many years now, it has been argued that different Gs invidiously distinguish among the islands. In particular, since the early 1980s several have argued that the Scandinavian languages don’t have islands. KLS shows that this is simply incorrect. Here’s how it shows this.

The argument that Scandinavian Gs are not subject to islands rests on the claim that they do not display island effects. What are island effects? These are the native speaker judgments which categorize extractions out of islands as unacceptable. Thus, a native speaker of English will typically judge (1) as garbage, this being typically annotated with a “*”.

(1)  *What did the nominee hear the rumor that Jon got

In contrast to this judgment concerning (1) in English, the claim has been that speakers of Swedish and Norwegian accept such sentences and do not give them *s. Maybe they give them at most a ?, but never a *. So the basis for the claim that Scandinavian Gs are exempt from islands is that native speakers do not find them unacceptable. The conclusion that has been drawn is that Gs vary wrt islands. But this is impossible given PoS reasoning that implies that Gs could never so vary, ever. As you can see, there is an impasse here: theory says no variation, empirics say there is. Not surprisingly, given the low value much of the field assigns to PoS style theoretical "speculation" (aka logic), much of the field has concluded that there is something deeply wrong about our theory of islands. [1]

Now for a long time this is how things sat. The obvious answer to this empirical challenge is to deny that Scandinavian acceptability judgments are good indicators of grammaticality wrt Scandinavian islands. In other words, the fact that Scandinavian speakers accept movement out of Scandinavian islands does not imply that the structures so derived are grammatical. More pithily, in this particular case (un)acceptability poorly tracks (un)grammaticality.

Let me quickly say two things about this general point before proceeding lest too big a moral gets drawn. First, it is important to remember that within GG, conceptually speaking, “ungrammatical” is not a synonym for “unacceptable,” despite the practice within linguistics of confusing them (especially terminologically).  “Acceptable” is a descriptive predicate of data points, a report of speaker judgments. “Grammatical” is a theoretical predicate applied to G constructs. GGers have been lucky in that over a large domain of data the two notions coincide. In other words, often, (un)acceptability is a good indicator of (un)grammaticality. However, one obvious way of dissolving the Scandinavian counter-examples above is to suggest that the link between the two is looser in this particular case. In other words, that the English data is more revealing of the underlying G facts than is the Scandinavian data in this case.

Second, the fact that this may be so in this case does NOT mean that acceptability is always or generally or usually a bad indicator of grammaticality. It isn’t. First, as Sprouse and Almeida and Schutze have demonstrated in their various papers, over a large and impressive range of data, the quick and dirty acceptability judgment data is a very stable and reliable kind of data.[2] Second, cross-linguistic research over the last 60 years has shown that the MLGs discovered in one language using acceptability data in that language generally map quite well onto the acceptability data gathered in other languages for the same constructions. In fact, what made the Scandinavian data intriguing is that it was somewhat of an outlier. Many many languages (most?) exhibit English style island effects. So, even if the assumption that acceptability tracks grammaticality might not be perfectly correct, it is roughly so and thus it is prima facie reasonable to take it to be a faithful indicator of grammaticality ceteris paribus.[3]

Ok, back to KLS. How does it redeem the PoS view of islands? Well, it provides a method for detecting island effects independently of binary (i.e. ok vs *) acceptability judgments. The probe comes from a battery of relative acceptability data gathered using the now well-known techniques of Experimental Syntax (ES). ES does acceptability judgment gathering more carefully than we tend to do. Factors are separated out (distance vs. structure) and their interaction (more) carefully compared. Using this technique one can gather relative acceptability data even among sentences all of which are judged quite acceptable. Using this method, the empirical signature of ungrammaticality is a superadditivity (SA) profile apparent when you cross distance and structure.[4]
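For readers who like to see the arithmetic, the SA signature in a 2x2 design of this kind is standardly computed as a differences-in-differences (DD) score: the extra cost of a long dependency inside an island, over and above the independent costs of length and of islandhood. Here is a minimal sketch; the ratings below are invented for illustration and the function name is mine, not KLS's.

```python
# Sketch of the superadditivity (DD) calculation for a 2x2
# Experimental Syntax design. Factors: dependency length
# (short/long) x structure (non-island/island).
# The numeric ratings are invented, purely for illustration.

def dd_score(ratings):
    """Differences-in-differences: how much worse a long extraction
    out of an island is, beyond the independent penalties for
    length and for islandhood. Positive => superadditive profile."""
    length_cost_non = ratings[("short", "non")] - ratings[("long", "non")]
    length_cost_isl = ratings[("short", "island")] - ratings[("long", "island")]
    return length_cost_isl - length_cost_non

# Hypothetical mean z-scored acceptability ratings:
ratings = {
    ("short", "non"): 1.0,      # short dependency, no island
    ("long", "non"): 0.8,       # long dependency, no island
    ("short", "island"): 0.9,   # short dependency, island present
    ("long", "island"): -0.5,   # long extraction out of the island
}

print(dd_score(ratings))  # positive => superadditive island effect
```

The point of the measure is that it can be computed from relative ratings even when every sentence in the paradigm is judged "pretty good" in binary terms, which is what lets KLS see the island signature in Scandinavian.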

With this machinery in place, we are ready for KLS’s big find. KLS applies this SA probe to English and Scandinavian and shows that both languages display a SA profile for sentences involving extraction from islands.[5] Where the languages differ then is not in being responsive to islands but in how speakers map an island violation into a binary good/bad judgment. English speakers judge island violations as bad and Scandinavian speakers often judge them as ok.[6]

Conclusion: Scandinavian obeys islands restrictions and this is visible when we use a more sensitive measure of G structure than ok vs *. In other words we clearly see the effects of islands in Scandinavian when we look at their SA profiles. 

We should all rejoice here. This is great. The PoS reasoning we outlined above is vindicated. Gs do not differ wrt their obedience to island restrictions, and these are still excellent probes into the structure of FL. I, of course, am not at all surprised. The logic behind the PoS is impeccable. I have the irrational belief that if something is logically impossible then it is also metaphysically impossible. Thus, if some G difference cannot be learned due to an absence of any possibly relevant data, all Gs must be the same. This is as close to apodictic reasoning as we are likely to find in the non-mathematical sciences, so I have always assumed that the KLS results (or some other indication that Scandinavian obeys islands) must exist. That said, who can't delight when logic proves efficacious? I know I can't!

However, KLS is actually even more interesting than this. It not only vindicates the logic of the GG program against a long-standing apparent problem, but it also reshapes the domain of inquiry. How? Well, note that English and Scandinavian still differ despite the fact that both show island effects. After all, the former assigns *s to sentences that the latter assigns at most ?s to. Why? What's going on? KLS offers some speculations worth investigating regarding how non-syntactic conditions might affect overall (i.e. binary) acceptability judgments.[7] I personally suspect that this overall measure is affected by many different factors, including intonation (and hence old/new info structure), lexical differentiation, and a host of other things that I really can't imagine. Sorting this out will be hard, if for no other reason than that we have not really concentrated much on how these myriad effects interact to provide an overall judgment. What KLS shows is that if we are interested in how speakers construct an overall judgment (and, off hand, it is not clear to me that we should be interested in this, but I am happy to hear arguments for why we would be), then this is where you need to look for "variation." Why? Because they have provided very good evidence that islands are NOT parameterized, which is the conclusion that elementary PoS reasoning leads to.

Diogo Almeida has a nice paper (here, and here)[8] that does similar things along these ES lines for extraction out of Wh-islands in Brazilian Portuguese (BP). The paper uses ES methods to probe not only sensitivity to islands but also to compare how sensitive different constructions are to island restrictions. The paper compares Topicalization and Left Dislocation, showing that both induce island effects (as On WH Movement would lead us to expect). It also shows that BP here differs in part from English, raising further interesting research issues. I for one would love to see ES applied in comparing Topicalization, Left Dislocation and Hanging Topics in languages where one finds case connectivity effects. How do these do wrt islands? I can imagine a simple prediction wherein case connectivity, being a diagnostic of movement, implies that Hanging Topics do not exhibit SA effects in island contexts. Is this so? I have no idea.

Diogo’s paper provides one further service. It provides a nice name for SA-without-unacceptability effects. Diogo dubs them “subliminal islands.” Diogo argues that the simple existence of subliminal effects has interesting implications for how we understand SA effects. He argues, convincingly IMO, that we expect subliminal effects if grammaticality is one component of acceptability, not so much if we take a performance view of islands. The paper also has a nice discussion of the conceptual relation between acceptability and grammaticality that I recommend highly.

Ok, time to end. I have two concluding points.

First, IMO, this kind of work demonstrates that ES can tell us something that we didn't know before. Heretofore, ES work has aimed to either re-establish prior results (Jon's work was aimed at showing that island effects are real) or to defend the homeland against the barbarians (Jon & Co arguing that informal methods are more than good enough much of the time). Here, KLS and Diogo's paper show that we can resolve old problems and learn new things using these methods. In particular, these papers show that ES methods can open up new questions even in well-understood domains. In particular, I believe that both papers show that ES has the potential to reinvigorate research into the grammatical structure of islands. I mention this because many syntacticians have been wary of ES, thinking that it would just make everyone's cushy life harder. And though I generally agree that ES methods do not displace the more informal ones we use, these papers demonstrate that they can have real value and we should not be afraid (or reluctant) to use them.

Second, go out and celebrate. Tell friends about this stuff. It's science at its best. Yippee!

[1] The impression I have (even after Athens) is that such speculation is considered to be just this side of BS.  At the very least it really plays (and can play) no serious role in our practice. I may be a tad over-sensitive here.
[2] The fact that such a crude method of data collection has proven to be so reliable and useful raises an interesting question IMO: why? Why should such a dumb method of gathering data work so well? I believe that this is a function of the modularity of FL and the fact that Gs play a prominent role in all aspects of linguistic performance. More exactly, how Gs are used affects performance but performance does not affect how Gs are structured. Thus Gs make an invariant and important contribution to each performance. This is why their effects can be so easily detected, even using crude methods. At any rate, whether this diagnosis is correct, the fact that such a crude probe has been so successful is worth trying to understand.
[3] Note that this implies that the challenges to the universality of islands based on Scandinavian data were reasonable, even if ultimately wrong.
[4] See Jon’s thesis here, or many of his other writings for a simple explanation of these effects.
[5] KLS discusses Wh islands, Noun complement islands, subject islands and adjunct islands. It does not address relative clauses. However, Dave Kush has told me that they have looked at these too and the results are as expected: Scandinavian shows the same SA profile for RCs as for all the others.
[6] There is a kind of judgment one often hears from linguists which this methodology questions. It's "the sentence is not perfect but it is grammatical." The ES methodology suggests that one treat this kind of judgment very gingerly, for it might indicate the underlying effects of the relevant G distinction, not its absence, as is typically concluded. Note that the quoted judgment above runs together "acceptable" and "grammatical" in a not wholly coherent way. It relies on the assumption that informal judgments concerning degree of unacceptability are a reliable probe of the binary grammatical/ungrammatical distinction. In other words, it assumes that grammaticality is binary (which it may be, but who knows) and that the severity of an acceptability judgment can reliably indicate the grammatical status of a structure. This assumption is challenged in the KLS paper.
[7] Sprouse’s thesis already demonstrated how e.g. D-linking could affect overall acceptability (i.e. the binary judgment) without inducing an SA signature.
[8] Same paper but for formatting. Latter is published version for citation purposes.