Ted Briscoe, John Carroll March 2000
Carroll, Minnen & Briscoe (1998) argue that the currently-dominant constituency-based paradigm for parser evaluation has serious shortcomings. Instead they propose an annotation scheme in which each sentence in the corpus is marked up with a set of grammatical relations (GRs), specifying the syntactic dependency which holds between each head and its dependent(s). We have extended this scheme with new relations as outlined in this document, although the basic architecture remains the same.
The relations are organised hierarchically: see Figure 1.
The most generic relation between a head and a dependent is dependent.
Where the relationship between the two is known more precisely, relations
further down the hierarchy can be used, for example mod(ifier) or arg(ument). Relations mod, arg_mod, aux, clausal,
and their descendants have slots filled by a type, a head, and
its dependent; arg_mod has an additional fourth slot, initial_gr. Descendants of subj, and also dobj, have the three
slots head, dependent, and initial_gr. Relation conj has a type slot and one or more head slots. The x and
c prefixes to relation names differentiate clausal control alternatives: x marks a clause controlled from without, c one controlled from within.
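The slot layouts just described can be written down as a small lookup table. The following is only a sketch: the names SLOTS and label_slots are hypothetical, and the table lists only relations whose layout is stated in the text or visible in its examples, since Figure 1 is not reproduced here.

```python
# Slot layouts as stated in the text. SLOTS and label_slots are illustrative
# names, not part of the annotation scheme itself.
TYPE_HEAD_DEP = ("type", "head", "dependent")
HEAD_DEP_INIT = ("head", "dependent", "initial_gr")

SLOTS = {
    "dependent": TYPE_HEAD_DEP,
    "mod": TYPE_HEAD_DEP, "ncmod": TYPE_HEAD_DEP,
    "xmod": TYPE_HEAD_DEP, "cmod": TYPE_HEAD_DEP,
    "aux": TYPE_HEAD_DEP,
    "clausal": TYPE_HEAD_DEP, "xcomp": TYPE_HEAD_DEP, "ccomp": TYPE_HEAD_DEP,
    "arg_mod": TYPE_HEAD_DEP + ("initial_gr",),  # fourth slot
    "ncsubj": HEAD_DEP_INIT, "xsubj": HEAD_DEP_INIT, "csubj": HEAD_DEP_INIT,
    "dobj": HEAD_DEP_INIT,
}


def label_slots(relation, *fillers):
    """Pair each filler with its slot name; raise ValueError on a bad count."""
    if relation == "conj":
        # conj has a type slot followed by one or more head slots.
        if len(fillers) < 2:
            raise ValueError("conj needs a type and at least one head")
        return {"type": fillers[0], "heads": list(fillers[1:])}
    names = SLOTS[relation]
    if len(fillers) != len(names):
        raise ValueError(f"{relation} expects slots {names}")
    return dict(zip(names, fillers))


print(label_slots("arg_mod", "by", "kill", "Brutus", "subj"))
print(label_slots("conj", "or", "smile", "laugh"))
```

A table of this kind makes it straightforward to reject annotations with the wrong number of slots before any comparison is attempted.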
One potential advantage of this scheme is that annotator errors or disagreements, and differences between parsers, can be handled more `intelligently'. For example, if an adverbial PP has been incorrectly annotated as an argument, or a particular parser treats all non-obligatory PPs as adverbial modifiers, the correct relation will still be specified at the most generic `dependent' level; i.e. for Marisa flew to Rome, we have dependent(to, fly, Rome) regardless of whether the relation is annotated as iobj or ncmod (see below).
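A minimal sketch of this back-off for the Marisa flew to Rome example, assuming GRs are held as (relation, type, head, dependent) tuples. The function names to_dependent and agree are hypothetical, and the sketch only covers relations with that three-slot layout.

```python
def to_dependent(gr):
    """Back off a (relation, type, head, dependent) tuple to the generic level."""
    _relation, gr_type, head, dependent = gr
    return ("dependent", gr_type, head, dependent)


def agree(gold, parsed, generic=False):
    """Compare two GRs exactly, or at the generic `dependent' level."""
    if generic:
        return to_dependent(gold) == to_dependent(parsed)
    return gold == parsed


gold = ("iobj", "to", "fly", "Rome")     # PP annotated as an argument
parsed = ("ncmod", "to", "fly", "Rome")  # parser treats the PP as a modifier

print(agree(gold, parsed))                # False: the labels differ
print(agree(gold, parsed, generic=True))  # True: same generic dependency
```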
With morphosyntactic processes modifying head-dependent links (e.g. passive, dative shift), two kinds of GRs can be expressed: (1) the initial GR, i.e. before the GR-changing process occurs; and (2) the final GR, i.e. after the GR-changing process occurs. For example, Paul in Paul was employed by Microsoft is both the initial object and the final subject of employ.
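The two readings can be recovered from a single annotation. The sketch below assumes a GR with the slots head, dependent, and initial_gr is held as a four-element tuple, with an empty initial_gr written as "_"; the function name initial_and_final is hypothetical.

```python
def initial_and_final(gr):
    """Split a (relation, head, dependent, initial_gr) tuple into two GRs.

    When initial_gr is empty ("_"), no GR-changing process applied and the
    initial and final relations coincide.
    """
    relation, head, dependent, initial_gr = gr
    final = (relation, head, dependent)
    if initial_gr == "_":
        return final, final
    return (initial_gr, head, dependent), final


# Paul was employed by Microsoft: Paul is initial object, final subject.
initial, final = initial_and_final(("subj", "employ", "Paul", "obj"))
print(initial)  # ('obj', 'employ', 'Paul')
print(final)    # ('subj', 'employ', 'Paul')
```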
Each relation in the scheme is described individually below.
This is the most generic relation between a head and a dependent (i.e. it
does not specify whether the dependent is an argument or a modifier). For example:
|dependent(in, live, Rome)||Marisa lives in Rome|
|dependent(that, say, leave)||I said that he left|
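The relation(slot, slot, ...) notation used in the examples throughout this document is regular enough to read mechanically. The following is a rough sketch of such a reader; parse_gr and GR_PATTERN are hypothetical names, and mapping the empty-slot marker `_' to None is a choice made here for convenience, not part of the scheme.

```python
import re

# A relation name, then comma-separated slot fillers in parentheses.
GR_PATTERN = re.compile(r"^\s*(\w+)\(([^()]*)\)\s*$")


def parse_gr(text):
    """Parse e.g. 'dependent(in, live, Rome)' into (name, slots)."""
    match = GR_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a GR: {text!r}")
    name = match.group(1)
    fillers = [part.strip() for part in match.group(2).split(",")]
    # '_' marks an empty slot in the document's examples.
    slots = tuple(None if f == "_" else f for f in fillers)
    return name, slots


print(parse_gr("dependent(in, live, Rome)"))
print(parse_gr("mod(_, flag, red)"))
```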
The relation between a head and its modifier; where
appropriate, type indicates the lexical item introducing the dependent; e.g.
|mod(_, flag, red)||a red flag|
|mod(_, walk, slowly)||walk slowly|
|mod(with, walk, John)||walk with John|
|mod(while, walk, talk)||walk while talking|
|mod(_, Picasso, painter)||Picasso the painter|
|mod(of, gift, book)||the gift of a book|
|mod(by, gift, Peter)||the gift ... by Peter|
|mod(of, examination, patient)||the examination of the patient|
|mod(poss, examination, doctor)||the doctor's examination|
Clausal and non-clausal modifiers are distinguished by the
use of the GRs cmod/xmod, and ncmod respectively, each
with slots the same as mod. The GR ncmod is for non-clausal
modifiers; cmod is for adjuncts controlled from within, and xmod
for adjuncts controlled from without, e.g.
|xmod(without, eat, ask)||he ate the cake without asking|
|cmod(because, eat, be)||he ate the cake because he was hungry|
|ncmod(_, flag, red)||a red flag|
In practice, ncmod is the most common relation type and is used to encode PP, adjectival/adverbial modification, nominal modifiers, and so on. ncmod is also used to indicate a particle `modifier' in a verb particle combination, e.g. Bill looked the word up is encoded as ncmod(prt, look, up); of in consist of is treated as a particle. ncmod is also used to mark possessives, as in John's idea with encoding ncmod(poss, idea, John), and for preposed adjuncts, as in where he went encoded as ncmod(pre, go, where). The auxiliary sequence have to meaning `must' is annotated ncmod(prt, have, to); have is otherwise treated as a standard auxiliary.
The relation between a noun and determiner. type is usually empty,
but is `poss' for pronominal determiners.
|detmod(_, system, a)||a system|
|detmod(poss, advocacy, his)||his advocacy|
The relation between a head and a semantic argument which is syntactically
realised as a modifier; thus in English a `by-phrase' in a passive construction
can be analysed as a `thematically bound adjunct'. The type slot
indicates the word introducing the dependent: e.g.
|arg_mod(by, kill, Brutus, subj)||killed by Brutus|
The most generic relation between a head and an argument.
A specialization of the relation arg which can instantiate
either subjects or direct objects. It is useful for those cases where
no reliable bias is available for disambiguation. For example, both
Gianni and Mario can be subject or object in the Italian sentence:
|Mario, non l'ha ancora visto, Gianni|
|`Mario has not seen Gianni yet'/`Gianni has not seen Mario yet'|
The relation between a predicate and its subject;
where appropriate, the initial_gr indicates the syntactic link
between the predicate and subject before any GR-changing process:
|subj(arrive, John, _)||John arrived in Paris|
|subj(employ, Microsoft, _)||Microsoft employed 10 C programmers|
|subj(employ, Paul, obj)||Paul was employed by IBM|
|subj(arrivare, Pro, _)||arrivai in ritardo `(I) arrived late'|
The initial GR slot is also used to indicate subject-auxiliary inversion when the verb is an auxiliary; e.g. where is the writer? is annotated ncsubj(be, writer, inv). The slot is also used to indicate a mod(ifier) link in locative inversion: here sits the king with encoding ncsubj(sit, here, mod).
The GRs csubj and xsubj indicate clausal subjects,
controlled from within, or without, respectively. ncsubj is a
non-clausal subject. E.g.
|csubj(leave, mean, _)||that Nellie left without saying good-bye meant she was angry|
|xsubj(win, require, _)||to win the America's Cup requires heaps of cash|
The relation between a predicate and its direct object--the
first non-clausal complement following the predicate which is
not introduced by a preposition (for English and
German); initial_gr is `iobj' after dative shift; e.g.
|dobj(read, book, _)||read books|
|dobj(mail, Mary, iobj)||mail Mary the contract|
The relation between a predicate and a non-clausal complement
introduced by a preposition; type indicates the preposition
introducing the dependent; e.g.
|iobj(in, arrive, Spain)||arrive in Spain|
|iobj(into, put, box)||put the tools into the box|
|iobj(to, give, poor)||give to the poor|
The relation between a predicate and the second non-clausal
complement in ditransitive constructions; e.g.
|obj2(give, present)||give Mary a present|
|obj2(mail, contract)||mail Paul the contract|
The relation between a predicate and a clausal complement
which has no overt subject (for example a VP or predicative XP). The type slot indicates the
complementiser/preposition, if any, introducing the XP. E.g.
|xcomp(to, intend, leave)||Paul intends to leave IBM|
|xcomp(_, be, easy)||Swimming is easy|
|xcomp(in, be, Paris)||Mary is in Paris|
|xcomp(_, be, manager)||Paul is the manager|
|subj(intend, Paul, _), xcomp(to, intend, leave), subj(leave, Paul, _), dobj(leave, IBM, _)||Paul intends to leave IBM|
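In the four GRs given for Paul intends to leave IBM, the subject of the embedded verb is the same as that of the main verb. A sketch of how such shared subjects could be read off a GR set follows. The function name shared_subjects is hypothetical, and identifying words by their lemma alone is a simplifying assumption that would break down if a lemma occurred twice in a sentence.

```python
def shared_subjects(grs):
    """Return (main, embedded, subject) triples for xcomp links whose two
    verbs are annotated with the same subject."""
    subject_of = {gr[1]: gr[2] for gr in grs if gr[0] == "subj"}
    triples = []
    for gr in grs:
        if gr[0] != "xcomp":
            continue
        _relation, _type, main, embedded = gr
        subject = subject_of.get(embedded)
        if subject is not None and subject == subject_of.get(main):
            triples.append((main, embedded, subject))
    return triples


grs = [
    ("subj", "intend", "Paul", "_"),
    ("xcomp", "to", "intend", "leave"),
    ("subj", "leave", "Paul", "_"),
    ("dobj", "leave", "IBM", "_"),
]
print(shared_subjects(grs))  # [('intend', 'leave', 'Paul')]
```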
The relation between a predicate and a clausal complement
which does have an overt subject; type
is the same as for xcomp above. E.g.
|ccomp(that, say, accept)||Paul said that he will accept Microsoft's offer|
|ccomp(that, say, leave)||I said that he left|
The relation between an auxiliary verb and a main or other auxiliary
verb. type is empty, or `inv' in the case of subject-auxiliary
inversion. The main verb or righthand auxiliary is treated as `head' because
we want to encode most links with verb groups in terms of the semantically
informative main verb element:
|aux(_, tend, have)||have tended towards...|
|aux(inv, be, can)||can he be clever|
Although coordination relations are distributed in arg/mod
GRs, conj is also used to annotate the type of
conjunction and the heads of the conjuncts, as in:
|conj(and, priest, soldier, member)||the priests, the soldiers and the other members of the|
|conj(or, smile, laugh)||John smiled or Susan laughed|
A few more specific remarks on the English / SUSANNE-based test corpus follow:
In the test corpus, dependent links which go beyond the strictly morphosyntactic `evidence' are marked up--for example, control relations, subjects in VP coordination, and other `elliptical' links--where the correct analysis is manifest from the sentence alone. `Arbitrary' control is not annotated since in most cases context would or could specify a non-arbitrary controller.
Pronoun coreference links are not annotated. Relative pronouns (wh) are treated as arguments but not resolved to their head antecedents, though the modification link between the head and pronoun is given as cmod.
Reduced relative clauses are annotated with an ncsubj link between the modified head and the relative clause gap (initial_gr `obj') and an xmod link between the head and the rel-vb. Relative clauses introduced by `that' complementisers are annotated cmod(that, head-noun, rel-vb).
Dummy it and there are not treated specially, so in There is a problem, There is annotated as `subj' of be.
The single most challenging construction to annotate sensibly using this scheme is the comparative, e.g. more friends than outright enemies: we treat more as an ncmod on its head and than as an ncmod on the same head, but ellipsis, lack of heads, and so on make many comparatives quite challenging, and there are quite a few in the test corpus.
In general, where an analysis in the file of test sentences seems questionable or inadequate it has been commented thus: ;;; comment ??. There are also similar comments describing micro-level analysis choices on first occurrence.
We would welcome feedback in the form of further such comments, or amended or improved analyses, including extensions to the basic scheme. We are happy to try to maintain a single consistent but improving test file on the web site.
Figure 2 gives a more extended example of the use of the GR scheme, using the variant (lisp-like) syntax used in the actual file.