Grammatical Relation Annotation

Ted Briscoe, John Carroll
March 2000

Carroll, Minnen & Briscoe (1998) argue that the currently-dominant constituency-based paradigm for parser evaluation has serious shortcomings [1]. Instead they propose an annotation scheme in which each sentence in the corpus is marked up with a set of grammatical relations (GRs), specifying the syntactic dependency which holds between each head and its dependent(s). We have extended this scheme with new relations as outlined in this document, although the basic architecture remains the same.

The relations are organised hierarchically: see Figure 1. The most generic relation between a head and a dependent is dependent. Where the relationship between the two is known more precisely, relations further down the hierarchy can be used, for example mod(ifier) or arg(ument). Relations mod, arg_mod, aux, clausal, and their descendants have slots filled by a type, a head, and its dependent; arg_mod has an additional fourth slot, initial_gr. Descendants of subj, and also dobj, have the three slots head, dependent, and initial_gr. Relation conj has a type slot and one or more head slots. The x and c prefixes to relation names differentiate clausal control alternatives.

Figure 1: The grammatical relation hierarchy. [Figure not reproduced in this copy. An editorial note in the figure source (ejb/jac) records that detmod, aux and conj still needed to be added to the picture.]

One potential advantage of this scheme is that it can deal more `intelligently' with annotator errors or disagreements and with differences between parsers. For example, if an adverbial PP has been incorrectly annotated as an argument, or a particular parser treats all non-obligatory PPs as adverbial modifiers, the correct relation will still be specified at the most generic `dependent' level; i.e. for Marisa flew to Rome, we have dependent(to, fly, Rome) regardless of whether the relation is annotated as iobj or ncmod (see below).
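As a rough illustration of how two differing annotations can be reconciled at a more generic level, the following Python sketch walks up the hierarchy to find the most specific relation that covers both. The parent links are an assumption: they are reconstructed from the relation descriptions in this document rather than copied from Figure 1, and detmod, aux, conj and subj_or_dobj are left out. The names PARENT, ancestors and lowest_common_relation are hypothetical.

```python
# Parent links for the GR hierarchy.
# ASSUMPTION: reconstructed from the prose descriptions of each relation,
# not taken from Figure 1. detmod, aux, conj and subj_or_dobj are omitted
# because their exact position is not stated in the text.
PARENT = {
    "mod": "dependent", "arg_mod": "dependent", "arg": "dependent",
    "ncmod": "mod", "xmod": "mod", "cmod": "mod",
    "subj": "arg", "comp": "arg",
    "ncsubj": "subj", "xsubj": "subj", "csubj": "subj",
    "obj": "comp", "clausal": "comp",
    "dobj": "obj", "obj2": "obj", "iobj": "obj",
    "xcomp": "clausal", "ccomp": "clausal",
}


def ancestors(relation):
    """Return the relation followed by its ancestors, most specific first."""
    chain = [relation]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def lowest_common_relation(rel_a, rel_b):
    """Return the most specific relation that subsumes both arguments."""
    candidates = set(ancestors(rel_b))
    for rel in ancestors(rel_a):
        if rel in candidates:
            return rel
    return None


# Marisa flew to Rome: one annotation says iobj, the other says ncmod.
print(lowest_common_relation("iobj", "ncmod"))  # dependent
print(lowest_common_relation("dobj", "iobj"))   # obj
```

Under these assumed links, an iobj/ncmod disagreement is only resolved at dependent, whereas a dobj/iobj disagreement is already resolved at obj.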

With morphosyntactic processes modifying head-dependent links (e.g. passive, dative shift), two kinds of GRs can be expressed: (1) the initial GR, i.e. before the GR-changing process occurs; and (2) the final GR, i.e. after the GR-changing process occurs. For example, Paul in Paul was employed by Microsoft is both the initial object and the final subject of employ.

Each relation in the scheme is described individually below.

dependent(introducer, head, dependent)

This is the most generic relation between a head and a dependent (i.e. it does not specify whether the dependent is an argument or a modifier). For example:
dependent(in, live, Rome) Marisa lives in Rome
dependent(that, say, leave) I said that he left

mod(type, head, dependent)

The relation between a head and its modifier; where appropriate, type indicates the lexical item introducing the dependent; e.g.
mod(_, flag, red) a red flag
mod(_, walk, slowly) walk slowly
mod(with, walk, John) walk with John
mod(while, walk, talk) walk while talking
mod(_, Picasso, painter) Picasso the painter

The mod GR is also used to encode the relationship between an event noun (including deverbal nouns) and its participants; e.g.
mod(of, gift, book) the gift of a book
mod(by, gift, Peter) the gift ... by Peter
mod(of, examination, patient) the examination of the patient
mod(poss, examination, doctor) the doctor's examination

In a similar vein, cmod is used with (deverbal) noun sentential complements (always optional), though adjectival sentential complements are annotated ccomp like verbal ones.

cmod, xmod, ncmod

Clausal and non-clausal modifiers are distinguished by the GRs cmod/xmod and ncmod respectively, each with the same slots as mod. The GR ncmod is for non-clausal modifiers; cmod is for adjuncts controlled from within, and xmod for adjuncts controlled from without, e.g.
xmod(without, eat, ask) he ate the cake without asking
cmod(because, eat, be) he ate the cake because he was hungry
ncmod(_, flag, red) a red flag

(Note that nominal complements are treated as modifiers and are also specialized into the three types.)

In practice, ncmod is the most common relation type and is used to encode PP, adjectival/adverbial modification, nominal modifiers, and so on. ncmod is also used to indicate a particle `modifier' in a verb particle combination, e.g. Bill looked the word up is encoded as ncmod(prt, look, up); of in consist of is treated as a particle. ncmod is also used to mark possessives, as in John's idea with encoding ncmod(poss, idea, John), and for preposed adjuncts, as in where he went encoded as ncmod(pre, go, where). The auxiliary sequence have to meaning `must' is annotated ncmod(prt, have, to); have is otherwise treated as a standard auxiliary.

detmod

The relation between a noun and determiner. type is usually empty, but is `poss' for pronominal determiners.
detmod(_, system, a) a system
detmod(poss, advocacy, his) his advocacy

There is a case for the inclusion of a `predetmod' relation, for example for all in all the men; at the moment these are annotated as detmod or ncmod on the head noun.

arg_mod(type, head, dependent, initial_gr)

The relation between a head and a semantic argument which is syntactically realised as a modifier; thus in English a `by-phrase' in a passive construction can be analysed as a `thematically bound adjunct'. The type slot indicates the word introducing the dependent: e.g.
arg_mod(by, kill, Brutus, subj) killed by Brutus

This relation is also used for VP/clausal subjects in passive constructions. In all cases, the initial_gr is just specified as `subj' and not further distinguished.
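The interaction between the final GRs of a passive clause and their initial_gr slots can be sketched in Python as follows. This is only an illustration under stated assumptions: it covers a simple by-phrase passive of a transitive verb, GRs are plain tuples, `_' marks an empty slot, and the function names passive_grs and show are hypothetical.

```python
def passive_grs(head, initial_subject, initial_object):
    """GRs for a by-phrase passive, given the participants' initial roles.

    ASSUMPTION: simple transitive verb with an overt by-phrase.
    """
    return [
        # The final subject was the object before passivisation.
        ("subj", head, initial_object, "obj"),
        # The by-phrase is a semantic argument realised as a modifier;
        # its initial GR is recorded simply as subj.
        ("arg_mod", "by", head, initial_subject, "subj"),
    ]


def show(gr):
    """Render a GR tuple in the relation(slot, ...) notation used here."""
    return f"{gr[0]}({', '.join(gr[1:])})"


# Paul was employed by Microsoft
for gr in passive_grs("employ", "Microsoft", "Paul"):
    print(show(gr))
# subj(employ, Paul, obj)
# arg_mod(by, employ, Microsoft, subj)
```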

arg(head, dependent)

The most generic relation between a head and an argument.

subj_or_dobj(head, dependent)

A specialization of the relation arg which can instantiate either subjects or direct objects. It is useful for those cases where no reliable bias is available for disambiguation. For example, both Gianni and Mario can be subject or object in the Italian sentence
Mario, non l'ha ancora visto, Gianni  
`Mario has not seen Gianni yet'/`Gianni has not seen Mario yet'  

In this case, a parser could avoid trying to resolve the ambiguity by using subj_or_dobj, e.g.
subj_or_dobj(vedere, Mario)  
subj_or_dobj(vedere, Gianni)  

An alternative approach to this problem would have been to allow disjunctions of relations. We did not pursue this since the number of cases where this might be appropriate appears to be very limited.
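One way an underspecified subj_or_dobj could be compared against a fully specified annotation is sketched below in Python. The matching rule is an assumption, not a procedure defined in this document: subj_or_dobj is taken to be compatible with subj, its descendants, and dobj, and only the head and dependent slots are compared. The gold annotation shown is hypothetical and simply fixes one of the two readings of the Italian sentence.

```python
# Relations that subj_or_dobj is assumed to be compatible with.
# ASSUMPTION: subjects of any kind, plus direct objects.
SUBJ_OR_DOBJ_COVERS = {"subj", "ncsubj", "xsubj", "csubj", "dobj"}


def matches(parser_gr, gold_gr):
    """True if a parser GR is consistent with a gold GR.

    Each GR is (name, head, dependent); any initial_gr slot is ignored.
    """
    p_name, p_head, p_dep = parser_gr
    g_name, g_head, g_dep = gold_gr
    if (p_head, p_dep) != (g_head, g_dep):
        return False
    if p_name == "subj_or_dobj":
        return g_name in SUBJ_OR_DOBJ_COVERS
    return p_name == g_name


# Hypothetical gold annotation for the reading
# `Gianni has not seen Mario yet'.
gold = [("subj", "vedere", "Gianni"), ("dobj", "vedere", "Mario")]
parser = [("subj_or_dobj", "vedere", "Mario"),
          ("subj_or_dobj", "vedere", "Gianni")]

for p in parser:
    print(p, any(matches(p, g) for g in gold))
```

Both parser relations are accepted under either reading, which is the point of leaving the ambiguity unresolved.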

subj(head, dependent, initial_gr)

The relation between a predicate and its subject; where appropriate, the initial_gr indicates the syntactic link between the predicate and subject before any GR-changing process:
subj(arrive, John, _) John arrived in Paris
subj(employ, Microsoft, _) Microsoft employed 10 C programmers
subj(employ, Paul, obj) Paul was employed by IBM

With pro-drop languages such as Italian, when the subject is not overtly realised the annotation is, for example, as follows:
subj(arrivare, Pro, _) arrivai in ritardo `(I) arrived late'

in which the dependent is specified by the abstract filler `Pro', indicating that person and number of the subject can be recovered from the inflection of the head verb form.

The initial GR slot is also used to indicate subject-auxiliary inversion when the verb is an auxiliary; e.g. where is the writer? is annotated ncsubj(be, writer, inv). The slot is also used to indicate a mod(ifier) link in locative inversion: here sits the king with encoding ncsubj(sit, here, mod).

csubj, xsubj, ncsubj

The GRs csubj and xsubj indicate clausal subjects, controlled from within, or without, respectively. ncsubj is a non-clausal subject. E.g.
csubj(leave, mean, _) that Nellie left without saying good-bye meant she was angry
xsubj(win, require, _) to win the America's Cup requires heaps of cash

comp(head, dependent)

The most generic relation between a head and a complement.

obj(head, dependent, initial_gr)

The most generic relation between a head and an object. The initial_gr slot is used to indicate the underlying `subj' relation in locative inversion; e.g. here sits the king is annotated dobj(sit, king, subj).

dobj(head, dependent, initial_gr)

The relation between a predicate and its direct object--the first non-clausal complement following the predicate which is not introduced by a preposition (for English and German); initial_gr is `iobj' after dative shift; e.g.
dobj(read, book, _) read books
dobj(mail, Mary, iobj) mail Mary the contract

iobj(type, head, dependent)

The relation between a predicate and a non-clausal complement introduced by a preposition; type indicates the preposition introducing the dependent; e.g.
iobj(in, arrive, Spain) arrive in Spain
iobj(into, put, box) put the tools into the box
iobj(to, give, poor) give to the poor

obj2(head, dependent)

The relation between a predicate and the second non-clausal complement in ditransitive constructions; e.g.
obj2(give, present) give Mary a present
obj2(mail, contract) mail Paul the contract

clausal(head, dependent)

The most generic relation between a head and a clausal complement.

xcomp(type, head, dependent)

The relation between a predicate and a clausal complement which has no overt subject (for example a VP or predicative XP). The type slot indicates the complementiser/preposition, if any, introducing the XP. E.g.
xcomp(to, intend, leave) Paul intends to leave IBM
xcomp(_, be, easy) Swimming is easy
xcomp(in, be, Paris) Mary is in Paris
xcomp(_, be, manager) Paul is the manager

Control of VPs and predicative XPs is expressed in terms of GRs. For example, the unexpressed subject of the clausal complement of a subject-control predicate is specified by saying that the subject of the main and subordinate verbs is the same:
   
subj(intend, Paul, _)
xcomp(to, intend, leave)
subj(leave, Paul, _)
dobj(leave, IBM, _)
Paul intends to leave IBM
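The subject-control pattern above can be sketched in Python as a step that copies the subject of the main verb onto the verb of its xcomp. This is a simplification under stated assumptions: every xcomp head is treated as a subject-control predicate, chains of xcomps are not followed, GRs are plain tuples, and the function name add_subject_control is hypothetical.

```python
def add_subject_control(grs):
    """Return grs plus a subj GR for each xcomp whose head has a subject.

    ASSUMPTION: every xcomp head is a subject-control predicate; other
    control patterns are not handled in this sketch.
    """
    # subj(head, dependent, initial_gr): map each head to its subject.
    subject_of = {gr[1]: gr[2] for gr in grs if gr[0] == "subj"}
    result = list(grs)
    for gr in grs:
        if gr[0] != "xcomp":
            continue
        _type, head, dependent = gr[1:]
        # Only add a subject if the complement does not already have one.
        if head in subject_of and dependent not in subject_of:
            result.append(("subj", dependent, subject_of[head], "_"))
    return result


# Paul intends to leave IBM
grs = [
    ("subj", "intend", "Paul", "_"),
    ("xcomp", "to", "intend", "leave"),
    ("dobj", "leave", "IBM", "_"),
]
for gr in add_subject_control(grs):
    print(f"{gr[0]}({', '.join(gr[1:])})")
```

The added relation is subj(leave, Paul, _), which together with the input reproduces the four GRs listed for the example sentence.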

ccomp(type, head, dependent)

The relation between a predicate and a clausal complement which does have an overt subject; type is the same as for xcomp above. E.g.
ccomp(that, say, accept) Paul said that he will accept Microsoft's offer
ccomp(that, say, leave) I said that he left

This GR is also used for wh-comps like whether, and--more questionably--for wh-adjuncts like where, as in I wonder where he went, encoded as ccomp(where, wonder, go).

aux(type, head, dependent)

The relation between an auxiliary verb and a main or other auxiliary verb. type is empty, or `inv' in the case of subject-auxiliary inversion. The main verb or righthand auxiliary is treated as `head' because we want to encode most links with verb groups in terms of the semantically informative main verb element:
aux(_, tend, have) have tended towards...
aux(inv, be, can) can he be clever

conj(type, head+)

Although coordination relations are distributed in arg/mod GRs, conj is also used to annotate the type of conjunction and the heads of the conjuncts, as in:
conj(and, priest, soldier, member) the priests, the soldiers and the other members of the group
conj(or, smile, laugh) John smiled or Susan laughed



A few more specific remarks on the English / SUSANNE-based test corpus follow:

In the test corpus, dependent links which go beyond the strictly morphosyntactic `evidence' are marked up--for example, control relations, subjects in VP coordination, and other `elliptical' links--where the correct analysis is manifest from the sentence alone. `Arbitrary' control is not annotated since in most cases context would or could specify a non-arbitrary controller.

Pronoun coreference links are not annotated. Relative pronouns (wh) are treated as arguments but not resolved to their head antecedents, though the modification link between the head and pronoun is given as cmod.

Reduced relative clauses are annotated with a ncsubj link between the modified head and the relative clause gap (initial_gr `obj') and an xmod link between the head and the rel-vb. Relative clauses introduced by `that' complementisers are annotated cmod(that, head-noun, rel-vb).

Dummy it and there are not treated specially, so in There is a problem, There is annotated as `subj' of be.

The single most challenging construction to annotate sensibly using this scheme is the comparative, e.g. more friends than outright enemies: we treat more as a ncmod on its head and than as a ncmod on the same head, but ellipsis, lack of heads, and so on make many comparatives quite challenging, and there are quite a few in the test corpus.

In general, where an analysis in the file of test sentences seems questionable or inadequate it has been commented thus: ;;; comment ??. There are also similar comments describing micro-level analysis choices on first occurrence.

We would welcome feedback in the form of further such comments, or amended or improved analyses, including extensions to the basic scheme. We are happy to try to maintain a single consistent but improving test file on the web site.



Figure 2 gives a more extended example of the use of the GR scheme, using the variant (lisp-like) syntax used in the actual file.

Figure 2: Example sentence and GRs (SUSANNE rel3, lines G22:1460k-G22:1480m). [Figure truncated in this copy. The surviving fragments are the start of the sentence, "When the proprietor dies, the establishment should ...", and the end of the GR list, "...nj or acquire decide)".]
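For readers who want to process a file in the lisp-like syntax, a minimal Python reader might look like the sketch below. The format details are assumptions, since the figure is truncated in this copy: one parenthesised GR per line, whitespace-separated atoms with the relation name first, and lines beginning with `;;;' treated as comments. The sample text is hypothetical apart from its last line, which is modelled on the surviving fragment read as a conj relation. The function name parse_gr_lines is also hypothetical.

```python
def parse_gr_lines(text):
    """Parse lines such as '(conj or acquire decide)' into (name, slots).

    ASSUMPTIONS: one parenthesised GR per line, whitespace-separated
    atoms, and lines starting with ';;;' are comments.
    """
    grs = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";;;"):
            continue  # skip blank lines and comment lines
        if not (line.startswith("(") and line.endswith(")")):
            raise ValueError(f"not a GR line: {raw!r}")
        atoms = line[1:-1].split()
        if not atoms:
            raise ValueError(f"empty GR: {raw!r}")
        grs.append((atoms[0], tuple(atoms[1:])))
    return grs


# Hypothetical sample; only the last line is modelled on the fragment
# that survives in the figure.
sample = """
;;; comment ??
(ncsubj die proprietor _)
(conj or acquire decide)
"""
for name, slots in parse_gr_lines(sample):
    print(name, slots)
```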



Footnotes

[1] Note that the issue we are concerned with here is parser evaluation, and we are not making any more general claims about the utility of constituency-based treebanks for other important tasks they are used for, such as statistical parser training or in quantitative linguistics.


John Carroll
2000-08-02