A parser evaluation corpus of English based on a grammatical relation annotation scheme is now available. It consists of 500 sentences (around 10,000 words) extracted randomly from the SUSANNE corpus.
There are four files: the (tokenised) raw text, the lemmatised and numbered sentences, the grammatical relation annotation, and software that can be used to evaluate parser output automatically. A specification of the annotation scheme is also online. (Please note that this specification refers to the latest version of the annotated corpus and supersedes the one in the publications listed below.) The corpus is free for research purposes; for any proposed commercial use please contact John Carroll.
We would be pleased to receive comments on the scheme, the annotated corpus or the evaluation software.
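As a rough illustration of what evaluating parser output against a grammatical relation annotation can involve, the following is a minimal sketch and not the distributed evaluation software. It assumes each relation is written as a parenthesised, space-separated tuple such as (dobj read book), and it scores a parser's relations for one sentence against the gold relations by exact match. The function names parse_gr and score, the example relations, and the exact-match criterion are all assumptions made for illustration; the actual software may match relations differently.

```python
from typing import Iterable, Tuple

# A grammatical relation as a tuple of strings, e.g. ("dobj", "read", "book").
# This representation is an assumption made for the sketch.
GR = Tuple[str, ...]


def parse_gr(line: str) -> GR:
    """Turn a string such as '(dobj read book)' into ('dobj', 'read', 'book')."""
    return tuple(line.strip().strip("()").split())


def score(gold: Iterable[GR], test: Iterable[GR]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-score) using exact matching of relations."""
    gold_set, test_set = set(gold), set(test)
    matched = len(gold_set & test_set)
    precision = matched / len(test_set) if test_set else 0.0
    recall = matched / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Hypothetical gold-standard and parser relations for a single sentence.
gold = [parse_gr(s) for s in (
    "(ncsubj read Kim _)",
    "(dobj read book)",
    "(detmod _ book the)",
)]
test = [parse_gr(s) for s in (
    "(ncsubj read Kim _)",
    "(dobj read paper)",
)]

p, r, f = score(gold, test)
print(f"precision={p:.2f} recall={r:.2f} f-score={f:.2f}")
# prints: precision=0.50 recall=0.33 f-score=0.40
```

Treating each sentence's relations as a set means a relation counts once however often a parser emits it; a real evaluation would normally also sum the counts over all 500 sentences before computing the final figures.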
10 October 2002 - a new version of the grammatical relation annotation incorporating around 20 additions/fixes in response to comments from Ron Kaplan.
7 November 2002 - fixed annotation of "got" used as an auxiliary.
Descriptions of the grammatical relation annotation scheme are published in:
Carroll, J., G. Minnen and E. Briscoe (2003) Parser evaluation: using a grammatical relation annotation scheme. In A. Abeillé (ed.), Treebanks: Building and Using Syntactically Annotated Corpora, Dordrecht: Kluwer. 299-316.
Carroll, J., G. Minnen and E. Briscoe (1999) Corpus annotation for parser evaluation. In Proceedings of the EACL-99 Post-Conference Workshop on Linguistically Interpreted Corpora, Bergen, Norway. 35-41. Also in Proceedings of the ATALA Workshop on Corpus Annotés pour la Syntaxe - Treebanks, Paris, France. 13-20.
Carroll, J., E. Briscoe and A. Sanfilippo (1998) Parser evaluation: a survey and a new proposal. In Proceedings of the 1st International Conference on Language Resources and Evaluation, Granada, Spain. 447-454.
Proposals for improvements to the scheme are discussed in:
Briscoe, E., J. Carroll, J. Graham and A. Copestake (2002) Relational evaluation schemes. In Proceedings of the Beyond PARSEVAL Workshop at the 3rd International Conference on Language Resources and Evaluation, Las Palmas, Gran Canaria. 4-8.