English Parser Evaluation Corpus

A parser evaluation corpus of English, annotated according to a grammatical relation scheme, is now available. It consists of 500 sentences (around 10,000 words) extracted randomly from the SUSANNE corpus.

There are four files: the tokenised raw text, the lemmatised and numbered sentences, the grammatical relation annotation, and software for automatically evaluating parser output. A specification of the annotation scheme is also online. (Please note that this specification describes the latest version of the annotated corpus and supersedes the one in the publications listed below.) The corpus is free for research purposes; for any proposed commercial use, please contact John Carroll.
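
As a rough illustration of what evaluation against grammatical relation annotation involves, the sketch below compares a parser's relations with gold-standard relations and reports precision, recall and F1. It is not the distributed evaluation software and does not read the corpus file format: the function name `score`, the (relation, head, dependent) tuple representation and the example relations are all assumptions made for illustration, and only exact matches are counted.

```python
from collections import Counter


def score(gold, test):
    """Return (precision, recall, f1) for two lists of relation tuples.

    Each relation is a hypothetical (name, head, dependent) tuple; the
    real corpus files may represent relations differently.
    """
    gold_counts = Counter(gold)
    test_counts = Counter(test)
    # Multiset intersection: a relation is credited at most as many
    # times as it occurs in the gold standard.
    matched = sum((gold_counts & test_counts).values())
    precision = matched / len(test) if test else 0.0
    recall = matched / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Illustrative relations only, not taken from the corpus.
gold = [("ncsubj", "sleep", "cat"), ("detmod", "cat", "the")]
test = [("ncsubj", "sleep", "cat"), ("dobj", "sleep", "cat")]

precision, recall, f1 = score(gold, test)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Scores would normally be accumulated over all 500 sentences rather than computed for one sentence at a time; whether to average per sentence or pool all relations is a design choice this sketch leaves open.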

We would be pleased to receive comments on the scheme, the annotated corpus or the evaluation software.

Recent changes:

Descriptions of the grammatical relation annotation scheme are published in

Proposals for improvements to the scheme are discussed in

