Welcome to the Sussex NLTK package documentation!

The sussex_nltk package provides access to additional corpora and functionality that are not included in the standard NLTK distribution.

The cmu module wraps the Java-based Carnegie Mellon University Twitter tokenizer and part-of-speech tagger (ArkNLP).

The corpus_readers module provides access to five additional corpora (Amazon Customer Reviews, Medline abstracts, Twitter posts, Reuters RCV1 and Wall Street Journal). Detailed information about these corpora can be found in the Sussex NLTK Corpora section.

The spell module provides access to the Aspell spell checker dictionary.

The stats module implements various statistical functions related to computing corpus statistics.
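To give a rough idea of what "corpus statistics" typically means, the sketch below computes token count, type count and type-token ratio for a list of tokens. It uses only the Python standard library and does not call the stats module; the function name corpus_statistics and the sample sentence are hypothetical and chosen purely for illustration.

```python
from collections import Counter


def corpus_statistics(tokens):
    """Return basic counts for an iterable of token strings.

    Illustrative helper only; this is NOT part of sussex_nltk.stats.
    """
    counts = Counter(tokens)
    n_tokens = sum(counts.values())   # total number of tokens
    n_types = len(counts)             # number of distinct tokens
    return {
        "tokens": n_tokens,
        "types": n_types,
        # Guard against an empty corpus to avoid division by zero.
        "type_token_ratio": n_types / n_tokens if n_tokens else 0.0,
        "most_common": counts.most_common(3),
    }


# Hypothetical sample data standing in for a tokenized corpus.
sample = "the cat sat on the mat and the dog sat too".split()
stats = corpus_statistics(sample)
print(stats)
```

In practice the token list would come from one of the corpus readers rather than from a hand-written string.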

The tag module provides high-level access to the CMU Twitter and Stanford part-of-speech taggers.

The tokenize module provides high-level access to the CMU Twitter tokenizer.
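The following sketch suggests why Twitter text benefits from a dedicated tokenizer. It compares plain whitespace splitting with a small regular expression that keeps mentions, hashtags and URLs intact. This is not the CMU tokenizer and does not use the tokenize module; the pattern, the function name simple_tweet_tokenize and the example tweet are assumptions made for illustration, and a real Twitter tokenizer handles many more cases (emoticons, for instance, are split apart here).

```python
import re

# Hypothetical, deliberately small pattern; NOT the CMU tokenizer's rules.
TOKEN_PATTERN = re.compile(
    r"""
    https?://\S+        # URLs, kept whole
  | [@#]\w+             # @mentions and #hashtags
  | \w+(?:'\w+)?        # words, with an optional apostrophe part
  | [^\w\s]             # any other single non-space symbol
    """,
    re.VERBOSE,
)


def simple_tweet_tokenize(text):
    """Split text into tokens using TOKEN_PATTERN (illustration only)."""
    return TOKEN_PATTERN.findall(text)


tweet = "@nltk_org loving #nlp! see http://example.com"
whitespace_tokens = tweet.split()
regex_tokens = simple_tweet_tokenize(tweet)

print(whitespace_tokens)
print(regex_tokens)
```

Whitespace splitting leaves the exclamation mark attached to the hashtag, whereas the regular expression separates it while keeping the URL as a single token.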
