Welcome to the Sussex NLTK package documentation!

The aim of the sussex_nltk package is to provide access to additional corpora and functionality not included in the standard NLTK distribution.

The cmu module wraps the Java™-based Carnegie Mellon University Twitter tokenizer and part-of-speech tagger (ArkNLP).

The corpus_readers module provides access to five additional corpora (Amazon Customer Reviews, Medline abstracts, Twitter posts, Reuters RCV1 and the Wall Street Journal). Detailed information about these corpora can be found in the corpora documentation.

The spell module provides access to the Aspell spell checker dictionary.
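The Aspell dictionary itself is not reproduced here. As a hedged illustration of what dictionary-based spell checking involves, the sketch below uses a tiny hypothetical word list (`WORDS`) and hypothetical helpers `edit_distance` and `check`; none of these names belong to the sussex_nltk API, and Aspell's own suggestion strategy is not modeled. The sketch flags words missing from the dictionary and suggests entries within a Levenshtein distance of one.

```python
# Hypothetical stand-in for a spell-checker dictionary; the real module
# draws on Aspell's dictionary, whose interface is not shown here.
WORDS = {"the", "cat", "sat", "on", "mat"}


def edit_distance(a, b):
    """Levenshtein distance, computed with a single rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def check(word, dictionary=WORDS, max_distance=1):
    """Return (is_known, suggestions) for a single word."""
    w = word.lower()
    if w in dictionary:
        return True, []
    suggestions = sorted(d for d in dictionary
                         if edit_distance(w, d) <= max_distance)
    return False, suggestions


if __name__ == "__main__":
    print(check("cst"))  # (False, ['cat'])
```

Comparing against every dictionary entry is fine for a toy word list; a real checker indexes its dictionary so that suggestions do not require a full scan.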

The stats module implements various statistical functions related to computing corpus statistics.
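The functions in the stats module are not listed on this page, so the following is only a sketch of the kind of corpus statistics such a module might compute, built on the standard library's `collections.Counter`. The function name `corpus_stats` and the keys of the returned dictionary are hypothetical.

```python
from collections import Counter


def corpus_stats(tokens, top_n=3):
    """Compute simple frequency statistics over a list of tokens.

    Tokens are lower-cased so that 'The' and 'the' count as one type.
    """
    counts = Counter(t.lower() for t in tokens)
    n_tokens = sum(counts.values())
    n_types = len(counts)
    return {
        "tokens": n_tokens,
        "types": n_types,
        # Type-token ratio: a rough measure of lexical diversity.
        "type_token_ratio": n_types / n_tokens if n_tokens else 0.0,
        # Hapax legomena: types that occur exactly once.
        "hapaxes": sum(1 for c in counts.values() if c == 1),
        "most_common": counts.most_common(top_n),
    }


if __name__ == "__main__":
    print(corpus_stats("the cat sat on the mat the end".split(), top_n=1))
    # {'tokens': 8, 'types': 6, 'type_token_ratio': 0.75, 'hapaxes': 5,
    #  'most_common': [('the', 3)]}
```

Note that the type-token ratio falls as a corpus grows, so it is only meaningful when comparing samples of similar size.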

The tag module provides high-level access to the CMU Twitter and Stanford part-of-speech taggers.

The tokenize module provides high-level access to the CMU Twitter tokenizer.
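The CMU tokenizer runs in Java and is not reproduced here. As a rough, hypothetical illustration of why a Twitter-specific tokenizer is useful, the regular-expression sketch below keeps @mentions, #hashtags, URLs and simple emoticons intact, where naive punctuation splitting would break them apart. `TOKEN_RE` and `tokenize` are illustrative names, not part of sussex_nltk, and the patterns are far less thorough than the CMU tokenizer's.

```python
import re

# Alternatives are tried left to right at each position, so URLs, mentions,
# hashtags and emoticons are matched before the generic word pattern.
TOKEN_RE = re.compile(r"""
      https?://\S+                  # URLs (trailing punctuation is kept)
    | [@\#]\w+                      # @mentions and #hashtags
    | [:;=][-o^]?[)(\]\[dDpP/\\|]   # simple emoticons such as :) ;-( :D
    | \w+(?:'\w+)?                  # words, including contractions like don't
    | [^\w\s]                       # any other single non-space symbol
""", re.VERBOSE)


def tokenize(text):
    """Split a tweet into tokens using TOKEN_RE."""
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    print(tokenize("@bob loving #NLP :) see https://example.com now!"))
    # ['@bob', 'loving', '#NLP', ':)', 'see', 'https://example.com', 'now', '!']
```

Ordering the alternatives this way is the main design choice: if the word or symbol patterns came first, `:)` would be split into two symbols and `#NLP` would lose its hash.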
