.. Sussex NLTK documentation master file, created by
   sphinx-quickstart on Mon Oct 1 11:54:24 2012.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Welcome to Sussex NLTK package documentation!
=============================================

The aim of the ``sussex_nltk`` package is to provide access to additional
corpora and functionality not distributed with the normal ``nltk``
distribution.

The ``cmu`` module wraps the Java :sup:`TM` based Carnegie Mellon University
Twitter tokenizer and part-of-speech tagger (ArkNLP).

The ``corpus_readers`` module provides access to five additional corpora
(Amazon Customer Reviews, Medline abstracts, Twitter posts, Reuters RCV1 and
Wall Street Journal). Detailed information about these corpora can be found in
the :keyword:`corpora` section.

The ``spell`` module provides access to the Aspell spell checker dictionary.

The ``stats`` module implements statistical functions for computing corpus
statistics.

The ``tag`` module provides high-level access to the CMU Twitter and Stanford
part-of-speech taggers.

The ``tokenize`` module provides high-level access to the CMU Twitter
tokenizer.

Contents:

.. toctree::
   :maxdepth: 2

   corpora.rst
   sussex_nltk.rst

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`