.. Sussex NLTK documentation master file, created by
   sphinx-quickstart on Mon Oct  1 11:54:24 2012.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Welcome to the Sussex NLTK package documentation!
=================================================

The aim of the ``sussex_nltk`` package is to provide access to additional corpora
and functionality not distributed with the normal ``nltk`` distribution.

The ``cmu`` module wraps the Java\ :sup:`TM`-based Carnegie Mellon University (CMU) Twitter tokenizer and
part-of-speech tagger (`ArkNLP <http://www.ark.cs.cmu.edu/TweetNLP/>`_).

The ``corpus_readers`` module provides access to five additional corpora
(Amazon Customer Reviews, Medline abstracts, Twitter posts, Reuters RCV1 and the Wall Street Journal).
Detailed information about these corpora can be found in :doc:`corpora`.

The ``spell`` module provides access to the Aspell spell checker dictionary.

The ``stats`` module implements functions for computing corpus statistics.

The ``tag`` module provides high-level access to the CMU Twitter and Stanford part-of-speech taggers.

The ``tokenize`` module provides high-level access to the CMU Twitter tokenizer.

Contents:

.. toctree::
   :maxdepth: 2

   corpora.rst
   sussex_nltk.rst


Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`